getPotentialDIMP {MethylIT} | R Documentation |
This function selects the cytosine sites carrying the potential methylation signal. The potential signals from controls and treatments are used as a prior classification in a subsequent step of signal detection.
getPotentialDIMP(LR, nlms = NULL, div.col, dist.name = "Weibull2P", absolute = FALSE, alpha = 0.05, pval.col = NULL, tv.col = NULL, tv.cut = NULL, min.coverage = NULL, hdiv.col = NULL, hdiv.cut = NULL, pAdjustMethod = NULL)
LR |
An object of class 'InfDiv' or 'testDMP'. These objects are previously obtained with function |
nlms |
A list of fitted distribution models (output of function 'fitNonlinearWeibullDist') or NULL. If NULL, the empirical cumulative distribution function (ECDF) is used to select the potential DMPs. |
div.col |
Number of the meta-column where the divergence variable is located. |
dist.name |
Name of the fitted distribution. This can be the name of a single distribution or a character vector of length(nlms). Default is the two-parameter Weibull distribution: "Weibull2P". The available options are the three-parameter Weibull ("Weibull3P"), the three-parameter gamma ("Gamma3P"), the two-parameter gamma ("Gamma2P"), the three-parameter ("GGamma3P") or four-parameter ("GGamma4P") generalized gamma, the empirical cumulative distribution function ("ECDF"), or "None". If dist.name != "None" and nlms != NULL, then a column named "wprob" with a probability vector derived from the application of model "nlms" is returned. |
absolute |
Logical (default: FALSE). Total variation (TV, the difference of methylation levels) is normally an output of the downstream MethylIT analysis. If 'absolute = TRUE', then TV is transformed into |TV|, which is an information divergence that can be fitted to a Weibull or a generalized gamma distribution. Hence, if the nonlinear fit was performed on |TV|, absolute must be set to TRUE. |
alpha |
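A numerical value (usually alpha ≤ 0.05) used to select cytosine sites k whose information divergence DIV_k exceeds the critical value DIV(alpha), that is, the sites for which the probability P[DIV > DIV_k] given by the fitted model (Weibull by default) is lower than alpha. |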
pval.col |
An integer denoting the column of each GRanges object in LR where p-values are stored; it is used when dist.name == "None" and nlms == NULL. Default is NULL. If pval.col is NULL, dist.name == "None" and nlms == NULL, then a column named adj.pval will be used to select the potential DMPs. |
tv.col |
Column number for the total variation to be used for filtering cytosine positions (if provided). |
tv.cut |
If tv.cut and tv.col are provided, then cytosine sites k with abs(TV_k) < tv.cut are removed before performing the ROC analysis. |
min.coverage |
Cytosine sites with coverage less than min.coverage are discarded. Default: 0 |
hdiv.col |
Optional. A column number for the Hellinger distance to be used for filtering cytosine positions. Default is NULL. |
hdiv.cut |
If hdiv.cut and hdiv.col are provided, then cytosine sites k with hdiv < hdiv.cut are removed. |
pAdjustMethod |
Method used to adjust the p-values obtained from other approaches that involve multiple comparisons, such as Fisher's exact test. Default is NULL. Do not apply it when a probability distribution model is used (i.e., when nlms is given), since it does not make sense in that case. |
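The optional filters and the p-value adjustment described above can be illustrated with a minimal base R sketch. It is not MethylIT's internal code: the data frame `sites` and its columns (`coverage`, `TV`, `hdiv`, `pval`) are hypothetical stand-ins for the meta-columns of one GRanges object in LR, and the order of operations (filter first, then adjust with the Benjamini-Hochberg method via `p.adjust`) is an assumption.

```r
# Minimal sketch (base R only, NOT MethylIT's implementation) of the optional
# filters controlled by min.coverage, tv.cut and hdiv.cut, followed by a
# multiple-testing adjustment as pAdjustMethod would request.
# 'sites' and its column names are hypothetical stand-ins for the
# meta-columns of one GRanges object in LR.
sites <- data.frame(
  coverage = c(12, 3, 25, 40, 8),                  # read coverage per site
  TV       = c(0.45, 0.60, -0.05, -0.30, 0.20),    # total variation
  hdiv     = c(2.1, 3.5, 0.1, 1.4, 0.6),           # Hellinger divergence
  pval     = c(0.001, 0.004, 0.600, 0.010, 0.030)  # e.g. Fisher's exact test
)

min.coverage <- 10
tv.cut <- 0.2
hdiv.cut <- 0.5
alpha <- 0.05

# Keep sites with coverage >= min.coverage, |TV| >= tv.cut and hdiv >= hdiv.cut
keep <- sites$coverage >= min.coverage &
  abs(sites$TV) >= tv.cut &
  sites$hdiv >= hdiv.cut
filtered <- sites[keep, ]

# Adjust the remaining p-values (Benjamini-Hochberg is an assumed choice),
# then keep the sites whose adjusted p-value is below alpha
filtered$adj.pval <- p.adjust(filtered$pval, method = "BH")
potential <- filtered[filtered$adj.pval < alpha, ]
print(potential)
```

With these toy values only the first and fourth sites survive all three filters, and both remain below alpha after adjustment.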
The potential signals are cytosine sites k whose information divergence values (DIV_k) are greater than DIV(alpha = 0.05), the critical value at the default alpha. The value of alpha can be changed; for example, potential signals with DIV_k > DIV(alpha = 0.01) can be selected. For each sample, cytosine sites are selected based on the corresponding fitted nonlinear distribution model that has been supplied.
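The selection rule can be sketched in base R, assuming a two-parameter Weibull model whose shape and scale values are hypothetical, with simulated divergences standing in for a real 'nlms' fit and a real divergence meta-column. For a continuous distribution, DIV_k > DIV(alpha) is equivalent to P[DIV > DIV_k] < alpha, so both criteria select the same sites. The ECDF branch is only a plausible reading of what happens when nlms = NULL, not the package's actual implementation.

```r
# Minimal sketch (base R only, NOT MethylIT's implementation) of selecting
# potential signals with a fitted two-parameter Weibull model.
# Hypothetical: shape/scale values and the simulated divergences.
set.seed(123)
shape <- 0.8
scale <- 1.5
div <- rweibull(1000, shape = shape, scale = scale)  # DIV_k for 1000 sites

alpha <- 0.05

# Critical value DIV(alpha), defined by P[DIV > DIV(alpha)] = alpha
div.alpha <- qweibull(1 - alpha, shape = shape, scale = scale)

# Upper-tail probability of each site under the model: P[DIV > DIV_k]
wprob <- pweibull(div, shape = shape, scale = scale, lower.tail = FALSE)

# Two equivalent selection rules for a continuous distribution
sel.by.value <- div > div.alpha
sel.by.prob <- wprob < alpha

# Assumed fallback without a fitted model: use the empirical CDF instead
Fn <- ecdf(div)
sel.by.ecdf <- (1 - Fn(div)) < alpha

cat("Weibull-selected sites:", sum(sel.by.value), "\n")
cat("ECDF-selected sites:", sum(sel.by.ecdf), "\n")
```

Here `wprob` plays the role of the per-site probability described under the returned value, although the package may compute or store it differently.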
A list of GRanges objects, each GRanges object carrying the selected cytosine sites and the Weibull probability P[DIV_k > DIV(alpha)].
library(MethylIT)
## Get a dataset of Hellinger divergence of methylation levels and their
## corresponding best nonlinear fit distribution models from the package
data(HD, nlms)
PS <- getPotentialDIMP(LR = HD, nlms = nlms, div.col = 9L, alpha = 0.05)