This function selects the cytosine sites carrying a potential methylation signal. The potential signals from controls and treatments are used as a prior classification in a later step of signal detection.

```r
getPotentialDIMP(
    LR,
    nlms = NULL,
    div.col,
    dist.name = "Weibull2P",
    absolute = FALSE,
    alpha = 0.05,
    pval.col = NULL,
    tv.col = NULL,
    tv.cut = NULL,
    idv.col = NULL,
    idv.cut = NULL,
    min.coverage = NULL,
    hdiv.col = NULL,
    hdiv.cut = NULL,
)
```

## Arguments

LR

An object of class 'InfDiv' or 'testDMP'. These objects are obtained beforehand with the functions estimateDivergence or FisherTest.

nlms

A list of fitted distribution models (the output of the gofReport function) or NULL. If NULL, the empirical cumulative distribution function (ECDF) is used to select the potential DMPs.
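When nlms is NULL, the text above says the selection falls back on the ECDF. The sketch below illustrates that idea with base R's `ecdf()` on simulated divergence values. The rule "keep sites whose empirical upper-tail probability is below alpha" is an assumption about how the ECDF is applied, not MethylIT's internal code.

```r
# Simulated divergence values for one sample (not package data)
set.seed(123)
div <- rweibull(1000, shape = 1.5, scale = 0.2)
alpha <- 0.05

# Empirical CDF built from the observed divergence values
Fn <- ecdf(div)

# Empirical upper-tail probability P(DIV > DIV_k) for each site
upper_tail <- 1 - Fn(div)

# Keep the sites that fall in the empirical upper alpha tail
selected <- which(upper_tail < alpha)
length(selected)
```

With 1000 distinct values this keeps roughly the top 5% of sites, since no distributional assumption is made.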

div.col

Column number, within the metadata columns of each GRanges object, where the divergence variable is located.

dist.name

Name of the fitted distribution. This can be the name of a single distribution or a character vector of length equal to length(nlms). Default is the two-parameter Weibull distribution: 'Weibull2P'. The available options are:

"Weibull2P"

Two-parameter Weibull distribution.

"Weibull3P"

Three-parameter Weibull distribution.

"Gamma2P"

Two-parameter gamma distribution.

"Gamma3P"

Three-parameter gamma distribution.

"GGamma3P"

Three-parameter generalized gamma distribution.

"GGamma4P"

Four-parameter generalized gamma distribution.

"ECDF"

The empirical cumulative distribution function.

"None"

No distribution.

If dist.name != 'None' and nlms is not NULL, a column named 'wprob' is returned, holding the probability vector derived from applying the models in nlms.
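Each name above identifies a parametric family whose fitted CDF yields the upper-tail probability of a divergence value. As a rough sketch, the hypothetical function `upper_tail_prob` below evaluates P(DIV > x) for the Weibull and gamma options with base R CDFs, assuming the conventional parameterization in which the third parameter is a location shift `mu`. The parameter names are illustrative and may not match the coefficients stored in nlms; the generalized gamma options are not covered.

```r
# Upper-tail probability P(DIV > x) for some of the listed distributions.
# Parameter names (shape, scale, mu) are illustrative assumptions.
upper_tail_prob <- function(x, dist.name, shape, scale, mu = 0) {
  switch(dist.name,
    Weibull2P = pweibull(x, shape = shape, scale = scale, lower.tail = FALSE),
    Weibull3P = pweibull(x - mu, shape = shape, scale = scale, lower.tail = FALSE),
    Gamma2P   = pgamma(x, shape = shape, scale = scale, lower.tail = FALSE),
    Gamma3P   = pgamma(x - mu, shape = shape, scale = scale, lower.tail = FALSE),
    # Any other name (e.g. the generalized gamma options) is not handled here
    stop("distribution not covered by this sketch: ", dist.name)
  )
}

x <- c(0.05, 0.2, 0.8)
upper_tail_prob(x, "Weibull2P", shape = 1.5, scale = 0.2)
upper_tail_prob(x, "Weibull3P", shape = 1.5, scale = 0.2, mu = 0.01)
```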

absolute

Logical (default: FALSE). Total variation (TV, the difference of methylation levels) is normally an output of the downstream MethylIT analysis. If 'absolute = TRUE', TV is transformed into |TV|, which is an information divergence that can be fitted with a Weibull or a generalized gamma distribution. Hence, if the nonlinear fit was performed on |TV|, absolute must be set to TRUE.
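Why absolute must match the fit can be seen with a small sketch: a model fitted to |TV| is supported on non-negative values, so evaluating it on signed TV gives every negative value an upper-tail probability of 1, and such sites can never be selected. The TV values and Weibull parameters below are made up for illustration.

```r
# Hypothetical signed TV values (differences of methylation levels)
tv <- c(-0.9, -0.3, 0.1, 0.45, 0.85)

# Hypothetical Weibull2P parameters, assumed to come from a fit to |TV|
shape <- 2
scale <- 0.4

# Like absolute = FALSE: the model is evaluated on signed TV
p_signed <- pweibull(tv, shape, scale, lower.tail = FALSE)

# Like absolute = TRUE: the model is evaluated on |TV|, the scale it was fitted on
p_abs <- pweibull(abs(tv), shape, scale, lower.tail = FALSE)

which(p_signed < 0.05)  # only the large positive TV
which(p_abs < 0.05)     # large changes in both directions
```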

alpha

A numerical value (usually $$\alpha \leq 0.05$$) used to select cytosine sites $$k$$ whose information divergence $$DIV_k$$ exceeds the critical value $$DIV(\alpha)$$, that is, the value for which $$P(DIV > DIV(\alpha)) = \alpha$$ under the fitted model.
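For a continuous fitted CDF, keeping sites whose upper-tail probability is below alpha is the same as keeping sites with $$DIV_k > DIV(\alpha)$$. The sketch below checks this equivalence for a hypothetical two-parameter Weibull fit using base R's `pweibull` and `qweibull`; it illustrates the criterion, not MethylIT's internal code.

```r
# Hypothetical fitted Weibull2P parameters and significance level
shape <- 1.5
scale <- 0.2
alpha <- 0.05

# Critical value DIV(alpha), defined by P(DIV > DIV(alpha)) = alpha
div_alpha <- qweibull(1 - alpha, shape = shape, scale = scale)

# Hypothetical divergence values of four cytosine sites
div <- c(0.1, 0.3, 0.45, 0.6)

# Rule 1: compare each DIV_k with the critical value
rule_quantile <- div > div_alpha

# Rule 2: compare each upper-tail probability with alpha
rule_prob <- pweibull(div, shape, scale, lower.tail = FALSE) < alpha

which(rule_prob)
```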

pval.col

An integer giving the column of each GRanges object in LR that holds the p-values to be used when dist.name == 'None' and nlms == NULL. Default is NULL. If pval.col is NULL, dist.name == 'None', and nlms == NULL, the column named adj.pval is used to select the potential DMPs.
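A minimal sketch of p-value-based selection, assuming sites are kept when their p-value is below alpha (the comparison is an assumption; the text above only names the column). A data.frame stands in for the metadata columns of one GRanges object; apart from adj.pval, the column names are hypothetical.

```r
# Hypothetical metadata for four cytosine sites of one sample
sites <- data.frame(hdiv     = c(0.8, 0.1, 0.5, 0.9),
                    adj.pval = c(0.001, 0.40, 0.03, 0.07))
alpha <- 0.05
pval.col <- NULL

# Use the requested column, falling back to 'adj.pval' when pval.col is NULL
pv <- if (is.null(pval.col)) sites$adj.pval else sites[[pval.col]]

# Keep the sites whose p-value is below alpha
potential <- sites[pv < alpha, ]
potential
```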

tv.col

Column number for the total variation to be used for filtering cytosine positions (if provided).

tv.cut

If tv.cut and tv.col are provided, then cytosine sites $$k$$ with $$|TV_k| < \text{tv.cut}$$ are removed before performing the ROC analysis.

min.coverage

Cytosine sites with coverage less than min.coverage are discarded. Default: 0

hdiv.col

Optional. A column number for the Hellinger distance to be used for filtering cytosine positions. Default is NULL.

hdiv.cut

If hdiv.cut and hdiv.col are provided, then cytosine sites $$k$$ with hdiv < hdiv.cut are removed.

Method used to adjust the p-values obtained from other approaches that involve multiple comparisons, such as Fisher's exact test. Default is NULL. Do not apply it when a probability distribution model is used (i.e., when nlms is given), since the adjustment makes no sense in that case.
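For reference, base R's `stats::p.adjust` implements the standard multiple-testing corrections. Whether this argument is passed on to `p.adjust`, and which method names it accepts, is an assumption here; the sketch only shows what a Benjamini-Hochberg (`"BH"`) and a Bonferroni adjustment do to a small, made-up vector of p-values.

```r
# Made-up raw p-values from, e.g., per-site Fisher's exact tests
p <- c(0.01, 0.02, 0.03, 0.04, 0.5)

# Benjamini-Hochberg: controls the false discovery rate
padj <- p.adjust(p, method = "BH")

# Bonferroni: controls the family-wise error rate (more conservative)
p_bonf <- p.adjust(p, method = "bonferroni")

rbind(raw = p, BH = padj, bonferroni = p_bonf)
```

Adjusted p-values are never smaller than the raw ones, which is why an adjustment is only meaningful for p-values coming from many separate tests.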

idiv.col

Optional. A column number for any of the available information divergences (TV, bay.TV, hdiv, or jdiv), used for filtering cytosine positions. Default is NULL.

idiv.cut

If idiv.cut and idiv.col are provided, then cytosine sites $$k$$ whose divergence in column idiv.col is less than idiv.cut are removed.
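The cut-off arguments act as pre-filters that remove sites below each threshold. The sketch below applies them to a data.frame standing in for one sample's metadata and skips any filter whose column or cut-off is NULL. The function `prefilter` and the `coverage` column are hypothetical; how MethylIT computes coverage is not described on this page. Following the argument descriptions, TV is compared in absolute value, while hdiv and the idiv.col column are compared directly.

```r
# Hypothetical metadata for five cytosine sites of one sample
sites <- data.frame(
  coverage = c(3, 20, 15, 40, 12),
  TV       = c(0.60, -0.05, -0.45, 0.30, 0.70),
  hdiv     = c(0.90, 0.01, 0.40, 0.15, 0.80)
)

# Apply each filter only when its inputs are given (not NULL)
prefilter <- function(df, tv.col = NULL, tv.cut = NULL, min.coverage = NULL,
                      hdiv.col = NULL, hdiv.cut = NULL,
                      idiv.col = NULL, idiv.cut = NULL) {
  keep <- rep(TRUE, nrow(df))
  # Discard sites with coverage below min.coverage
  if (!is.null(min.coverage)) keep <- keep & df$coverage >= min.coverage
  # Discard sites with |TV| below tv.cut
  if (!is.null(tv.col) && !is.null(tv.cut))
    keep <- keep & abs(df[[tv.col]]) >= tv.cut
  # Discard sites with Hellinger divergence below hdiv.cut
  if (!is.null(hdiv.col) && !is.null(hdiv.cut))
    keep <- keep & df[[hdiv.col]] >= hdiv.cut
  # Discard sites whose idiv.col divergence is below idiv.cut
  if (!is.null(idiv.col) && !is.null(idiv.cut))
    keep <- keep & df[[idiv.col]] >= idiv.cut
  df[keep, , drop = FALSE]
}

kept <- prefilter(sites, tv.col = 2, tv.cut = 0.2, min.coverage = 10,
                  hdiv.col = 3, hdiv.cut = 0.2)
kept
```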

## Value

A list of GRanges objects, each GRanges object carrying the selected cytosine sites and the probabilities that the specified divergence values can be greater than the critical value specified by $$\alpha$$: $$P(DIV_k > DIV(\alpha))$$.

## Details

The potential signals are cytosine sites $$k$$ whose information divergence $$DIV_k$$ is greater than the critical value $$DIV(\alpha)$$, with $$\alpha = 0.05$$ by default. The value of alpha can be changed; for example, setting alpha = 0.01 selects the potential signals with $$DIV_k > DIV(\alpha = 0.01)$$. For each sample, cytosine sites are selected based on the corresponding fitted nonlinear distribution model supplied in nlms.
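The per-sample logic described above can be sketched with `Map()`, pairing each sample's divergence values with its own fitted model. The samples and Weibull2P parameters are simulated and hypothetical, and `select_sites` is an illustrative helper, not a MethylIT function. Because $$DIV(0.01) > DIV(0.05)$$, the sites selected with alpha = 0.01 form a subset of those selected with alpha = 0.05.

```r
set.seed(1)
# Hypothetical per-sample divergence values
samples <- list(ctrl = rweibull(500, 1.2, 0.2),
                trt  = rweibull(500, 1.2, 0.3))
# Hypothetical per-sample Weibull2P fits, one per sample
models <- list(ctrl = c(shape = 1.2, scale = 0.2),
               trt  = c(shape = 1.2, scale = 0.3))

# Indices of sites whose upper-tail probability is below alpha
select_sites <- function(div, par, alpha) {
  wprob <- pweibull(div, par[["shape"]], par[["scale"]], lower.tail = FALSE)
  which(wprob < alpha)
}

# Each sample is evaluated with its own model
sel05 <- Map(function(d, p) select_sites(d, p, alpha = 0.05), samples, models)
sel01 <- Map(function(d, p) select_sites(d, p, alpha = 0.01), samples, models)

lengths(sel05)
lengths(sel01)
```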

## Examples

```r
## Get a dataset of Hellinger divergence of methylation levels and their
## corresponding best nonlinear fit distribution models from the package
data(HD, gof)
PS <- getPotentialDIMP(LR = HD, nlms = gof$nlms, dist.name = gof$bestModel,
                       div.col = 9L, alpha = 0.05)
```