selectDIMP {MethylIT}    R Documentation

Selection of DIMPs


For a given cutpoint (previously estimated with the function estimateCutPoint), 'selectDIMP' returns the differentially informative methylated positions (DIMPs). DIMPs are cytosine positions for which the divergence is greater than the cutpoint.


selectDIMP(LR, div.col = NULL, pval.col = NULL, absolute = FALSE,
  cutpoint, tv.col = NULL, tv.cut = NULL)



LR: A list of GRanges objects including control and treatment GRanges. Each GRanges object must correspond to a sample. For example, if a sample is named 's1', then this sample can be accessed in the list of GRanges objects as LR$s1.


div.col: Number of the column where the divergence variable (e.g., Hellinger divergence) is located in the GRanges meta-columns.


pval.col: If the cutpoint is a p-value, then the column number for p-values should be provided. Default: NULL. Note that one of the parameters div.col or pval.col must be given.


absolute: Logical (default: FALSE). Total variation (TV, the difference of methylation levels) is normally an output in the downstream MethylIT analysis. If 'absolute = TRUE', then TV is transformed into |TV|, which is an information divergence that can be fitted to a Weibull or a Generalized Gamma distribution. So, if the nonlinear fit was performed for |TV|, then absolute must be set to TRUE here.


cutpoint: Cutpoint to select DIMPs. Cytosine positions with divergence greater than 'cutpoint' will be selected as DIMPs. Cutpoints are estimated with the function 'estimateCutPoint'.


tv.col: Column number for the total variation to be used for filtering cytosine positions (if provided).


tv.cut: If tv.cut and tv.col are provided, then cytosine sites k with abs(TV_k) < tv.cut are removed.
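
The selection rule described by these arguments can be sketched in base R. This is only an illustration under stated assumptions, not the MethylIT implementation: the helper select_rows is hypothetical, the data frame stands in for the meta-columns of one sample, the values are made up, and columns are addressed by name rather than by number for readability.

```r
# Toy stand-in for the meta-columns of one sample (made-up values).
sample1 <- data.frame(
    pos  = 1:10,
    hdiv = c(0.2, 3.1, 1.7, 0.05, 2.4, 0.9, 4.0, 1.2, 2.9, 0.3),
    TV   = c(0.1, -0.6, 0.15, 0.0, 0.45, -0.05, 0.8, -0.3, 0.1, 0.02)
)

# Hypothetical helper (not part of MethylIT) mimicking the documented rule.
select_rows <- function(x, div.col, cutpoint, absolute = FALSE,
                        tv.col = NULL, tv.cut = NULL) {
    div <- x[[div.col]]
    # Assumption: 'absolute' applies |.| to the divergence column (e.g., TV).
    if (absolute) div <- abs(div)
    # Keep positions whose divergence is greater than the cutpoint.
    keep <- div > cutpoint
    # Optionally remove sites with abs(TV) < tv.cut.
    if (!is.null(tv.col) && !is.null(tv.cut)) {
        keep <- keep & abs(x[[tv.col]]) >= tv.cut
    }
    x[keep, , drop = FALSE]
}

# Divergence above the cutpoint, then the TV filter.
dimps <- select_rows(sample1, div.col = "hdiv", cutpoint = 1.5,
                     tv.col = "TV", tv.cut = 0.2)
dimps$pos

# Using TV itself as the divergence: signed versus absolute values.
signed.pos   <- select_rows(sample1, div.col = "TV", cutpoint = 0.4)$pos
absolute.pos <- select_rows(sample1, div.col = "TV", cutpoint = 0.4,
                            absolute = TRUE)$pos
```

In this toy data, the position with TV = -0.6 is dropped when the signed TV is compared with the cutpoint and kept when |TV| is used, which is why 'absolute' should match the variable that was fitted.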


Theoretically, a DIMP denotes a cytosine position with a high probability of being differentially methylated. That is, in the statistical molecular-biophysics context, a DIMP must be considered only in probabilistic terms and not as an absolute, deterministic experimental output.

The uncertainty and dynamics of the DNA methylation process, the continuous action of the omnipresent thermal fluctuations, and the inherent stochasticity of the biochemical reactions make it impossible to establish, in an absolutely deterministic sense, whether a specific cytosine position is methylated. Notice that the concept of DIMP is not applicable to a single cell (if we use an instrumentation/protocol to directly measure methylation at the molecular level, and not via PCR), since a concrete, single DNA cytosine position in a single cell is either methylated or not methylated.

However, when pooling DNA extracted from a tissue, the previous reasoning about uncertainty holds, plus an additional uncertainty factor: cells from the same tissue are not synchronized but are found at different stages of their ontogenetic development. Hence, the DIMP concept holds in the mentioned circumstances, where the uncertainty of methylation is present.
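
The contrast between a single cell and pooled DNA can be illustrated with a small base R simulation. It is a toy sketch only: the number of cells, the per-cell methylation probability, and the coverage are made-up assumptions, and the model ignores the biological details discussed above.

```r
set.seed(42)
n.cells <- 1000   # cells pooled from one tissue sample (assumed number)
p.meth  <- 0.7    # assumed probability that the site is methylated in a cell

# Each cell is in a definite state: 1 = methylated, 0 = unmethylated.
cell.state <- rbinom(n.cells, size = 1, prob = p.meth)

# Sequencing observes only a limited number of reads drawn from the pool.
coverage <- 30
reads <- sample(cell.state, size = coverage, replace = TRUE)

pooled.level   <- mean(cell.state)   # fraction of methylated cells in the pool
observed.level <- mean(reads)        # estimate based on the sampled reads
c(pooled = pooled.level, observed = observed.level)
```

Each simulated cell is strictly 0 or 1, while the pooled and observed levels are proportions that vary from run to run, which is the sense in which a call made on pooled data is probabilistic.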


A list of GRanges objects containing only differentially informative positions (DIMPs).


library(GenomicRanges)
library(MethylIT)

## Simulate a divergence variable (hdiv) for two samples
num.points <- 1000
HD <- GRangesList(
    sample1 = makeGRangesFromDataFrame(
        data.frame(chr = "chr1", start = 1:num.points, end = 1:num.points,
                strand = '*',
                hdiv = rweibull(num.points, shape = 0.75, scale = 1)),
        keep.extra.columns = TRUE),
    sample2 = makeGRangesFromDataFrame(
        data.frame(chr = "chr1", start = 1:num.points, end = 1:num.points,
                strand = '*',
                hdiv = rweibull(num.points, shape = 0.75, scale = 1)),
        keep.extra.columns = TRUE))

## Nonlinear fit of the divergence stored in the first meta-column
nlms <- nonlinearFitDist(HD, column = 1, verbose = FALSE)

## Potential DIMPs, then the cutpoint, then the final selection
PS <- getPotentialDIMP(LR = HD, nlms = nlms, div.col = 1, alpha = 0.05)
cutpoints <- estimateCutPoint(PS, control.names = "sample1",
                            treatment.names = c("sample2"),
                            div.col = 1, verbose = FALSE)
DIMPs <- selectDIMP(PS, div.col = 1, cutpoint = cutpoints$cutpoint$sample1)

[Package MethylIT version 0.3.1 ]