DMP clustering — dmpClusters • MethylIT

Given a 'pDMP' (or 'InfDiv') object carrying DMPs (methylated cytosines) detected in Methyl-IT downstream analysis, function 'dmpClusters' build clusters of DMPs, which can be further tested to identify differentially methylated regions (DMRs) with countTest2 function.

dmpClusters(
  GR,
  maxDist = 3,
  minNumDMPs = 1,
  maxClustDist = NULL,
  method = c("relaxed", "fixed.int"),
  chromosomes = NULL,
  columns = 9L,
  ignore.strand = TRUE,
  num.cores = detectCores() - 1,
  tasks = 0L,
  verbose = TRUE,
  ...
)

Arguments

GR

An object from 'pDMP' class, which is returned by selectDIMP function.

maxDist

maximum distance at which two reported bases sites from the same cluster can be separated. Default: \(maxDist = 3\).

minNumDMPs

Minimum number of DMPs inside of each cluster. Default: \(minNumDMPs = 1\).

maxClustDist

Clusters separated by a distance lesser than maxClustDist' positions are merged. Default: \(maxClustDist = NULL\). If \(0 < maxClustDist < maxDist\) or \(maxClustDist = NULL\), then maxClustDist will be recalculated as \(maxClustDist = maxDist + 1\).

method

Two different approaches are implemented to clustering DMPs:

"relaxed":: DMP ranges which are separated by a distance less than 'maxClustDist' are merged and ranges with less than 'minNumDMPs' are removed.
"fixed.int":: It will generate a partition where the distance between consecutive DMPs is not greater than 'maxDist'. Ranges with less than 'minNumDMPs' are removed. If, additionally, a value maxClustDist > 0 is provided, then the "relaxed" approach is applied to the ranges from the first step.

chromosomes

vector of characters labeling the chromosomes included in the analysis. Default: chromosomes = NULL (all chromosomes are included).

columns

An integer number corresponding to the specific column(s) to use from the meta-column of each GRanges. Default values is 9 (the column carrying to the Hellinger divergence values).

ignore.strand

Same as in findOverlaps-methods.

num.cores, tasks

integer(1). The number of cores to use, i.e. at most how many child processes will be run simultaneously (see bplapply function from BiocParallel package).The number of tasks per job. value must be a scalar integer >= 0L (see MulticoreParam from BiocParallel package).

verbose

if TRUE, prints the function log to stdout.

...

Further parameters for uniqueGRanges function.

Value

A GRanges object with the numbers of positions inside each cluster, where DMPs were reported in at least one of the samples.

Details

Two algorithmic approaches are implemented, named: "relaxed" and "fixed.int" (see the description of parameter 'method'). The "fixed.int" is mostly addressed to find specific methylation patterns, but the price is the number of DMRs found is lower.

The number of DMPs reported in each cluster corresponds to the numbers of sites inside the cluster where DMPs were found in at least one of the samples (from control or from treatment). That is, dmpClusters is only a tool to locate regions with high density of DMPs from all the samples. It does not detect DMRs. It is assumed that only DMP coordinates are given in the 'GR' object. That is, all the sites provided are considered in the computation.

References

Sanchez R, Mackenzie SA (2016) Information Thermodynamics of Cytosine DNA Methylation. PLOS ONE 11(3): e0150427. https://doi.org/10.1371/journal.pone.0150427

Author

Robersy Sanchez (https://genomaths.com).

Examples

## Get a dataset of dmps from the package
data(dmps)

## Build clusters of DMPs taking into account the DNA strand
x1 = dmpClusters(GR = dmps, maxDist = 7, minNumDMPs = 6,
                 method = "fixed.int", ignore.strand = FALSE,
                 verbose = FALSE)
#> Warning:   worker number limited by available socket connections, setting
#>   BiocParallel workers to 124 (was 255)
#> Warning:   worker number limited by available socket connections, setting
#>   BiocParallel workers to 124 (was 255)
data.frame(x1)
#>    seqnames start  end width strand dmps
#> 1         1  1208 1229    22      -    7
#> 2         1  2142 2167    26      +    6
#> 3         1  2423 2449    27      +    8
#> 4         1  3273 3305    33      -    6
#> 5         1  3405 3427    23      -    7
#> 6         1  4656 4688    33      +    8
#> 7         1  4758 4785    28      -    6
#> 8         1  5626 5645    20      -    6
#> 9         1  5890 5909    20      -    6
#> 10        1  5940 5973    34      -    7
#> 11        1  6226 6242    17      -    6
#> 12        1  7379 7405    27      +    6
#> 13        1  9530 9557    28      +    8
#> 14        1  9913 9935    23      +    6

if (FALSE) {
## Build clusters of DMPs ignoring DNA strand and maxClustDist = 7
x2 = dmpClusters(GR = dmps, maxDist = 7, minNumDMPs = 6,
                  maxClustDist = 7, method = "fixed.int",
                  num.cores=2L, ignore.strand = TRUE,
                  verbose = FALSE)
DataFrame(data.frame(x2))

## The relaxed approach with method = "relaxed"
x3 = dmpClusters(GR = dmps, minNumDMPs = 6, method = "relaxed",
                  maxClustDist = 10, ignore.strand = TRUE,
                  verbose = FALSE)
DataFrame(data.frame(x3))

## ==== Setting up the experiment design to test for DMRs ===
nams <- names(dmps)
dmps_at_clusters <- getDMPatRegions(GR = dmps, regions = x3,
                                    ignore.strand = TRUE)
dmps_at_clusters <- uniqueGRanges(dmps_at_clusters, columns = 2L,
                                ignore.strand = TRUE, type = 'equal',
                                verbose = FALSE)
colnames(mcols(dmps_at_clusters)) <- nams

colData <- data.frame(condition = factor(c('CT', 'CT', 'CT',
                                        'TT', 'TT', 'TT'),
                                        levels = c('CT', 'TT')),
                    nams, row.names = 2)

## Build a RangedGlmDataSet object
ds <- glmDataSet(GR = dmps_at_clusters, colData = colData)

## ================ Testing to detect DMRs ===========
dmrs <- countTest2(DS = ds, num.cores = 4L, minCountPerIndv = 4,
                maxGrpCV = c(1, 1), Minlog2FC = 0.5,
                CountPerBp = 0.001, test = 'LRT',
                verbose = TRUE)
dmrs
}