uniqueGRfilterByCov {MethylIT}    R Documentation

Unique GRanges of methylation read counts filtered by coverages

Description

Given two GRanges objects, samples '1' and '2', this function filters the cytosine sites of each GRanges object by coverage.

Usage

uniqueGRfilterByCov(x, y = NULL, min.coverage = 4, min.meth = 0,
  min.umeth = 0, percentile = 0.9999, high.coverage = NULL,
  columns = c(mC = 1, uC = 2), num.cores = 1L, ignore.strand = FALSE,
  tasks = 0L, verbose = TRUE, ...)

Arguments

x

An object from the classes 'GRanges', 'InfDiv', or 'pDMP' with methylated and unmethylated counts in its meta-column. If the argument 'y' is not given, then it is assumed that the first four columns of the GRanges metadata 'x' are counts: methylated and unmethylated counts for samples '1' and '2'.

y

A GRanges object with methylated and unmethylated counts in its meta-column. Default is NULL. If 'x' is an 'InfDiv' or 'pDMP' object, then 'y' is not needed, since the counts for samples '1' and '2' are the first four columns of these objects.

min.coverage

An integer or an integer vector of length 2. Cytosine sites where the coverage in both samples, 'x' and 'y', is less than 'min.coverage' are discarded. The cytosine site is preserved, however, if the coverage is greater than 'min.coverage' in at least one sample. If 'min.coverage' is an integer vector, then the corresponding minimum coverage is applied to each sample.

min.meth

An integer or an integer vector of length 2. Cytosine sites where the numbers of read counts of methylated cytosine in both samples, '1' and '2', are less than 'min.meth' are discarded. If 'min.meth' is an integer vector, then the corresponding min number of reads is applied to each sample.

min.umeth

An integer or an integer vector of length 2. Minimum number of unmethylated reads required to consider a cytosine position. Specifically, cytosine positions where (uC <= min.umeth) & (mC > 0) & (mC < min.meth) holds are removed, where mC and uC stand for the numbers of methylated and unmethylated reads. Default is min.umeth = 0.

percentile

Percentile threshold used to remove outliers (sites with unusually high coverage) from each file and from all files stacked. Default is 0.9999.

high.coverage

An integer for read counts. Cytosine sites having higher coverage than this are discarded. Default is NULL.

columns

Vector of integer numbers of the columns (from each GRanges meta-column) where the methylated and unmethylated counts are provided. If not provided, then the methylated and unmethylated counts are assumed to be at columns 1 and 2, respectively.

num.cores

The number of cores to use, i.e. at most how many child processes will be run simultaneously (see bplapply function from BiocParallel package).

ignore.strand

When set to TRUE, the strand information is ignored in the overlap calculations. Default value: FALSE

tasks

Integer(1). The number of tasks per job. The value must be a scalar integer >= 0L. In this documentation, a job is defined as a single call to a function such as bplapply or bpmapply. A task is the division of the X argument into chunks. When tasks == 0 (default), X is divided as evenly as possible over the number of workers (see MulticoreParam from the BiocParallel package).

verbose

If TRUE, prints the function log to stdout.

...

Additional parameters for 'uniqueGRanges' function.
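The condition quoted in the 'min.umeth' description can be checked by hand on a few counts. The following is a minimal base R sketch, not the package's implementation; the count vectors and the value min.meth = 4 are hypothetical and chosen only for illustration.

```r
# Hypothetical read counts at five cytosine sites of a single sample.
mC <- c(1, 3, 0, 10, 2)   # methylated reads
uC <- c(0, 0, 0, 5, 4)    # unmethylated reads

min.meth  <- 4   # illustrative value, not the default
min.umeth <- 0   # default stated in the argument description

# Condition from the 'min.umeth' description: TRUE marks a site to remove.
remove <- (uC <= min.umeth) & (mC > 0) & (mC < min.meth)

data.frame(mC, uC, remove)
# Sites 1 and 2 are flagged: no unmethylated reads and fewer than
# min.meth methylated reads.
```

Read literally, the condition can only hold when min.meth is at least 2, since it requires 0 < mC < min.meth for integer counts.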

Details

Cytosine sites with 'coverage' > 'min.coverage' in at least one of the samples are preserved. Positions with 'coverage' < 'min.coverage' in both samples, 'x' and 'y', are removed. Positions with 'coverage' > 'percentile' (e.g., 99.9 percentile) are removed as well. It is expected that the columns of methylated and unmethylated counts are given.
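The coverage rules above can be sketched in base R on plain count vectors. This is an illustration under stated assumptions, not the code used by the package: the counts are made up, a coverage equal to 'min.coverage' is assumed to be sufficient, the upper threshold is assumed to be the quantile of the coverages of both samples stacked, and percentile = 0.95 is used only because the toy data set is tiny.

```r
# Hypothetical counts for two samples at six cytosine sites.
mC1 <- c(2, 10, 7, 1, 0, 300); uC1 <- c(1, 20, 4, 0, 2, 500)
mC2 <- c(1, 1, 3, 4, 1, 6);    uC2 <- c(0, 0, 2, 3, 0, 5)

cov1 <- mC1 + uC1   # coverage in sample 1
cov2 <- mC2 + uC2   # coverage in sample 2

min.coverage <- 4
percentile   <- 0.95   # illustrative; the documented default is 0.9999

# Keep a site when at least one sample reaches min.coverage
# (assumption: coverage == min.coverage is accepted).
keep_low <- (cov1 >= min.coverage) | (cov2 >= min.coverage)

# Upper threshold taken from the stacked coverages (assumption).
upper <- unname(quantile(c(cov1, cov2), probs = percentile))
keep_high <- (cov1 <= upper) & (cov2 <= upper)

keep <- keep_low & keep_high
data.frame(cov1, cov2, keep)
# keep is FALSE TRUE TRUE TRUE FALSE FALSE
```

Site 4 shows the "at least one sample" rule: its coverage is 1 in the first sample but 7 in the second, so it is kept. Site 6 is dropped only because of its outlying coverage in the first sample.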

Value

A GRanges object with the columns of methylated and unmethylated counts filtered for each cytosine position.

Examples

library(GenomicRanges) # provides makeGRangesFromDataFrame()
library(MethylIT)

df1 <- data.frame(chr = "chr1", start = 11:16, end = 11:16,
                  mC = c(2,10,7,9,1,10), uC = c(30,20,4,8,0,10))
df2 <- data.frame(chr = "chr1", start = 12:18, end = 12:18,
                  mC2 = 1:7, uC2 = 0:6)
gr1 <- makeGRangesFromDataFrame(df1, keep.extra.columns = TRUE)
gr2 <- makeGRangesFromDataFrame(df2, keep.extra.columns = TRUE)

## Filtering
r1 <- uniqueGRfilterByCov(gr1, gr2, ignore.strand = TRUE)
r1
## Cytosine positions with coordinates 12 and 15 (rows #2 and #5) can pass
## the filtering condition min.coverage = 4 and still lead to meaningless
## situations with methylation levels p = 1/(1 + 0) = 1
r1[c(2,5)]

## This situation can be prevented, in this case, by setting min.meth = 1:
r1 <- uniqueGRfilterByCov(gr1, gr2, min.meth = 1, ignore.strand = TRUE)
r1


[Package MethylIT version 0.3.1 ]