rmstGR {MethylIT.utils} | R Documentation |
Count data in MethylIT.utils pipeline is carried in GRanges objects.
This function provides a shortcut to apply the parametric Bootstrap
of 2x2 Contingency independence test, which is implemented in function
bootstrap2x2
. The goodness of fit statistic is
the root-mean-square statistic (RMST) or Hellinger divergence, as
proposed by Perkins et al. [1, 2]. Hellinger divergence (HD) is computed
as proposed in [3].
rmstGR(LR, count.col = 1:2, control.names = NULL, treatment.names = NULL, stat = "rmst", pooling.stat = "sum", tv.cut = NULL, hdiv.cut = NULL, hdiv.col = NULL, num.permut = 100, pAdjustMethod = "BH", pvalCutOff = 0.05, saveAll = FALSE, num.cores = 1L, tasks = 0L, verbose = TRUE, ...)
LR |
A list of GRanges, a GRangesList, a CompressedGRangesList object. Each GRanges object from the list must have two columns: methylated (mC) and unmethylated (uC) counts. The name of each element from the list must coincide with a control or a treatment name. |
count.col |
2d-vector of integers with the indexes of the read count
columns. If not given, then it is asssumed that the methylated and
unmethylated read counts are located in columns 1 and 2 of each GRanges
metacolumns. If object LR is the output of Methyl-IT function
|
control.names, treatment.names |
Names/IDs of the control samples, which must be included in the variable GR at the metacolumn. Default is NULL. If NULL, then it is assumed that each GRanges object in LR has four columns of counts. The first two columns correspond to the methylated and unmethylated counts from control/reference and the other two columns are the methylated and unmethylated counts from treatment, respectively. |
stat |
Statistic to be used in the testing: 'rmst' (root mean square test) or 'hdiv' (Hellinger divergence test). |
pooling.stat |
statistic used to estimate the methylation pool: row sum, row mean or row median of methylated and unmethylated read counts across individuals. If the number of control samples is greater than 2 and pooling.stat is not NULL, then they will pooled. The same for treatment. Otherwise, all the pairwise comparisons will be done. |
tv.cut |
A cutoff for the total variation distance (TVD; absolute value of methylation levels differences) estimated at each site/range as the difference of the group means of methylation levels. If tv.cut is provided, then sites/ranges k with abs(TV_k) < tv.cut are removed before to perform the regression analysis. Its value must be NULL or a number 0 < tv.cut < 1. |
hdiv.cut |
An optional cutoff for the Hellinger divergence (*hdiv*). If
the LR object derives from the previous application of function
|
hdiv.col |
Optional. Columns where *hdiv* values are located in each GRange object from LR. It must be provided if together with *hdiv.cut*. Default is NULL. |
num.permut |
Number of permutations. |
pAdjustMethod |
method used to adjust the results; default: BH |
pvalCutOff |
cutoff used when a p-value adjustment is performed |
saveAll |
if TRUE all the temporal results are returned |
num.cores, tasks |
Paramaters for parallele computation using package
|
verbose |
if TRUE, prints the function log to stdout |
... |
Additional parameters for function
|
Samples from each group are pooled according to the statistic selected (see parameter pooling.stat) and a unique GRanges object is created with the methylated and unmethylated read counts for each group (control and treatment) in the metacolumn. So, a contingency table can be built for range from GRanges object.
A GRanges object with the original sample counts, bootstrap p-value probability, total variation (difference of methylation levels), and p-value adjusment.
Perkins W, Tygert M, Ward R. Chi-square and Classical Exact Tests Often Wildly Misreport Significance; the Remedy Lies in Computers. [Internet]. Uploaded to ArXiv. 2011. Report No.: arXiv:1108.4126v2.
Perkins, W., Tygert, M. & Ward, R. Computing the confidence levels for a root-mean square test of goodness-of-fit. 217, 9072-9084 (2011).
Basu, A., Mandal, A. & Pardo, L. Hypothesis testing for two discrete populations based on the Hellinger distance. Stat. Probab. Lett. 80, 206-214 (2010).
FisherTest
#' A list of GRanges set.seed(123) sites = 15 data <- list( C1 = data.frame(chr = "chr1", start = 1:sites, end = 1:sites,strand = '*', mC = rnbinom(size = 8, mu = 3, n = sites), uC = rnbinom(size = 50, mu = 10, n = sites)), C2 = data.frame(chr = "chr1", start = 1:sites, end = 1:sites, strand = '*', mC = rnbinom(size = 8, mu = 3, n = sites), uC = rnbinom(size = 50, mu = 10, n = sites)), T1 = data.frame(chr = "chr1", start = 1:sites, end = 1:sites,strand = '*', mC = rnbinom(size = 50, mu = 10, n = sites), uC = rnbinom(size = 10, mu = 10, n = sites)), T2 = data.frame(chr = "chr1", start = 1:sites, end = 1:sites, strand = '*', mC = rnbinom(size = 50, mu = 10, n = sites), uC = rnbinom(size = 5, mu = 10, n = sites))) #' Transforming the list of data frames into a list of GRanges objects data = lapply(data, function(x) makeGRangesFromDataFrame(x, keep.extra.columns = TRUE)) rmstGR(LR = data, control.names = c("C1", "C2"), treatment.names = c("T1", "T2"), tv.cut = 0.25, num.permut = 100, pAdjustMethod="BH", pvalCutOff = 0.05, num.cores = 4L, verbose=TRUE)