divTest {MethylIT.utils} | R Documentation |
Generalized Linear Model for group comparison of information
divergence variables yielded by function
estimateDivergence
from MethylIT.utils R package output.
Basically, this a wrapping function to perform the fitting of generalized
linear models with glm
from 'stats' package to any
variable of interest given in GRanges objects of
estimateDivergence
output.
divTest(GR, control.names, treatment.names, glm.family = Gamma(link = "log"), var.weights = FALSE, weights = NULL, varFilter = 0, meanFilter = 0, FilterLog2FC = TRUE, Minlog2FC = 1, divPerBp = 0.001, minInd = 2, pAdjustMethod = NULL, scaling = 1L, pvalCutOff = 0.05, saveAll = FALSE, num.cores = 1, tasks = 0L, verbose = TRUE, ...)
GR |
GRanges objects including control and treatment samples containing an information divergence of methylation levels. The names for each column must coincide with the names given for parameters: 'control.names' and 'treatment.names'. |
control.names |
Names/IDs of the control samples, which must be included in the variable GR in a metacolumn. |
treatment.names |
Names/IDs of the treatment samples, which must be included in the variable GR in a metacolumn. |
glm.family, link |
Parameter to be passed to function
|
var.weights |
Logical (default: FALSE). Whether to use group variances as weights. |
weights |
An optional list of two numeric vectors of ‘prior weights’ to be used in the fitting process. One vector of weights for the control and one for the treatment. Each vector with length equal to length(GR) (default: NULL). Non-NULL weights can be used to indicate that different observations have different dispersions (with the values in weights being inversely proportional to the dispersions). |
varFilter |
Numeric (default: 0). GLM will be performed only for those rows (ranges denoting genomic regions) where the group variance is greater the number specified by varFilter. |
meanFilter |
Numeric (default: 0). GLM will be performed only for those rows (ranges denoting genomic regions) where the absolute difference of group means is greater the number specified by meanFilter. |
FilterLog2FC |
if TRUE, the results are filtered using the minimun absolute value of log2FoldChanges observed to accept that a gene in the treatment is differentially expressed in respect to the control. |
Minlog2FC |
minimum logarithm base 2 of fold changes |
divPerBp |
At least for one group the mean divergence per bp must be equal to or greater than 'divPerBp' (default divPerBp = 0.001). |
minInd |
Integer (Default: 2). At least one group must have 'minInd' individuals with a divergence value greater than zero. |
pAdjustMethod |
Method used to adjust the results; default: "NULL"
(see |
scaling |
integer (default 1). Scaling factor estimate the signal density as: scaling * "DIMP-Count-Per-Bp". For example, if scaling = 1000, then signal density denotes the number of DIMPs in 1000 bp. |
pvalCutOff |
cutoff used then a p-value adjustment is performed |
saveAll |
if TRUE all the temporal results that passed filters 'varFilter' and are 'meanFilter' returned. If FALSE, only the comparisons that passed filters 'varFilter', 'meanFilter', and pvalue < pvalCutOff or adj.pvalue < pvalCutOff (if pAdjustMethod is not NULL) are returned. |
num.cores |
The number of cores to use, i.e. at most how many child
processes will be run simultaneously (see
|
tasks |
integer(1). The number of tasks per job. Value must be a scalar
integer >= 0L. In this documentation a job is defined as a single call to
a function, such as bplapply, bpmapply etc. A task is the division of the
X argument into chunks. When tasks == 0 (default), X is divided as evenly
as possible over the number of workers (see
|
verbose |
if TRUE, prints the function log to stdout |
... |
Additional parameters passed to |
The default parameter setting glm.family = Gamma(link = "log") is
thought to perform the group comparison of the sums of absolute
differences of methylation levels (total variation distance (TVD) at
gene-body DIMPs on DMGs). The sums of Hellinger divergence (HD, at
gene-body DIMPs on DMGs) can be tested with this setting as well. Both
TVD and HD follow asymptotic Chi-square distribution and, consequently,
so do the sum of TVD and the sum of HD. The Chi-square distribution is
a particular case of Gamma distribution:
f(x|a,s) = 1/(s^a Gamma(a)) x^(a-1) e^-(x/s)
Chi-square density is derived after replacing a = n/2 and s = 2:
f(x|n) = 1/(2^(n/2) Gamma(n/2)) x^(n/2-1) e^(-x/2)
The original GRanges object with the columns "beta", "log2FC", "pvalue", "adj.pval" (if pAdjustMethod requested), "CT.divPerBp" and "TT.divPerBp" (divergence per base pairs), and "divPerBpVariation added.
## Gene annotation genes <- GRanges(seqnames = "1", ranges = IRanges(start = c(3631, 6788, 11649), end = c(5899, 9130, 13714)), strand = c("+", "-", "-")) mcols(genes) <- data.frame(gene_id = c("AT1G01010", "AT1G01020", "AT1G01030")) # === The number of cytosine sites to generate === sites = 11001 # == Set a seed for pseudo-random number generation === set.seed(123) alpha.ct <- 0.09 alpha.tt <- 0.2 # === Simulate samples === ref = simulateCounts(num.samples = 2, sites = sites, alpha = alpha.ct, beta = 0.5, size = 50, theta = 4.5, sample.ids = "R1") # Control group ctrl = simulateCounts(num.samples = 2, sites = sites, alpha = alpha.ct, beta = 0.5, size = 50, theta = 4.5, sample.ids = c("C1", "C2")) # Treatment group treat = simulateCounts(num.samples = 2, sites = sites, alpha = alpha.tt, beta = 0.5, size = 50, theta = 4.5, sample.ids = c("T1", "T2")) # === Estime Divergences === HD = estimateDivergence(ref = ref$R1, indiv = c(ctrl, treat), Bayesian = TRUE, num.cores = 1L, percentile = 1, verbose = FALSE) nlms <- nonlinearFitDist(HD, column = 4, verbose = FALSE) ## Next, the potential signal can be estimated PS <- getPotentialDIMP(LR = HD, nlms = nlms, div.col = 4, alpha = 0.05) ## The cutpoint estimation used to discriminate the signal from the noise cutpoints <- estimateCutPoint(PS, control.names = c("C1", "C2"), treatment.names = c("T1", "T2"), div.col = 4, verbose = FALSE) ## DIMPs are selected using the cupoints DIMPs <- selectDIMP(PS, div.col = 9, cutpoint = min(cutpoints$cutpoint)) ## Finally DIMPs statistics genes tv_DIMPs = getGRegionsStat(GR = DIMPs, grfeatures = genes, stat = "sum", absolute = TRUE, column = 7L) GR_tv_DIMP = uniqueGRanges(tv_DIMPs, type = "equal", chromosomes = "1") colnames(mcols(GR_tv_DIMP)) <- c("C1", "C2", "T1", "T2") res <- divTest(GR=GR_tv_DIMP, control.names = c("C1", "C2"), treatment.names = c("T1", "T2"))