gammaMixtCut {MethylIT.utils} | R Documentation |
This functions estimates cutpoint value to classify DMPs into two classes: 1) from treatment and 2) from control, based on Mixtures of Gamma Distributions. The cutpoint estimations are limited to the analysis of mixture distributions of the form: F(x) = λ G(x) + (1 - λ) H(x), where λ \in [0, 1], G(x) and F(x) are the gamma cummulative distribution functions distributions followed by the information divergences estimated for individuals from control and treatment populations, respectively.
gammaMixtCut(LR, post.cut = 0.5, div.col = NULL, tv.col = NULL, tv.cut = 0.25, find.cut = FALSE, control.names = NULL, treatment.names = NULL, column = c(hdiv = FALSE, TV = FALSE, wprob = FALSE, pos = FALSE), classifier = c("logistic", "pca.logistic", "lda", "svm", "qda", "pca.lda", "pca.qda"), prop = 0.6, clas.perf = FALSE, cut.interval = c(0.5, 0.8), cut.incr = 0.01, stat = 1, maximize = TRUE, num.cores = 1L, tasks = 0L, tol = .Machine$double.eps^0.5, maxiter = 1000, ...)
LR |
A "pDMP"or "InfDiv" object obtained with functions
|
post.cut |
Posterior probability to dicide whether a DMPs belong to treatment group. Default *post.cut* = 0.5. |
div.col |
Column number for divergence of methylation levels used in the estimation of the cutpoints. Default: 9L (hdiv column from an InfDiv object). |
tv.col |
Column number where the total variation is located in the metadata from each GRanges object. |
tv.cut |
A cutoff for the total variation distance to be applied to each site/range. Only sites/ranges *k* with TVD_{k} > tv.cut are are used in the analysis. Its value must be a number 0 < tv.cut < 1. Default is tv.cut = 0.25. |
find.cut |
Logic. Wether to search for an optimal cutoff value to classify DMPs based on given specifications. |
control.names, treatment.names |
Optional. Names/IDs of the control and treatment samples, which must be include in the variable LR (default, NULL). However, these are required if any of the parameters *find.cut* or *clas.perf* are set TRUE. |
treatment.names |
Optional. Names/IDs of the treatment samples, which must be include in the variable LR (default, NULL). |
column |
a logical vector for column names for the predictor variables
to be used: Hellinger divergence "hdiv", total variation "TV",
probability of potential DIMP "wprob", and the relative cytosine site
position "pos" in respect to the chromosome where it is located. The
relative position is estimated as (x - x.min)/(x.max - x), where x.min
and x.max are the maximum and minimum for the corresponding chromosome,
repectively. If "wprob = TRUE", then Logarithm base-10 of "wprob" will
be used as predictor in place of "wprob" ( see
|
classifier |
Classification model to use. Option "logistic" applies a logistic regression model; option "lda" applies a Linear Discriminant Analysis (LDA); "qda" applies a Quadratic Discriminant Analysis (QDA), "pca.logistic" applies logistic regression model using the Principal Component (PCs) estimated with Principal Component Analysis (PCA) as predictor variables. pca.lda" applies LDA using PCs as predictor variables, and the option "pca.qda" applies a Quadratic Discriminant Analysis (QDA) using PCs as predictor variables. 'SVM' applies Support Vector Machines classifier from R package e1071. |
prop |
Proportion to split the dataset used in the logistic regression: group versus divergence (at DIMPs) into two subsets, training and testing. |
clas.perf |
Logic. Whether to return the classificaiton performance for the estimated cutpoint. Default, FALSE. |
cut.interval |
0 < *cut.interval* < 0.1. If *find.cut*= TRUE, the interval of treatment group posterior probabilities where to search for a cutpoint. Deafult *cut.interval* = c(0.5, 0.8). |
cut.incr |
0 < *cut.incr* < 0.1. If *find.cut*= TRUE, the sucesive increamental values runing on the interval *cut.interval*. Deafult, *cut.incr* = 0.01. |
stat |
An integer number indicating the statistic to be used in the testing when *find.cut* = TRUE. The mapping for statistic names are: 0 = "Accuracy", 1 = "Sensitivity", 2 = "Specificity", 3 = "Pos Pred Value", 4 = "Neg Pred Value", 5 = "Precision", 6 = "Recall", 7 = "F1", 8 = "Prevalence", 9 = "Detection Rate", 10 = "Detection Prevalence", 11 = "Balanced Accuracy", 12 = FDR. |
maximize |
Whether to maximize the performance indicator given in parameter 'stat'. Default: TRUE. |
num.cores, tasks |
Paramaters for parallele computation using package
|
... |
Additional arameters to pass to functions
|
After the estimation of potential DMPs, the pool of DMPs from control and treatment is assumed that follows mixtures of Gamma distributions corresponding to two populations. A posterior probability 2d-vector is estimated for each DMP. The cutpoint is determined from the intersection of the two gamma probabilities density distributions. That is, f(x) and g(x) are the estimatied densities for control and treatment groups, repectively, then the cutpoint is the values of x for which f(x) = g(x).
The Mixtures of Gamma Distributions (MGD) is estimated by using function
gammamixEM
from package *mixtools*. By default
function gammamixEM
produces returns a long list
including the posterior probability to belong to the treatment. Here, by
the sake of brevety only the information on the fitted model is given.
The posterior model probability can be retrieved by using *predict*
function. Accordign with MGD model, DMPs with a posterior probability to
belong to the treatment group greater than *post.cut = 0.5* is classified
as *DMP from treatment*. The post.cut can be modified. For all the cases
0 < post.cut < 1. The cutpoint and, hence, the classification
derived throught MGD model might differ from that provided throught
evaluateDIMPclass
function, which includes more information
about the DMP and, therefore, reports better performance. The
classification perfomance reported when *clas.perf* = TRUE or *find.cut*
= TRUE is created with function evaluateDIMPclass
for the
especified matchin learning model.
If parameter *find.cut = TRUE*, then a search for the best
cutpoint in a predifined inteval (*cut.interval*) is performed calling
function evaluateDIMPclass
.