evaluateDIMPclass {MethylIT}  R Documentation 
For a given cutpoint (previously estimated with the function estimateCutPoint), 'evaluateDIMPclass' will return the evaluation of the classification of DMPs into two clases: DMPS from control and DMPs from treatment samples.
evaluateDIMPclass(LR, control.names, treatment.names, column = c(hdiv = FALSE, TV = FALSE, bay.TV = FALSE, wprob = FALSE, pos = FALSE), classifier = c("logistic", "pca.logistic", "lda", "qda", "pca.lda", "pca.qda"), pval.col = NULL, n.pc = 1, center = FALSE, scale = FALSE, interaction = NULL, output = "conf.mat", prop = 0.6, num.boot = 100, num.cores = 1L, tasks = 0L, seed = 1234, verbose = FALSE)
LR 
An object from 'pDMP' class. including control and treatment GRanges containing divergence values for each DMP in the metacolumn. LR is generated by the function 'selectDIMP' Each GRanges object must correspond to a sample. For example, if a sample is named 's1', then this sample can be accessed in the list of GRanges objects as LR$s1. 
control.names 
Names/IDs of the control samples, which must be include in thr variable LR. 
treatment.names 
Names/IDs of the treatment samples, which must be included in the variable LR. 
column 
a logical vector for column names for the predictor variables to be used: Hellinger divergence "hdiv", total variation "TV", TV estimated with Bayesian correction of methylation levels "bay.TV", probability of potential DMP "wprob", and the relative cytosine site position "pos" in respect to the chromosome where it is located. The relative position is estimated as (x  x.min)/(x.max  x), where x.min and x.max are the maximum and minimum for the corresponding chromosome, repectively. If "wprob = TRUE", then Logarithm base10 of "wprob" will be used as predictor in place of "wprob". 
classifier 
Classification model to use. Option "logistic" applies a logistic regression model; option "lda" applies a Linear Discriminant Analysis (LDA); "qda" applies a Quadratic Discriminant Analysis (QDA), "pca.logistic" applies logistic regression model using the Principal Component (PCs) estimated with Principal Component Analysis (PCA) as predictor variables. pca.lda" applies LDA using PCs as predictor variables, and the option "pca.qda" applies a Quadratic Discriminant Analysis (QDA) using PCs as predictor variables. 
pval.col 
Column number for pvalue used in the performance analysis and estimation of the cutpoints. Default: NULL. If NULL it is assumed that the column is named "wprob". 
n.pc 
Number of principal components (PCs) to use if the classifier is not 'logistic'. In the current case, the maximun number of PCs is 4. 
center 
A logical value indicating whether the variables should be shifted to be zero centered (same as in 'prcomp' prcomp). Only used if classifier = "pcaLDA". 
scale 
A logical value indicating whether the variables should be scaled to have unit variance before the analysis takes place (same as in 'prcomp' prcomp). Only used if classifier = "pcaLDA". 
interaction 
Variable interactions to consider in a logistic regression model. Any pairwise combination of the variable "hdiv", "TV", "wprob", and "pos" can be provided. For example: "hdiv:TV", "wprob:pos", "wprob:TV", etc. 
output 
Type of output to request: output = c("conf.mat", "mc.val", "boot.all", "all"). See below. 
prop 
Proportion to split the dataset used in the logistic regression: group versus divergence (at DMPs) into two subsets, training and testing. 
num.boot 
Number of bootstrap validations to perform in the evaluation of the logistic regression: group versus divergence (at DMPs). 
num.cores, tasks 
Paramaters for parallele computation using package

seed 
Random seed used for random number generation. 
verbose 
if TRUE, prints the function log to stdout 
The regulatory methylation signal is also an output from a natural process that continuously takes place across the ontogenetic development of the organisms. So, we expect to see methylation signal on natural ordinary conditions. Here, to distinguish a control methylation signal from a treatment, three classification models are provided: 1) logistic, 2) Linear Discriminant Analysis (LDA) and 3) Quadratic Discriminant Analysis (QDA). In particular, four predictor variables can be used: Hellinger divergence "hdiv", total variation "TV", probability of potential DMP "wprob" and DMP genomic coordinated "pos". Principal component analysis (PCA) is used to convert a set of observations of possibly correlated predictor variables into a set of values of linearly uncorrelated variables (principal components, PCs). The PCs are used as new, uncorrelated predictor variables for LDA, QDA, and logistic classifiers.
A classification result with low accuracy and compromising values from other classification performance indicators (see below) suggest that the treatment does not induce a significant regulatory signal different from control.
output = "conf.mat" will perform a logistic regression group versus divergence (at DMPs) to evaluate the discrimination between controlDMPs and treatmentDMPs. The evaluation of this classification is provided through the function 'confusionMatrix' from R package 'caret'. "mc.val" will perform a 'num.boot'times Monte Carlo (bootstrap) validation and return a summary. By default function 'confusionMatrix' from R package caret' randomly splits the sample into two subsets, training and testing, according to the supplied proportion 'prop' (i.e., prop = 0.6). After selecting output = "mc.val", the function 'confusionMatrix' will be executed 'num.boot'times, each time performing a different random split of the sample. "boot.all" same as "mc.val" plus a matrix with statistcs reported by 'confusionMatrix'. "all" return a list with all the mentioned outputs.
data(cutpoint, PS, package = "MethylIT") ## DMPs are selected using the cupoints DMPs < selectDIMP(PS, div.col = 9L, cutpoint = cutpoint$cutpoint, tv.cut = 0.92) ## Classification of DMPs into two clases: DMPS from control and DMPs from ## treatment samples and evaluation of the classifier performance (for more ## details see ?evaluateDIMPclass). perf < evaluateDIMPclass(LR = DMPs, column = c(hdiv = TRUE, TV = TRUE, wprob = TRUE, pos = TRUE), classifier = "lda", n.pc = 4L, control.names = c("C1", "C2", "C3"), treatment.names = c("T1", "T2", "T3"), center = TRUE, scale = TRUE, prop = 0.6) ## Model classification performance perf$Performance