Predict DIMP class — predictDIMPclass • MethylIT

This function classifies each DMP as a control DMP or a treatment DMP.

predictDIMPclass(
  LR,
  model,
  conf.matrix = FALSE,
  control.names = NULL,
  treatment.names = NULL
)

Arguments

LR: A list of GRanges objects obtained through the MethylIT downstream analysis. Basically, this object is a list of GRanges containing only differentially methylated positions (DMPs). The metacolumns of each GRanges must include the columns Hellinger divergence 'hdiv', total variation 'TV', and the probability of potential DMP 'wprob', which are added naturally in the downstream analysis of MethylIT.
model: A classifier model obtained with the function 'evaluateDIMPclass'.
conf.matrix: Optional. Logical, whether a confusion matrix should be returned (default, FALSE, see below).
control.names: Optional. Names/IDs of the control samples, which must be included in the variable LR (default, NULL).
treatment.names: Optional. Names/IDs of the treatment samples, which must be included in the variable LR (default, NULL).
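
The requirements on LR and on the sample names can be checked before calling the function. The following is only a minimal sketch: `check_LR` and `toy` are hypothetical helpers that are not part of MethylIT, and the toy GRanges objects carry made-up values.

```r
library(GenomicRanges)

# Hypothetical helper (not part of MethylIT): checks that every element of LR
# carries the metacolumns named in the argument description and that the
# sample names, if given, are present in LR.
check_LR <- function(LR, control.names = NULL, treatment.names = NULL,
                     required = c("hdiv", "TV", "wprob")) {
    has_cols <- vapply(LR, function(gr) {
        all(required %in% colnames(mcols(gr)))
    }, logical(1))
    has_names <- all(c(control.names, treatment.names) %in% names(LR))
    all(has_cols) && has_names
}

# Toy data with made-up values, for illustration only
toy <- function() {
    GRanges("chr1", IRanges(start = c(10, 20, 30), width = 1),
            hdiv = c(1.2, 3.4, 0.8),
            TV = c(0.3, -0.5, 0.2),
            wprob = c(0.01, 0.001, 0.02))
}
LR <- list(C1 = toy(), T1 = toy())

ok <- check_LR(LR, control.names = "C1", treatment.names = "T1")
bad <- check_LR(LR, control.names = "C9", treatment.names = "T1")
```

Here `ok` is TRUE, while `bad` is FALSE because 'C9' is not among the names of LR.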

Value

The same LR object with two new columns named 'class' and 'posterior' added to each GRanges object from LR (default). Based on the model prediction, each DMP is labeled as control 'CT' or as treatment 'TT' in column 'class'. Column 'posterior' provides, for each DMP, the posterior probability that the given DMP can be classified as induced by the 'treatment' (a treatment DMP).

Control DMPs classified as 'treatment' are false positives. However, if the same cytosine position is classified as 'treatment DMP' in both groups, control and treatment, but with higher posterior probability in the treatment group, then this would indicate a reinforcement of the methylation status in such a position induced by the treatment.
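
The comparison described above can be sketched in base R. This is an illustration under stated assumptions, not part of the package: plain data frames with made-up values stand in for the metadata of one control and one treatment sample, and the columns 'chr' and 'pos' are hypothetical names for the genomic coordinates.

```r
# Made-up values standing in for the 'class' and 'posterior' columns of two
# returned samples; 'chr' and 'pos' are hypothetical coordinate columns.
control <- data.frame(chr = "chr1", pos = c(100, 200, 300),
                      class = c("TT", "CT", "TT"),
                      posterior = c(0.62, 0.10, 0.90))
treatment <- data.frame(chr = "chr1", pos = c(100, 300, 400),
                        class = c("TT", "TT", "TT"),
                        posterior = c(0.97, 0.75, 0.88))

# Control DMPs labeled 'TT' are false positives
false_pos <- control[control$class == "TT", ]

# Positions labeled 'TT' in both groups
both <- merge(false_pos, treatment[treatment$class == "TT", ],
              by = c("chr", "pos"), suffixes = c(".ctrl", ".treat"))

# Candidate reinforcement: higher posterior in the treatment group
reinforced <- both[both$posterior.treat > both$posterior.ctrl, ]
reinforced$pos
```

With these toy values only position 100 is kept: position 300 is labeled 'TT' in both groups, but its posterior probability is higher in the control sample.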

If 'conf.matrix' is TRUE and the arguments control.names and treatment.names are provided, then the overall confusion matrix is returned.
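
To help read the summary statistics printed with the confusion matrix, the sketch below recomputes a few of them in base R from the counts shown in the logistic-model example on this page, taking 'TT' as the positive class as in that output. It only illustrates the arithmetic and makes no claim about how the function computes them internally.

```r
# Counts copied from the logistic-model example output
# (rows: prediction, columns: reference)
cm <- matrix(c(91, 0, 1066, 0), nrow = 2,
             dimnames = list(Prediction = c("CT", "TT"),
                             Reference = c("CT", "TT")))

# Fraction of DMPs whose predicted class matches the reference class
accuracy <- sum(diag(cm)) / sum(cm)

# With 'TT' as the positive class:
# sensitivity = true 'TT' predicted as 'TT' / all reference 'TT'
sensitivity <- cm["TT", "TT"] / sum(cm[, "TT"])
# specificity = true 'CT' predicted as 'CT' / all reference 'CT'
specificity <- cm["CT", "CT"] / sum(cm[, "CT"])
# prevalence = share of reference 'TT' among all DMPs
prevalence <- sum(cm[, "TT"]) / sum(cm)

round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, prevalence = prevalence), 4)
```

The accuracy is 91 / 1157, which rounds to 0.0787. It equals one minus the prevalence because every DMP in that example was predicted as 'CT'.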

Details

Predictions only make sense if the query DMPs belong to the same methylation context and derive from an experiment carried out under the same set of conditions as the DMPs used to build the model.

Examples


### Load dataset from the package
data(logit_perf, dmps, package = 'MethylIT')
set.seed(123)

### Select a subset from each DMP sample: the first 70% of its
### positions, returned in random order
DMPs <- lapply(dmps, function(x) {
            idx <- length(x) * 0.7
            return(x[ sample.int(idx)  ])
    }, keep.attr = TRUE)


### Prediction with the logistic model
predclass.dmps <- predictDIMPclass(
    LR = DMPs,
    model = logit_perf$model,
    conf.matrix = TRUE,
    control.names =  c('C1', 'C2', 'C3'),
    treatment.names = c('T1', 'T2', 'T3'))

predclass.dmps
#> Confusion Matrix and Statistics
#> 
#>           Reference
#> Prediction   CT   TT
#>         CT   91 1066
#>         TT    0    0
#>                                           
#>                Accuracy : 0.0787          
#>                  95% CI : (0.0638, 0.0957)
#>     No Information Rate : 0.9213          
#>     P-Value [Acc > NIR] : 1               
#>                                           
#>                   Kappa : 0               
#>                                           
#>  Mcnemar's Test P-Value : <2e-16          
#>                                           
#>             Sensitivity : 0.00000         
#>             Specificity : 1.00000         
#>          Pos Pred Value :     NaN         
#>          Neg Pred Value : 0.07865         
#>              Prevalence : 0.92135         
#>          Detection Rate : 0.00000         
#>    Detection Prevalence : 0.00000         
#>       Balanced Accuracy : 0.50000         
#>                                           
#>        'Positive' Class : TT              
#>                                           

### Prediction with the PCA-QDA model
data("pcaQda_perf", package = 'MethylIT')

### Assumes 'pcaQda_perf' stores the classifier in '$model', as 'logit_perf' does
predclass.dmps <- predictDIMPclass(
    LR = DMPs,
    model = pcaQda_perf$model,
    conf.matrix = TRUE,
    control.names =  c('C1', 'C2', 'C3'),
    treatment.names = c('T1', 'T2', 'T3'))

predclass.dmps

### Prediction with the PCA-LDA model
data("pcaLda_perf", package = 'MethylIT')

### Assumes 'pcaLda_perf' stores the classifier in '$model', as 'logit_perf' does
predclass.dmps <- predictDIMPclass(
    LR = DMPs,
    model = pcaLda_perf$model,
    conf.matrix = TRUE,
    control.names =  c('C1', 'C2', 'C3'),
    treatment.names = c('T1', 'T2', 'T3'))

predclass.dmps