Machine-learning Approach for Cutpoint Estimation

Internal function that estimates the cutpoint following a machine-learning approach.

mlCutpoint(
  LR,
  control.names,
  treatment.names,
  column,
  div.col,
  tv.col = NULL,
  tv.cut,
  post.cut = 0.5,
  classifier1,
  classifier2 = NULL,
  interactions = NULL,
  n.pc,
  prop = 0.6,
  center = FALSE,
  scale = FALSE,
  stat = 0L,
  cut.values = NULL,
  maxnodes = NULL,
  ntree = 400,
  nsplit = 1L,
  num.cores = 1L,
  tasks = 0L,
  ...
)

Arguments

LR, res, control.names, treatment.names: Same as in estimateCutPoint
column, div.col, tv.col, tv.cut, cut.values, stat: Same as in estimateCutPoint
classifier1, classifier2, n.pc, prop, post.cut: Same as in estimateCutPoint
interactions: Only used with a logistic classifier. Variable interactions to consider in the logistic regression model. Any pairwise combination of the variables 'hdiv', 'TV', 'wprob', and 'pos' can be provided, for example: 'hdiv:TV', 'wprob:pos', 'wprob:TV'.
center: A logical value indicating whether the variables should be shifted to be zero centered.
scale: A logical value indicating whether the variables should be scaled to have unit variance.
maxnodes, ntree: Only for the Random Forest classifier (randomForest, 'random_forest'). Parameter \(maxnodes\) is the maximum number of terminal nodes that trees in the forest can have. If not given, trees are grown to the maximum possible size. Parameter \(ntree\) is the number of trees to grow. It should not be set too small, so that every input row gets predicted at least a few times.
nsplit: Only for the Random Forest classifier. The randomForest package (classifier 'random_forest') uses a C + Fortran implementation that only supports integer indexes, so any data frame, data table, or matrix with more than 2^31 - 1 elements (the limit for integers) gives an error. With option nsplit, nsplit models are trained, each with \(ntrees=floor(ntree/nsplit)\) trees (that is, \(rep(ntrees,nsplit)\)), and the models are finally combined to obtain a single forest with approximately \(ntree\) trees.
...: Additional arguments for evaluateDIMPclass function
cutp_data, num.cores, tasks: Same as in estimateCutPoint
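
The tree allocation described for nsplit can be sketched in base R. This is only an illustration of the arithmetic: split_trees is a hypothetical helper, not a MethylIT function, and the package internals may differ.

```r
# Hypothetical helper (not part of MethylIT): number of trees assigned to
# each sub-forest when 'ntree' trees are split across 'nsplit' models.
split_trees <- function(ntree = 400, nsplit = 1L) {
    stopifnot(nsplit >= 1, ntree >= nsplit)
    ntrees <- floor(ntree / nsplit)  # trees per model
    rep(ntrees, nsplit)              # one entry per model
}

trees_per_model <- split_trees(ntree = 400, nsplit = 3L)
trees_per_model       # 133 133 133
sum(trees_per_model)  # 399: one tree fewer than requested
```

When ntree is not a multiple of nsplit, the combined forest is slightly smaller than ntree, as the example shows. In the randomForest package, separately trained forests can be merged with combine(); whether mlCutpoint uses that function internally is an assumption.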

Value

The same value that function estimateCutPoint returns when called with parameter setting simple = FALSE.

Details

This function is called by function estimateCutPoint.

Author

Robersy Sanchez (https://genomaths.com).

Examples

## Get a set of potential DMPs (PS)
data(PS, package = 'MethylIT')

cutp <- mlCutpoint(LR = PS,
                   column = c(hdiv = TRUE, TV = TRUE,
                              wprob = TRUE, pos = TRUE),
                   classifier1 = 'qda', n.pc = 4,
                   control.names = c('C1', 'C2', 'C3'),
                   treatment.names = c('T1', 'T2', 'T3'),
                   tv.cut = 0.68, prop = 0.6,
                   div.col = 9L)
cutp$testSetPerformance
#> Confusion Matrix and Statistics
#> 
#>           Reference
#> Prediction  CT  TT
#>         CT  53   0
#>         TT   0 610
#>                                      
#>                Accuracy : 1          
#>                  95% CI : (0.9945, 1)
#>     No Information Rate : 0.9201     
#>     P-Value [Acc > NIR] : < 2.2e-16  
#>                                      
#>                   Kappa : 1          
#>                                      
#>  Mcnemar's Test P-Value : NA         
#>                                      
#>             Sensitivity : 1.0000     
#>             Specificity : 1.0000     
#>          Pos Pred Value : 1.0000     
#>          Neg Pred Value : 1.0000     
#>              Prevalence : 0.9201     
#>          Detection Rate : 0.9201     
#>    Detection Prevalence : 0.9201     
#>       Balanced Accuracy : 1.0000     
#>                                      
#>        'Positive' Class : TT         
#>
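
As a rough check of the printed statistics, the headline values can be recomputed from the confusion matrix counts with base R alone. This is a minimal sketch: the counts are copied from the output above, the variable names are illustrative, and 'TT' is taken as the positive class as reported.

```r
# Confusion matrix counts copied from the printed output
# (rows = Prediction, columns = Reference).
cm <- matrix(c(53, 0,
               0, 610),
             nrow = 2, byrow = TRUE,
             dimnames = list(Prediction = c("CT", "TT"),
                             Reference = c("CT", "TT")))

total          <- sum(cm)                  # 663 test-set positions
accuracy       <- sum(diag(cm)) / total    # correctly classified fraction
prevalence     <- sum(cm[, "TT"]) / total  # fraction of reference 'TT'
detection_rate <- cm["TT", "TT"] / total   # true positives over total
no_info_rate   <- max(colSums(cm)) / total # largest reference class share

round(c(accuracy = accuracy, prevalence = prevalence,
        detection_rate = detection_rate, no_info_rate = no_info_rate), 4)
```

With no misclassified positions, accuracy is 1, and prevalence, detection rate, and no-information rate all equal 610/663, which rounds to 0.9201.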