Internal function to estimate the cutpoint following a machine-learning approach
Usage

mlCutpoint(
  LR,
  control.names,
  treatment.names,
  column,
  div.col,
  tv.col = NULL,
  tv.cut,
  post.cut = 0.5,
  classifier1,
  classifier2 = NULL,
  interactions = NULL,
  n.pc,
  prop = 0.6,
  center = FALSE,
  scale = FALSE,
  stat = 0L,
  cut.values = NULL,
  maxnodes = NULL,
  ntree = 400,
  nsplit = 1L,
  num.cores = 1L,
  tasks = 0L,
  ...
)
Arguments

LR, control.names, treatment.names, column, div.col, tv.col, tv.cut, post.cut, classifier1, classifier2, n.pc, prop, stat, cut.values, num.cores, tasks: Same as in estimateCutPoint.
interactions: Used only if a logistic classifier is selected. Variable interactions to consider in the logistic regression model. Any pairwise combination of the variables 'hdiv', 'TV', 'wprob', and 'pos' can be provided, for example: 'hdiv:TV', 'wprob:pos', 'wprob:TV'.
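For instance, a call selecting a logistic classifier with interaction terms might look as follows (a hypothetical sketch using the PS dataset loaded in the example below; the argument values are illustrative assumptions, not tested output):

## Hypothetical: logistic classifier with pairwise interaction terms
cutp.log <- mlCutpoint(LR = PS,
                       column = c(hdiv = TRUE, TV = TRUE,
                                  wprob = TRUE, pos = TRUE),
                       classifier1 = 'logistic',
                       interactions = c('hdiv:TV', 'wprob:pos'),
                       control.names = c('C1', 'C2', 'C3'),
                       treatment.names = c('T1', 'T2', 'T3'),
                       tv.cut = 0.68, prop = 0.6, n.pc = 4,
                       div.col = 9L)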
center: A logical value indicating whether the variables should be shifted to be zero centered.

scale: A logical value indicating whether the variables should be scaled to have unit variance before the analysis takes place.
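Both flags presumably map onto the centering and scaling options of the PCA step behind the n.pc principal components (an assumption here); a minimal sketch of their effect with stats::prcomp:

## Illustration only: centering/scaling ahead of a PCA step (stats::prcomp)
x <- matrix(rnorm(200, mean = 5, sd = 2), ncol = 4)
pc <- prcomp(x, center = TRUE, scale. = TRUE)  # zero mean, unit variance
round(colMeans(pc$x), 12)                      # component scores centered at zero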
maxnodes, ntree: Only for the Random Forest classifier (randomForest, 'random_forest'). maxnodes is the maximum number of terminal nodes that trees in the forest can have; if not given, trees are grown to the maximum possible size. ntree is the number of trees to grow; it should not be set to too small a number, to ensure that every input row gets predicted at least a few times.
nsplit: Only for the Random Forest classifier. The randomForest package ('random_forest') uses a C/Fortran implementation that supports only integer indexes, so any data frame, data table, or matrix with more than 2^31 elements (the limit for integers) gives an error. The option nsplit is applied to train nsplit models of ntrees = floor(ntree/nsplit) trees each (i.e., rep(ntrees, nsplit)), which are finally combined to obtain a forest with about ntree trees; see the sketch below.
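The split-and-combine strategy behind ntree, maxnodes, and nsplit can be sketched with randomForest::combine (an illustration on made-up data, not MethylIT's internal code):

## Train nsplit sub-forests and merge them into one forest
library(randomForest)
set.seed(1)
dat <- data.frame(x1 = rnorm(300), x2 = rnorm(300),
                  class = factor(sample(c('CT', 'TT'), 300, replace = TRUE)))
ntree <- 400; nsplit <- 4
ntrees <- floor(ntree / nsplit)  # trees per sub-forest
forests <- lapply(seq_len(nsplit), function(i)
    randomForest(class ~ ., data = dat, ntree = ntrees, maxnodes = 16))
rf <- do.call(randomForest::combine, forests)
rf$ntree  # 400: the merged forest carries ntrees * nsplit trees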
...: Additional arguments for the evaluateDIMPclass function.
Value

Same as in estimateCutPoint: the returned object is as specified in estimateCutPoint for the parameter setting simple = FALSE.
Details

This function is called by the function estimateCutPoint.
Examples

## Get a set of potential DMPs (PS)
data(PS, package = 'MethylIT')
cutp <- mlCutpoint(LR = PS,
                   column = c(hdiv = TRUE, TV = TRUE,
                              wprob = TRUE, pos = TRUE),
                   classifier1 = 'qda', n.pc = 4,
                   control.names = c('C1', 'C2', 'C3'),
                   treatment.names = c('T1', 'T2', 'T3'),
                   tv.cut = 0.68, prop = 0.6,
                   div.col = 9L)
cutp$testSetPerformance
#> Confusion Matrix and Statistics
#>
#> Reference
#> Prediction CT TT
#> CT 53 0
#> TT 0 610
#>
#> Accuracy : 1
#> 95% CI : (0.9945, 1)
#> No Information Rate : 0.9201
#> P-Value [Acc > NIR] : < 2.2e-16
#>
#> Kappa : 1
#>
#> Mcnemar's Test P-Value : NA
#>
#> Sensitivity : 1.0000
#> Specificity : 1.0000
#> Pos Pred Value : 1.0000
#> Neg Pred Value : 1.0000
#> Prevalence : 0.9201
#> Detection Rate : 0.9201
#> Detection Prevalence : 0.9201
#> Balanced Accuracy : 1.0000
#>
#> 'Positive' Class : TT
#>
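Since mlCutpoint is internal, the same result would normally be reached through the exported wrapper. A hedged sketch of the equivalent call, assuming estimateCutPoint forwards these arguments unchanged when simple = FALSE:

## Equivalent call through the exported wrapper (sketch; output not shown)
cutp2 <- estimateCutPoint(LR = PS, simple = FALSE,
                          column = c(hdiv = TRUE, TV = TRUE,
                                     wprob = TRUE, pos = TRUE),
                          classifier1 = 'qda', n.pc = 4,
                          control.names = c('C1', 'C2', 'C3'),
                          treatment.names = c('T1', 'T2', 'T3'),
                          tv.cut = 0.68, prop = 0.6, div.col = 9L)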