findCutpoint {MethylIT.utils}    R Documentation

Find a cutoff of divergences of methylation level values

Description

A function to help decide which cutoff value is best for DIMP/DMP predictions. The genome-wide methylation changes that occur in any living organism are the result of the superposition of several stochastic processes: the inherent stochasticity of biological processes, which ultimately derives from the stochasticity of biochemical processes. In this scenario, there is no way to say with absolute determinism whether a given value of an information divergence is a true positive or a true negative. All we can do is estimate performance indicators such as accuracy, sensitivity, false positive rate, etc., to evaluate the consequences of our decision about what we consider a true positive or a true negative. For example, a difference of methylation levels found in 100 samples at a given cytosine position does not mean that this difference will not be observed in some sample from the control group. Such a difference can certainly be found in control samples as well. The fluctuation theorem guarantees such an outcome, which in the current context is a consequence of the action of the second law of thermodynamics on living organisms.

Usage

findCutpoint(LR, min.tv = 0.25, tv.cut = 0.5, predcuts, tv.col,
  div.col = NULL, pval.col = NULL, stat = 1, maximize = TRUE,
  num.cores = 1L, tasks = tasks)

Arguments

LR

A list of GRanges objects, a GRangesList, or a CompressedGRangesList object. Each GRanges object in the list must have at least two columns: a column containing the total variation of methylation levels (TV, difference of methylation levels) and either a column containing a divergence of methylation levels (it could be TV or the Hellinger divergence) or a column with p-values from which the cutpoint will be found (see example).

min.tv

Minimum value for the total variation distance (TVD; absolute value of the difference of methylation levels, TVD = abs(TV)). Only sites/ranges k with TVD_{k} > min.tv are analyzed. Default: min.tv = 0.25.

tv.cut

A cutoff for the total variation distance to be applied to each site/range. Sites/ranges k with TVD_{k} < tv.cut are considered TRUE negatives, and sites with TVD_{k} > tv.cut are considered TRUE positives. Its value must satisfy 0 < tv.cut < 1. A possible value for tv.cut would be, e.g., the minimum value of TV found in the treatment group after the potential DMPs are estimated. Default: tv.cut = 0.5.
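The roles of min.tv and tv.cut can be illustrated with a short base-R sketch. The helper label_sites() is hypothetical (it is not part of MethylIT.utils) and works on a plain numeric vector of TV values rather than on GRanges objects; it first keeps sites with TVD > min.tv and then labels each kept site as a true positive (TVD > tv.cut) or a true negative. How the package treats ties (TVD exactly equal to tv.cut) is not documented here, so the sketch simply labels them negative.

```r
# Hypothetical helper, not the package's code: filter sites by min.tv and
# label the survivors as TRUE positives / TRUE negatives using tv.cut.
label_sites <- function(tv, min.tv = 0.25, tv.cut = 0.5) {
  tvd  <- abs(tv)              # total variation distance, TVD = abs(TV)
  keep <- tvd > min.tv         # only these sites/ranges are analyzed
  tvd  <- tvd[keep]
  # Assumption: a site with TVD == tv.cut is labelled negative here.
  data.frame(tvd = tvd, truth = tvd > tv.cut)
}

tv  <- c(-0.9, 0.1, 0.3, -0.45, 0.6, 0.2)
lab <- label_sites(tv)
print(lab)
```

Of the six sites, the two with TVD below 0.25 are dropped; of the remaining four, only those with TVD above 0.5 are labelled TRUE.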

predcuts

A numerical vector of possible cutoff values (cutpoints) for a divergence of methylation levels or a p-value, according to the magnitude given in div.col or in pval.col, respectively. For each cutpoint k, the values greater than predcuts[k] are predicted TRUE (positives); all others are predicted FALSE (negatives).
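The prediction rule for predcuts can be sketched in base R as follows. The function predict_at_cuts() is hypothetical and only illustrates the rule stated above (values greater than predcuts[k] are predicted TRUE); it returns one column of logical predictions per cutpoint.

```r
# Hypothetical helper: apply the rule "value > predcuts[k] => TRUE"
# for every cutpoint, giving a (sites x cutpoints) logical matrix.
predict_at_cuts <- function(values, predcuts) {
  vapply(predcuts, function(cut) values > cut, logical(length(values)))
}

div  <- c(3, 12, 25, 40)                 # toy divergence values
pred <- predict_at_cuts(div, predcuts = c(2, 10, 30))
print(pred)
```

Raising the cutpoint can only turn positive predictions into negative ones, so the number of predicted positives (the column sums) never increases along an increasing predcuts vector.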

tv.col

Column number of the total variation in the metadata of each GRanges object.

div.col

Column number of the divergence of methylation levels used to estimate the cutpoints. Default: NULL. Either div.col or pval.col must be given.

pval.col

Column number of the p-value used to estimate the cutpoints. Default: NULL. Either div.col or pval.col must be given.

stat

An integer indicating the statistic to be used in the testing. The mapping from integers to statistic names is: 0 = "All", 1 = "Accuracy", 2 = "Sensitivity", 3 = "Specificity", 4 = "Pos Pred Value", 5 = "Neg Pred Value", 6 = "Precision", 7 = "Recall", 8 = "F1", 9 = "Prevalence", 10 = "Detection Rate", 11 = "Detection Prevalence", 12 = "Balanced Accuracy".
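The statistic names listed for stat coincide with those reported by caret's confusionMatrix(); whether findCutpoint uses caret internally is not stated on this page. The sketch below is a hypothetical base-R helper, perf_stats(), that computes these indicators from logical vectors of true labels and predictions using their standard confusion-matrix definitions, with TRUE taken as the positive class.

```r
# Hypothetical helper, not the package's implementation: standard
# confusion-matrix indicators, with TRUE as the positive class.
perf_stats <- function(truth, pred) {
  tp <- sum(pred & truth);   fp <- sum(pred & !truth)
  tn <- sum(!pred & !truth); fn <- sum(!pred & truth)
  n    <- tp + fp + tn + fn
  sens <- tp / (tp + fn)       # sensitivity = recall
  spec <- tn / (tn + fp)       # specificity
  ppv  <- tp / (tp + fp)       # positive predictive value = precision
  c(Accuracy               = (tp + tn) / n,
    Sensitivity            = sens,
    Specificity            = spec,
    "Pos Pred Value"       = ppv,
    "Neg Pred Value"       = tn / (tn + fn),
    Precision              = ppv,
    Recall                 = sens,
    F1                     = 2 * ppv * sens / (ppv + sens),
    Prevalence             = (tp + fn) / n,
    "Detection Rate"       = tp / n,
    "Detection Prevalence" = (tp + fp) / n,
    "Balanced Accuracy"    = (sens + spec) / 2)
}

truth <- c(TRUE, TRUE, FALSE, FALSE, TRUE)
pred  <- c(TRUE, FALSE, FALSE, TRUE, TRUE)
s <- perf_stats(truth, pred)
print(round(s, 3))
```

The vector entries follow the order of stat = 1 to 12, so s[stat] picks the indicator selected by a given stat value in this sketch.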

maximize

Whether to maximize the performance indicator given in parameter 'stat'. Default: TRUE.

num.cores, tasks

Parameters for parallel computation using the package BiocParallel: the number of cores to use, i.e. at most how many child processes will be run simultaneously (see bplapply), and the number of tasks per job (only for Linux OS).

Details

Given a numerical vector of cutoff values for the divergences of methylation levels, or of p-value cutoffs, this function searches for the cutoff value that yields the best classification performance for the specified performance indicator.
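The search described above can be sketched end to end in base R for a single sample and a single indicator (accuracy, i.e. stat = 1). The function best_cutpoint() is hypothetical: it works on plain numeric vectors instead of a list of GRanges, ignores parallel computation, and resolves ties by taking the first best cutpoint (which.max/which.min behaviour), which may differ from what findCutpoint does.

```r
# Hypothetical sketch of the cutpoint search (accuracy only).
best_cutpoint <- function(tv, div, predcuts, min.tv = 0.25,
                          tv.cut = 0.5, maximize = TRUE) {
  keep  <- abs(tv) > min.tv            # analyze only TVD > min.tv
  truth <- abs(tv[keep]) > tv.cut      # TRUE positives: TVD > tv.cut
  div   <- div[keep]
  # accuracy of the rule "div > cut" at every cutpoint
  acc <- vapply(predcuts, function(cut) mean((div > cut) == truth),
                numeric(1))
  best <- if (maximize) which.max(acc) else which.min(acc)
  list(cutpoint = predcuts[best], accuracy = acc[best])
}

tv  <- c(0.9, 0.3, 0.45, 0.6, 0.1, 0.7)
div <- c(40, 8, 12, 30, 1, 25)
res <- best_cutpoint(tv, div, predcuts = c(5, 10, 20, 35))
str(res)
```

With these toy values, the cutpoint 20 separates the kept sites perfectly, so it is selected with accuracy 1.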

Value

A list with the classification performance results for the best cutoff value found among the supplied predcuts.

Author(s)

Robersy Sanchez

Examples

# load the package and simulated data of potential methylation signal
library(MethylIT.utils)
data(sim_ps)

# Vector of cutoff values
cuts = c(2, 5, 10, 15, 18, 20, 21, 22, 25, 27, 30, 35, 40,
        45, 50, 55, 60)
# === To find the cutpoint that maximizes the accuracy ===
pre.cut.acc = findCutpoint(LR = PS, min.tv = 0.25, tv.cut = 0.5,
                            predcuts = cuts, tv.col = 7L, div.col = 9,
                            stat = 1, num.cores = 15)

[Package MethylIT.utils version 0.3.1 ]