Abstract

A fast introduction into methylation with Methyl-IT is provided here. The methylome analysis implemented in Methyl-IT based on a signal-detection and machine-learning approach.
library(MethylIT)

Methyl-IT pipeline

MethylIT is an R package for methylome analysis based on a signal-detection machine learning approach (SD-ML). This approach is postulated to provide greater sensitivity for resolving true signal from the methylation background within the methylome (Sanchez and Mackenzie 2016; Sanchez et al. 2019). Because the biological signal created within the dynamic methylome environment characteristic of plants is not free from background noise, the approach, designated Methyl-IT, includes the application of signal detection theory (Greiner, Pfeiffer, and Smith 2000; Carter et al. 2016; Harpaz et al. 2013; Kruspe et al. 2017). A basic requirement for the application of signal detection is the knowledge of the probability distribution of the background noise, which is used as null hypothesis in the detection of the methylation signal. This probability distribution is, in general, a member of generalized gamma distribution family, and it can be deduced on a statistical mechanical/thermodynamics basis from DNA methylation induced by thermal fluctuations (Sanchez and Mackenzie 2016).

Load data

Methyl-IT uses the read counts of methylation called derived from whole-genome bisulfite sequencing (WGBS). Methylome datasets of WGBS are available at Gene Expression Omnibus (GEO DataSets). For the current example, we just load RData files containing the read-counts from several samples.

#### ============ Download methylation data from PSU's GitLab =============
url <- paste0("https://git.psu.edu/genomath/MethylIT_examples/raw/master/",
            "MethylIT_testing/memoryLine_100k_samples_cg.RData")
temp <- tempfile(fileext = ".RData")
download.file(url = url, destfile = temp)
load(temp)
file.remove(temp); rm(temp, url)
#> [1] TRUE

names(meth_samples)
#>  [1] "W_3_1" "W_3_2" "W_3_3" "W_3_4" "W_3_5" "M_3_1" "M_3_2" "M_3_3" "M_3_4"
#> [10] "M_3_5"

Gene annotation

url <- paste0("https://git.psu.edu/genomath/MethylIT_examples/raw/master/",
            "MethylIT_testing/Arabidopsis_thaliana.TAIR10.38_only_genes.RData")
temp <- tempfile(fileext = ".RData")
download.file(url = url, destfile = temp)
load(temp)
file.remove(temp); rm(temp, url)
#> [1] TRUE

The reference sample

To perform the comparison between the uncertainty of methylation levels from each group of individuals, control \((c)\) and treatment \((t)\), we should estimate the uncertainty variation with respect to the same reference individual on the mentioned metric space. The reason to measure the uncertainty variation with respect to the same reference resides in that even sibling individuals follow an independent ontogenetic development and, consequently, their developments follow independent stochastic processes. The reference individual is estimated with function poolFromGRlist:

ref <- poolFromGRlist(list(meth_samples$W_3_1, meth_samples$W_3_2,
                        meth_samples$W_3_3), stat = "mean", num.cores = 4L
)
#> *** Building a unique GRanges object from the list...
#>  *** Building coordinates for the new GRanges object ...
#>  *** Strand information is preserved ...
#>  *** Processing GRanges for sample: #1
#>  *** Processing GRanges for sample: #2...
#>  *** Processing GRanges for sample: #3...
#>  *** Sorting by chromosomes and start positions...
#> *** Building a virtual methylome...

Estimation of information divergences

To perform the comparison between the uncertainty of methylation levels from each group of individuals, control \((c)\) and treatment \((t)\), the divergence between the methylation levels of each individual is estimated with respect to the reference sample.

ml_samples <- meth_samples[c("W_3_3", "W_3_4","W_3_5",
                            "M_3_1","M_3_2","M_3_3","M_3_4")]
ml_hd <- estimateDivergence(ref = ref,
                            indiv = ml_samples,
                            Bayesian = TRUE,
                            min.coverage = 4,
                            min.meth = 1,
                            high.coverage = 500,
                            percentile = 0.999,
                            num.cores = 4L,
                            tasks = 2L, verbose = FALSE)
ml_hd$M_3_1
#> GRanges object with 13112 ranges and 9 metadata columns:
#>           seqnames    ranges strand |        c1        t1        c2        t2
#>              <Rle> <IRanges>  <Rle> | <numeric> <numeric> <numeric> <numeric>
#>       [1]        1       161      + |         1         2         5         1
#>       [2]        1       310      + |         4         5         4        12
#>       [3]        1       311      - |         2         2         3         2
#>       [4]        1       500      + |         8         8         7         6
#>       [5]        1       501      - |         4         1         4         3
#>       ...      ...       ...    ... .       ...       ...       ...       ...
#>   [13108]       Pt     84348      - |         2       214         2       186
#>   [13109]       Pt     84350      + |         2       152         0       133
#>   [13110]       Pt     84351      - |         1       205         2       175
#>   [13111]       Pt     84353      + |         1       145         0       127
#>   [13112]       Pt     84354      - |         1       195         0       164
#>                            p1                  p2                   TV
#>                     <numeric>           <numeric>            <numeric>
#>       [1]   0.392123103572845   0.764478583503753                  0.5
#>       [2]     0.4540512595575   0.272979343401941   -0.194444444444444
#>       [3]   0.503623099168751   0.576954596807912                  0.1
#>       [4]    0.50113182819633   0.534754757417114   0.0384615384615384
#>       [5]   0.735554399301687   0.559034434064729   -0.228571428571429
#>       ...                 ...                 ...                  ...
#>   [13108]  0.0126265121112533  0.0147758129336065  0.00137903861308117
#>   [13109]   0.017662437689319 0.00595389007120612   -0.012987012987013
#>   [13110] 0.00841476348922531  0.0156859154710009  0.00644506609620975
#>   [13111]  0.0118388367923311 0.00623170116120067 -0.00684931506849315
#>   [13112] 0.00884093144630434 0.00483925665447287 -0.00510204081632653
#>                         bay.TV               hdiv
#>                      <numeric>          <numeric>
#>       [1]    0.372355479930908  0.754596864387893
#>       [2]   -0.181071916155559  0.451488905020791
#>       [3]   0.0733314976391617 0.0295646659037349
#>       [4]    0.033622929220784 0.0173859971813955
#>       [5]   -0.176519965236958  0.236803386775921
#>       ...                  ...                ...
#>   [13108]   0.0021493008223532 0.0172923617150407
#>   [13109]  -0.0117085476181129   0.45154690308663
#>   [13110]  0.00727115198177563  0.217514462265944
#>   [13111] -0.00560713563113038  0.123141137274554
#>   [13112] -0.00400167479183146  0.108181314065562
#>   -------
#>   seqinfo: 7 sequences from an unspecified genome; no seqlengths

Function estimateDivergence returns a list of GRanges objects with the four columns of counts, the information divergence, and additional columns:

1. The original matrix of methylated ($c_i$) and unmethylated ($t_i$) read
    counts from control ($i=1$) and treatment ($i=2$) samples.
2. "p1" and "p2": methylation levels for control and treatment, 
    respectively.
3. "bay.TV": total variation TV = p2 - p1.
4. "TV": total variation based on simple counts: $TV=c1/(c1+t1)-c2/(c2+t2)$.
5. "hdiv": Hellinger divergence.

Observe that the absolute values |TV| is an information divergence as well, which is known as total variation distance (widely used in information theory).

Descriptive statistical analysis

A descriptive analysis on the distribution of Hellinger divergences of methylation levels is recommended. Like in any other statistical analysis, descriptive statistical analysis help us to detect potential issues in the raw data. It is the user responsibility to perform quality check of his/her dataset before start to apply Methyl-IT. Nevertheless, the quality checking of the raw data is not perfect. So, it is healthy to sees for potential issues.

## Critical values
critical.val <- do.call(rbind, lapply(ml_hd, function(x) {
    hd.95 = quantile(x$hdiv, 0.95)
    tv.95 = quantile(abs(x$bay.TV), 0.95)
    btv.95 = quantile(abs(x$TV), 0.95)
    return(c(tv = tv.95, hd = hd.95, TTV = btv.95,
            num.sites.hd95 = sum(x$hdiv > hd.95),
            num.sites.tv95 = sum(x$bay.TV > tv.95),
            num.sites.btv.95 = sum(x$TV > btv.95)
    ))})
)
critical.val
#>          tv.95%    hd.95%   TTV.95% num.sites.hd95 num.sites.tv95
#> W_3_3 0.1748777 0.4966905 0.2243590            652            329
#> W_3_4 0.2543914 1.0024683 0.3333333            663            245
#> W_3_5 0.2467064 1.0291523 0.3000000            694            309
#> M_3_1 0.2947047 1.1405845 0.3636364            656            154
#> M_3_2 0.2810490 1.3483083 0.3333333            669            188
#> M_3_3 0.3051646 1.2746263 0.3888889            650            158
#> M_3_4 0.2643298 1.5618782 0.3333333            669            257
#>       num.sites.btv.95
#> W_3_3              331
#> W_3_4              248
#> W_3_5              265
#> M_3_1              159
#> M_3_2              184
#> M_3_3              172
#> M_3_4              189

Quality checking of the coverage

After the raw data filtering introduced with estimateDivergence function, some coverage issue could be detected.

covr <- lapply(ml_hd, function(x) {
                cov <- x$c2 + x$t2
                return(cov)
})

do.call(rbind, lapply(covr, function(x) {
    q60 <- quantile(x, 0.6)
    q9999 <- quantile(x, 0.9999)
    idx1 <- which(x >= q60)
    idx2 <- which(x <= 500)
    q95 <- quantile(x, 0.95)
    idx <- intersect(idx1, idx2)
    return(c(
        round(summary(x)),
        q60, quantile(x, c(0.95, 0.99, 0.999, 0.9999)),
        num.siteGreater_8 = sum(x >= 8),
        q60_to_500 = sum((x >= q60) & (x <= 500)),
        num.siteGreater_500 = sum(x > 500)
))})
)
#>       Min. 1st Qu. Median Mean 3rd Qu. Max. 60% 95%    99%   99.9%   99.99%
#> W_3_3    0       7     12   33      40  491  15 112 351.00 424.000 466.4600
#> W_3_4    0       6     10   31      41  499  14  97 390.68 496.000 499.0000
#> W_3_5    0       7     12   34      41  495  16 113 370.20 446.480 486.6120
#> M_3_1    0       6     10   29      37  500  13  88 378.89 491.889 499.6889
#> M_3_2    0       6     11   28      30  499  14  77 357.00 457.000 496.3130
#> M_3_3    0       6      9   21      28  497  11  57 173.04 448.024 490.5020
#> M_3_4    0       7     11   28      30  494  15  81 347.00 449.000 489.6370
#>       num.siteGreater_8 q60_to_500 num.siteGreater_500
#> W_3_3              9348       5420                   0
#> W_3_4              8776       5364                   0
#> W_3_5             10133       5630                   0
#> M_3_1              8512       5288                   0
#> M_3_2              9020       5553                   0
#> M_3_3              7702       5602                   0
#> M_3_4              9390       5409                   0

Estimation of the best fitted probability distribution model

A basic requirement for the application of signal detection is the knowledge of the probability distribution of the background noise. Probability distribution, as a Weibull distribution model, can be deduced on a statistical mechanical/thermodynamics basis for DNA methylation induced by thermal fluctuations (Sanchez and Mackenzie 2016).

gof <- gofReport(HD = ml_hd,
                model = c("Weibull2P", "Weibull3P",
                        "Gamma2P", "Gamma3P"),
                column = 9,
                output = "all",
                confl_model = TRUE,
                num.cores = 4L,
                task = 2L,
                verbose = FALSE
)
#> Warning in gofReport(HD = ml_hd, model = c("Weibull2P", "Weibull3P", "Gamma2P", : 
#>  The best fitted model for sample(s) M_3_4 require(s) for further analysis. 
#> The model with the lowest AIC must have the highest R.Cross.val
gof$stats
gof$bestModel

Signal detection

The information thermodynamics-based approach is postulated to provide greater sensitivity for resolving true signal from the thermodynamic background within the methylome (Sanchez and Mackenzie 2016). Since the biological signal created within the dynamic methylome environment characteristic of living organisms is not free from background noise, the approach, designated Methyl-IT, includes the application of signal detection theory (Greiner, Pfeiffer, and Smith 2000; Carter et al. 2016; Harpaz et al. 2013; Kruspe et al. 2017). Signal detection is a critical step to increase sensitivity and resolution of methylation signal by reducing the signal-to-noise ratio and objectively controlling the false positive rate and prediction accuracy/risk.

Potential methylation signal

The first estimation in our signal detection step is the identification of the cytosine sites carrying potential methylation signal \(PS\). The methylation regulatory signal does not hold the theoretical distribution and, consequently, for a given level of significance \(\alpha\) (Type I error probability, e.g. \(\alpha = 0.05\)), cytosine positions \(k\) with information divergence \(H_k >= H_{\alpha = 0.05}\) can be selected as sites carrying potential signals \(PS\). The value of \(\alpha\) can be specified. For example, potential signals with \(H_k > H_{\alpha = 0.01}\) can be selected. For each sample, cytosine sites are selected based on the corresponding fitted theoretical distribution model estimated in the previous step.

ps <- getPotentialDIMP(LR = ml_hd,
                        nlms = gof$nlms,
                        div.col = 9L,
                        tv.col = 0.25,
                        dist.name = gof$bestModel)
ps$M_3_1
#> GRanges object with 737 ranges and 10 metadata columns:
#>         seqnames    ranges strand |        c1        t1        c2        t2
#>            <Rle> <IRanges>  <Rle> | <numeric> <numeric> <numeric> <numeric>
#>     [1]        1       511      + |        11         5         4         8
#>     [2]        1       648      - |         2         2         7         0
#>     [3]        1      5459      + |         6         2         2         5
#>     [4]        1      5466      - |         6         1         6         8
#>     [5]        1      7196      + |         2         3         0         9
#>     ...      ...       ...    ... .       ...       ...       ...       ...
#>   [733]       Mt     86158      + |         0        74         2        48
#>   [734]       Pt     30364      + |         0       371         4       370
#>   [735]       Pt     36502      - |         3       362         0       434
#>   [736]       Pt     46637      + |         3       487         0       494
#>   [737]       Pt     67192      - |         1       417         6       481
#>                          p1                  p2                   TV
#>                   <numeric>           <numeric>            <numeric>
#>     [1]   0.673033759139014   0.353334714631113   -0.354166666666667
#>     [2]   0.503623099168751   0.908334478893389                  0.5
#>     [3]   0.713689467904528   0.326167737512289   -0.464285714285714
#>     [4]   0.798131662896332   0.436301124853528   -0.428571428571429
#>     [5]   0.425563914946981  0.0756781258028576                 -0.4
#>     ...                 ...                 ...                  ...
#>   [733] 0.00988257160661495  0.0543012507208152                 0.04
#>   [734] 0.00200202523549494  0.0127834702578034   0.0106951871657754
#>   [735]  0.0102214201503147   0.001839638935673 -0.00821917808219178
#>   [736] 0.00762161791063853 0.00161691723646952 -0.00612244897959184
#>   [737] 0.00416176037143209  0.0139203528120561   0.0099279840444868
#>                       bay.TV             hdiv              wprob
#>                    <numeric>        <numeric>          <numeric>
#>     [1]   -0.319699044507901 1.54766789991309 0.0187209712650864
#>     [2]    0.404711379724638  1.3579644452668 0.0276330873566829
#>     [3]   -0.387521730392239 1.32635922347908 0.0295018600053619
#>     [4]   -0.361830538042805 1.51432351745599 0.0200392118787338
#>     [5]   -0.349885789144124 1.37801067484234 0.0265120089817865
#>     ...                  ...              ...                ...
#>   [733]   0.0444186791142002  1.1148774914804 0.0459417346437029
#>   [734]   0.0107814450223084 1.75426411786179 0.0123204930412189
#>   [735] -0.00838178121464166 1.35401146205784 0.0278599176187997
#>   [736] -0.00600470067416902 1.09770327022115 0.0476450078448564
#>   [737]    0.009758592440624 1.30004029948585   0.03115842254738
#>   -------
#>   seqinfo: 7 sequences from an unspecified genome; no seqlengths

Cutpoint estimation. DMPs

Laws of statistical physics can account for background methylation, a response to thermal fluctuations that presumably functions in DNA stability (Sanchez and Mackenzie 2016; Sanchez et al. 2019). True signal is detected based on the optimal cutpoint (López-Ratón et al. 2014).

The need for the application of (what is now known as) signal detection in cancer research was pointed out by Youden in the midst of the last century (Youden 1950). In the next example, the simple cutpoint estimation available in Methyl-IT is based on the application of Youden index (Youden 1950). Although cutpoints are estimated for a single variable, the classification performance can be evaluated for several variables and applying different model classifiers. A optimal cutpoint distinguishes disease stages from healthy individual. The performance of this classification is given in the output of function estimateCutPoint.

A model classifier can be requested for further predictions and its classification performance is also provided. Below, the selected model classifier is a quadratic discriminant analysis (QDA) (classifier1 = “qda”, clas.perf = TRUE). Four predictor variables are available: the Hellinger divergence of methylation levels (hdiv), total variation distance (TV, absolute difference of methylation levels), relative position of cytosine site in the chromosome (pos), and the logarithm base two of the probability to observe a Hellinger divergence value \(H\) greater than the critical value \(H_{\alpha = 0.05}\) (values given as probabilities in object PS, wprob).

Notice that the cutpoint can be estimated for any of the two currently available information divergences: Hellinger divergence (div.col = 9) or the total variation distance (with Bayesian correction, div.col = 8).

### Cutpoint estimated using a model classifier
cutpoint <- estimateCutPoint(LR = ps,
                            simple = FALSE,
                            div.col = 9,
                            control.names = c("W_3_3", "W_3_4", "W_3_5"),
                            treatment.names = c("M_3_1", "M_3_2", "M_3_3",
                                                "M_3_4", "M_3_5"),
                            column = c(hdiv = TRUE, bay.TV = TRUE,
                                        wprob = TRUE, pos = TRUE),
                            classifier1 = "pca.qda", n.pc = 4,
                            classifier2 = "pca.logistic",
                            center = TRUE,
                            scale = TRUE,
                            clas.perf = TRUE,
                            verbose = FALSE
)
cutpoint$cutpoint
#> [1] 0.9612333
cutpoint$testSetPerformance
#> Confusion Matrix and Statistics
#> 
#>           Reference
#> Prediction  CT  TT
#>         CT 237   0
#>         TT   1 544
#>                                      
#>                Accuracy : 0.9987     
#>                  95% CI : (0.9929, 1)
#>     No Information Rate : 0.6957     
#>     P-Value [Acc > NIR] : <2e-16     
#>                                      
#>                   Kappa : 0.997      
#>                                      
#>  Mcnemar's Test P-Value : 1          
#>                                      
#>             Sensitivity : 1.0000     
#>             Specificity : 0.9958     
#>          Pos Pred Value : 0.9982     
#>          Neg Pred Value : 1.0000     
#>              Prevalence : 0.6957     
#>          Detection Rate : 0.6957     
#>    Detection Prevalence : 0.6969     
#>       Balanced Accuracy : 0.9979     
#>                                      
#>        'Positive' Class : TT         
#> 
cutpoint$testSetModel.FDR
#> [1] 0.001834862

Differentially methylated positions (DMPs) can be now identified using selectDIMP function

dmps <- selectDIMP(ps, div.col = 9, cutpoint = cutpoint$cutpoint)
data.frame(dmps =unlist(lapply(dmps, length)))
#>       dmps
#> W_3_3  130
#> W_3_4  709
#> W_3_5  712
#> M_3_1  737
#> M_3_2  784
#> M_3_3  738
#> M_3_4  844

Differentially methylated genes (DMGs)

Differentially methylated genes (DMGs) are estimated from group comparisons for the number of DMPs on gene-body regions between control and treatment.

Function getDIMPatGenes is used to count the number of DMPs at gene-body. Nevertheless, it can be used for any arbitrary specified genomic region as well. The operation of this function is based on the ‘findOverlaps’ function from the ‘GenomicRanges’ Bioconductor R package. The ‘findOverlaps’ function has several critical parameters like, for example, ‘maxgap’, ‘minoverlap’, and ‘ignore.strand’. In our function getDIMPatGenes, except for setting ignore.strand = TRUE and type = “within”, we preserve the rest of default ‘findOverlaps’ parameters. In this case, these are important parameter settings because the local mechanical effect of methylation changes on a DNA region where a gene is located is independent of the strand where the gene is encoded. That is, methylation changes located in any of the two DNA strands inside the gene-body region will affect the flexibility of the DNA molecule (Choy et al. 2010; Severin et al. 2011).

nams <- names(dmps)
dmps_at_genes <- getDIMPatGenes(GR = dmps, GENES = genes, ignore.strand = TRUE)
dmps_at_genes <- uniqueGRanges(dmps_at_genes, columns = 2L,
                                ignore.strand = TRUE, type = "equal")
#>  *** Building coordinates for the new GRanges object ...
#>  *** Setting strand information to * ...
#>  *** Processing GRanges for sample: #1
#>  *** Processing GRanges for sample: #2...
#>  *** Processing GRanges for sample: #3...
#>  *** Processing GRanges for sample: #4...
#>  *** Processing GRanges for sample: #5...
#>  *** Processing GRanges for sample: #6...
#>  *** Processing GRanges for sample: #7...
#>  *** Sorting by chromosomes and start positions...
colnames(mcols(dmps_at_genes)) <- nams
dmps_at_genes
#> GRanges object with 77 ranges and 7 metadata columns:
#>        seqnames      ranges strand |     W_3_3     W_3_4     W_3_5     M_3_1
#>           <Rle>   <IRanges>  <Rle> | <numeric> <numeric> <numeric> <numeric>
#>    [1]        1   3631-5899      * |         0         2         0         2
#>    [2]        1   6788-9130      * |         0         1         0         2
#>    [3]        1 11649-13714      * |         0         0         0         1
#>    [4]        1 23121-31227      * |         2         5         8        13
#>    [5]        1 31170-33171      * |         0         0         1         0
#>    ...      ...         ...    ... .       ...       ...       ...       ...
#>   [73]        5 49891-51437      * |         0         1         0         2
#>   [74]        5 53856-56052      * |         0         3         2         5
#>   [75]        5 61017-63936      * |         0         3         5         6
#>   [76]        5 86543-90117      * |         0         0         0         1
#>   [77]        5 91838-95701      * |         0         0         2         6
#>            M_3_2     M_3_3     M_3_4
#>        <numeric> <numeric> <numeric>
#>    [1]         2         1         3
#>    [2]         0         1         0
#>    [3]         0         0         0
#>    [4]        15        14        12
#>    [5]         0         0         0
#>    ...       ...       ...       ...
#>   [73]         1         2         2
#>   [74]         3         2         3
#>   [75]         7         4         8
#>   [76]         0         0         0
#>   [77]         5         3         5
#>   -------
#>   seqinfo: 5 sequences from an unspecified genome; no seqlengths

Setting experiment design

The experimental design is set with glmDataSet function

colData <- data.frame(condition = factor(c("WT", "WT", "WT",
                                            "ML", "ML", "ML", "ML"),
                                        levels = c("WT", "ML")),
                    nams,
                    row.names = 2)
## A RangedGlmDataSet is created
ds <- glmDataSet(GR = dmps_at_genes, colData = colData)

Testing for DMGs

DMGs are detected using function countTest2.

dmgs <- countTest2(DS = ds, num.cores = 4L,
                    tasks = 2L,
                    minCountPerIndv = 7,
                    maxGrpCV = c(1.2, 1.2),
                    Minlog2FC = 0.5,
                    CountPerBp = 0.001,
                    test = "LRT",
                    verbose = TRUE)
#> *** Number of GR after filtering counts 5
#> *** GLM...
dmgs
#> GRanges object with 4 ranges and 15 metadata columns:
#>       seqnames      ranges strand |     W_3_3     W_3_4     W_3_5     M_3_1
#>          <Rle>   <IRanges>  <Rle> | <numeric> <numeric> <numeric> <numeric>
#>   [1]        1 23121-31227      * |         2         5         8        13
#>   [2]        3 31145-34670      * |         2         4         3         8
#>   [3]        4 51166-53449      * |         0         3         3        14
#>   [4]        5 34538-37999      * |         1         3         1         9
#>           M_3_2     M_3_3     M_3_4            log2FC  scaled.deviance
#>       <numeric> <numeric> <numeric>         <numeric>        <numeric>
#>   [1]        15        14        12 0.993251773010284 46.1893444462227
#>   [2]        11        12         7  1.15267950993839  6.0652836321834
#>   [3]         8         6        10  1.25276296849529 9.16216983213072
#>   [4]         7         6         6  1.43508452461667 23.8885478941796
#>                    pvalue          model             adj.pval
#>                 <numeric>       <factor>            <numeric>
#>   [1] 1.0735927829192e-11 Neg.Binomial.W 4.29437113167679e-11
#>   [2]  0.0137864612185258 Neg.Binomial.W   0.0137864612185258
#>   [3] 0.00247069440811782 Neg.Binomial.W  0.00494138881623564
#>   [4] 0.00452300284262087   QuasiPoisson  0.00603067045682782
#>           CT.SignalDensity    TT.SignalDensity SignalDensityVariation
#>                  <numeric>           <numeric>              <numeric>
#>   [1] 0.000616750955963982 0.00166522758110275    0.00104847662513877
#>   [2] 0.000850822461712989  0.0026942711287578    0.00184344866704481
#>   [3] 0.000875656742556918 0.00415936952714536    0.00328371278458844
#>   [4] 0.000481417292509147 0.00202195262853842    0.00154053533602927
#>   -------
#>   seqinfo: 5 sequences from an unspecified genome; no seqlengths

References

Carter, Jane V., Jianmin Pan, Shesh N. Rai, and Susan Galandiuk. 2016. “ROC-ing along: Evaluation and interpretation of receiver operating characteristic curves.” Surgery 159 (6). Mosby: 1638–45. https://doi.org/10.1016/j.surg.2015.12.029.

Choy, John S., Sijie Wei, Ju Yeon Lee, Song Tan, Steven Chu, and Tae Hee Lee. 2010. “DNA methylation increases nucleosome compaction and rigidity.” Journal of the American Chemical Society 132 (6). American Chemical Society: 1782–3. https://doi.org/10.1021/ja910264z.

Greiner, M, D Pfeiffer, and R D Smith. 2000. “Principles and practical application of the receiver-operating characteristic analysis for diagnostic tests.” Preventive Veterinary Medicine 45 (1-2): 23–41. https://doi.org/10.1016/S0167-5877(00)00115-X.

Harpaz, Rave, William DuMouchel, Paea LePendu, Anna Bauer-Mehren, Patrick Ryan, and Nigam H Shah. 2013. “Performance of Pharmacovigilance Signal Detection Algorithms for the FDA Adverse Event Reporting System.” Clin Pharmacol Ther 93 (6): 1–19. https://doi.org/10.1038/clpt.2013.24.Performance.

Kruspe, Sven, David D. Dickey, Kevin T. Urak, Giselle N. Blanco, Matthew J. Miller, Karen C. Clark, Elliot Burghardt, et al. 2017. “Rapid and Sensitive Detection of Breast Cancer Cells in Patient Blood with Nuclease-Activated Probe Technology.” Molecular Therapy - Nucleic Acids 8 (September). Cell Press: 542–57. https://doi.org/10.1016/j.omtn.2017.08.004.

López-Ratón, Mónica, Mar\(\backslash\)\(\backslash\)ia Xosé Rodríguez-Álvarez, Carmen Cadarso-Suárez, Francisco Gude-Sampedro, and Others. 2014. “OptimalCutpoints: an R package for selecting optimal cutpoints in diagnostic tests.” Journal of Statistical Software 61 (8). Foundation for Open Access Statistics: 1–36. https://www.jstatsoft.org/article/view/v061i08.

Sanchez, Robersy, and Sally A. Mackenzie. 2016. “Information Thermodynamics of Cytosine DNA Methylation.” Edited by Barbara Bardoni. PLOS ONE 11 (3). Public Library of Science: e0150427. https://doi.org/10.1371/journal.pone.0150427.

Sanchez, Robersy, Xiaodong Yang, Thomas Maher, and Sally Mackenzie. 2019. “Discrimination of DNA Methylation Signal from Background Variation for Clinical Diagnostics.” Int. J. Mol. Sci. 20 (21): 5343. https://doi.org/https://doi.org/10.3390/ijms20215343.

Severin, Philip M D, Xueqing Zou, Hermann E Gaub, and Klaus Schulten. 2011. “Cytosine methylation alters DNA mechanical properties.” Nucleic Acids Research 39 (20): 8740–51. https://doi.org/10.1093/nar/gkr578.

Youden, W. J. 1950. “Index for rating diagnostic tests.” Cancer 3 (1): 32–35. https://doi.org/10.1002/1097-0142(1950)3:1<32::AID-CNCR2820030106>3.0.CO;2-3.