Department of Biology and Plant Science.

Pennsylvania State University, University Park, PA 16802

Maintainer: Robersy Sanchez

Pennsylvania State University, University Park, PA 16802

Maintainer: Robersy Sanchez

`vignettes/pca_lda_with_methylit.Rmd`

`pca_lda_with_methylit.Rmd`

Abstract

When methylation analysis is intended for diagnostic/prognostic purposes, for example, in clinical applications for patient diagnostics, to know whether the patient would be in healthy or disease stage we would like to have a good predictor tool in our side. It turns out that classical machine learning (ML) tools like hierarchical clustering, principal components and linear discriminant analysis can help us to reach such a goal. The current Methyl-IT downstream analysis is equipped with the mentioned ML tools.

The discrimination of the methylation signal from the stochastic methylation background resultant from the standard (non-stressful) biological processes is a critical step for the genome-wide methylation analysis. Such a discrimination requires for the knowledge of the probability distribution of the information divergence of methylation levels and a proper evaluation of the classification performance of differentially methylated positions (DMPs) into two classes: DMPs from control and DMPs from treatment.

Once a methylation analysis is performed to detect the genome-wide DMPs, a natural next step is to investigate whether the detected DMPs carries enough discriminatory power to discern between individual groups. Then, we must move from the methylation analysis to statistical/machine learning field (multivariate statistics). To reach our goal, each individual can be represented as a vector from the \(N\)-dimensional space of DMPs, where DMPs can be expressed in terms of some suitable measurement as function of the methylation levels. These suitable measurement are information divergences, such as total variation distance (\(TV_d\), the absolute value of methylation levels) and Hellinger divergence (\(H\)). It is worthy to observe that methylation levels are uncertainty measurements and, therefore, they do not carry information per se. Information is only carried on measurements expressing uncertainty variations like \(TV_d\), \(H\), Jensen-Shannon Divergence, and in general, the so called information theoretical divergences (Liese and Vajda 2006).

The \(N\)-dimensional space of DMPs is, usually, too big. So, a first dimension reduction is required. This can be done by splitting the chromosomes into non-overlapping genomic region and next computing some statistic like the *sum* or *mean* of some information divergences of methylation levels at DMPs inside of each region. Still we will have to deal with a multidimensional space, but acceptable for a meaningful application of hierarchical clustering (HC), principal component analysis (PCA) and linear discriminant analysis (LDA). In the current examples, Methyl-IT methylation analysis will be applied to a dataset of simulated samples to detect DMPs on then. Next, after space dimension reduction, the mentioned machine learning approached will be applied.

For the current example on methylation analysis with Methyl-IT we will use simulated data. Read-count matrices of methylated and unmethylated cytosine are generated with MethylIT.utils function simulateCounts. A basic example generating datasets is given in the web page: Methylation analysis with Methyl-IT

library(MethylIT) library(MethylIT.utils) library(ggplot2) library(ape) alpha.ct <- 0.019 alpha.g1 <- 0.022 alpha.g2 <- 0.025 # The number of cytosine sites to generate sites = 1.5e5 # Set a seed for pseudo-random number generation set.seed(124) control.nam <- c("C1", "C2", "C3", "C4", "C5") treatment.nam1 <- c("T1", "T2", "T3", "T4", "T5") treatment.nam2 <- c("T6", "T7", "T8", "T9", "T10") # Reference group ref0 = simulateCounts(num.samples = 3, sites = sites, alpha = alpha.ct, beta = 0.5, size = 50, theta = 4.5, sample.ids = c("R1", "R2", "R3")) # Control group ctrl = simulateCounts(num.samples = 5, sites = sites, alpha = alpha.ct, beta = 0.5, size = 50, theta = 4.5, sample.ids = control.nam) # Treatment group II treat = simulateCounts(num.samples = 5, sites = sites, alpha = alpha.g1, beta = 0.5, size = 50, theta = 4.5, sample.ids = treatment.nam1) # Treatment group II treat2 = simulateCounts(num.samples = 5, sites = sites, alpha = alpha.g2, beta = 0.5, size = 50, theta = 4.5, sample.ids = treatment.nam2)

A reference sample (virtual individual) is created using individual samples from the control population using function poolFromGRlist. The reference sample is further used to compute the information divergences of methylation levels, \(TV_d\) and \(H\), with function estimateDivergence. This is a first fundamental step to remove the background noise (fluctuations) generated by the inherent stochasticity of the molecular processes in the cells.

# === Methylation level divergences === # Reference sample ref = poolFromGRlist(ref0, stat = "mean", num.cores = 4L, verbose = FALSE) divs <- estimateDivergence(ref = ref, indiv = c(ctrl, treat, treat2), Bayesian = TRUE, num.cores = 6L, percentile = 1, verbose = FALSE) # To remove hd == 0 to estimate. The methylation signal only is given for divs = lapply(divs, function(div) div[ abs(div$hdiv) > 0 ], keep.attr = TRUE) names(divs) <- names(divs)

To get some statistical description about the sample is useful. Here, empirical critical values for the probability distribution of \(H\) and \(TV\) can is obtained using *quantile* function from the R package *stats*.

critical.val <- do.call(rbind, lapply(divs, function(x) { x <- x[x$hdiv > 0] hd.95 = quantile(x$hdiv, 0.95) tv.95 = quantile(abs(x$bay.TV), 0.95) return(c(tv = tv.95, hd = hd.95)) })) critical.val

```
## tv.95% hd.95%
## C1 0.6523410 62.26835
## C2 0.6500774 60.89741
## C3 0.6508289 61.48977
## C4 0.4856903 56.60726
## C5 0.4967501 58.12772
## T1 0.9236788 126.34287
## T2 0.9136199 120.45119
## T3 0.9137842 121.99485
## T4 0.9192167 124.54324
## T5 0.9225962 124.25515
## T6 0.9447952 132.69525
## T7 0.9447369 134.88511
## T8 0.9408397 131.80632
## T9 0.9482647 134.69034
## T10 0.9456276 133.87224
```

Here, the methylation signal is expressed in terms of Hellinger divergence of methylation levels. Here, the signal distribution is modelled by a Weibull probability distribution model. Basically, the model could be a member of the generalized gamma distribution family. For example, it could be gamma distribution, Weibull, or log-normal. To describe the signal, we may prefer a model with a cross-validations: R.Cross.val > 0.95. Cross-validations for the nonlinear regressions are performed in each methylome as described in (Stevens 2009). The non-linear fit is performed through the function nonlinearFitDist

The best model excluding “GGamma3P” model, for which the computation is computationally expensive:

d <- c("Weibull2P", "Weibull3P", "Gamma2P", "Gamma3P") gof <- gofReport(HD = divs, column = 9L, model = d, num.cores = 6L, output = "all", verbose = FALSE)

gof$bestModel

```
## C1 C2 C3 C4 C5 T1
## "Weibull3P" "Weibull3P" "Weibull3P" "Weibull3P" "Weibull3P" "Weibull3P"
## T2 T3 T4 T5 T6 T7
## "Weibull3P" "Weibull3P" "Weibull3P" "Weibull3P" "Weibull3P" "Weibull3P"
## T8 T9 T10
## "Weibull3P" "Weibull3P" "Weibull3P"
```

Although we did not included the main member of the generalized gamma distribution family (“GGamma3P”), the Cross-validations correlation coefficient *w3p_R.Cross.val* values above 0.99 gives us some peace of mind on our confidence on the model prediction power/performance when confronted with external data sets (AIC and BIC are not conceived to provide such an information).

gof$stats

```
## w2p_AIC w2p_R.Cross.val w3p_AIC w3p_R.Cross.val g2p_AIC
## C1 -172642.0 0.9994639 -174557.7 0.9994974 -164820.3
## C2 -175594.8 0.9995131 -177009.1 0.9995357 -167532.6
## C3 -177238.3 0.9995420 -178755.3 0.9995652 -169774.1
## C4 -219269.6 0.9994955 -264072.9 0.9998548 -206343.4
## C5 -217500.5 0.9994374 -273018.4 0.9998799 -205306.2
## T1 -168079.8 0.9972701 -225575.8 0.9994668 -160887.9
## T2 -167286.1 0.9972728 -226888.2 0.9995015 -160748.6
## T3 -170301.7 0.9974336 -227799.2 0.9995012 -163997.9
## T4 -168551.1 0.9973218 -225182.0 0.9994660 -162169.6
## T5 -167122.0 0.9972267 -227207.9 0.9994985 -160694.2
## T6 -173245.4 0.9971876 -229256.8 0.9993945 -165656.4
## T7 -170905.0 0.9970145 -228452.3 0.9993819 -163341.9
## T8 -171372.5 0.9970757 -229286.8 0.9994006 -163649.4
## T9 -170096.9 0.9969616 -227490.3 0.9993672 -162250.4
## T10 -171266.7 0.9970282 -228041.3 0.9993707 -163857.2
## g2p_R.Cross.val g3p_AIC g3p_R.Cross.val bestModel
## C1 0.9993929 -166320.3 0.9994541 w3p
## C2 0.9994499 -168751.9 0.9994939 w3p
## C3 0.9994861 -170986.9 0.9995260 w3p
## C4 0.9993142 -262797.6 0.9998669 Needs revision
## C5 0.9992290 -267887.9 0.9998753 w3p
## T1 0.9963828 -205273.7 0.9989377 w3p
## T2 0.9964685 -206912.6 0.9990147 w3p
## T3 0.9966887 -208521.3 0.9990291 w3p
## T4 0.9965315 -205648.9 0.9989541 w3p
## T5 0.9964091 -206816.6 0.9989987 w3p
## T6 0.9962174 -208521.5 0.9988035 w3p
## T7 0.9959979 -206833.2 0.9987503 w3p
## T8 0.9960769 -207922.5 0.9988038 w3p
## T9 0.9958861 -205571.5 0.9987166 w3p
## T10 0.9960165 -206551.4 0.9987330 w3p
```

The above statistical description of the dataset (evidently) suggests that there two main groups: control and treatments, while treatment group would split into two subgroups of samples. In the current case, to search for a good cutpoint, we do not need to use all the samples. The critical value \(H_{\alpha=0.05}=33.51223\) suggests that any optimal cutpoint for the subset of samples T1 to T5 will be optimal for the samples T6 to T10 as well.

Below, we are letting the PCA+LDA model classifier to take the decision on whether a differentially methylated cytosine position is a treatment DMP. To do it, Methyl-IT function getPotentialDIMP is used to get methylation signal probabilities of the observed \(H\) values for all cytosine site (alpha = 1), in accordance with the 2-parameter Weibull distribution model. Next, this information will be used to identify DMPs using Methyl-IT function estimateCutPoint. Cytosine positions with \(H\) values above the cutpoint are considered DMPs. Finally, a PCA + QDA model classifier will be fitted to classify DMPs into two classes: DMPs from control and those from treatment.

dmps <- getPotentialDIMP(LR = divs, nlms = gof$nlms, div.col = 9L, tv.cut = 0.461, tv.col = 8, alpha = 0.05, dist.name = gof$bestModel) cutp = estimateCutPoint(LR = dmps, simple = FALSE, column = c(hdiv = TRUE, TV = TRUE, wprob = TRUE, pos = TRUE), classifier1 = "pca.lda", classifier2 = "pca.qda", tv.cut = 0.461, control.names = control.nam, treatment.names = treatment.nam1, post.cut = 0.5, cut.values = seq(15, 38, 1), clas.perf = TRUE, prop = 0.6, center = FALSE, scale = FALSE, n.pc = 4, div.col = 9L, stat = 0) cutp

```
## Cutpoint estimation with 'pca.lda' classifier
## Cutpoint search performed using model posterior probabilities
##
## Posterior probability used to get the cutpoint = 0.5
## Cytosine sites with treatment PostProbCut >= 0.5 have a
## divergence value >= 92.13839
##
## Optimized statistic: Accuracy = 1
## Cutpoint = 92.138
##
## Model classifier 'pca.qda'
##
## The accessible objects in the output list are:
## Length Class Mode
## cutpoint 1 -none- numeric
## testSetPerformance 6 confusionMatrix list
## testSetModel.FDR 1 -none- numeric
## model 2 pcaQDA list
## modelConfMatrix 6 confusionMatrix list
## initModel 1 -none- character
## postProbCut 1 -none- numeric
## postCut 1 -none- numeric
## classifier 1 -none- character
## statistic 1 -none- character
## optStatVal 1 -none- numeric
## cutpData 1 -none- logical
```

In this case, the cutpoint \(H_{cutpoint}=92.138\) is lower from what is expected from the higher treatment empirical \(H_{\alpha = 0.05}^{TT_{Emp}}=120.45119\) critical value: \(H_{cutpoint} < H_{\alpha = 0.05}^{TT_{Emp}}\).

The model performance in in the test dataset is:

cutp$testSetPerformance

```
## Confusion Matrix and Statistics
##
## Reference
## Prediction CT TT
## CT 2464 0
## TT 0 5278
##
## Accuracy : 1
## 95% CI : (0.9995, 1)
## No Information Rate : 0.6817
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 1
##
## Mcnemar's Test P-Value : NA
##
## Sensitivity : 1.0000
## Specificity : 1.0000
## Pos Pred Value : 1.0000
## Neg Pred Value : 1.0000
## Prevalence : 0.6817
## Detection Rate : 0.6817
## Detection Prevalence : 0.6817
## Balanced Accuracy : 1.0000
##
## 'Positive' Class : TT
##
```

The above cutpoint can be used to identify DMPs from control and treatment. The PCA+QDA model classifier can be used any time to discriminate control DMPs from those treatment. DMPs are retrieved using selectDIMP function (`selectDIMP`

):

dmps <- selectDIMP(LR = divs, div.col = 9L, cutpoint = cutp$cutpoint, tv.cut = 0.34, tv.col = 7) sapply(dmps,length)

```
## C1 C2 C3 C4 C5 T1 T2 T3 T4 T5 T6 T7 T8 T9 T10
## 450 428 410 522 578 2808 2651 2632 2735 2768 3170 3233 3210 3318 3239
```

Next, to represent individual samples as vectors from the *N*-dimensional space, we can use getGRegionsStat function from MethylIT.utils R package. Here, the simulated “chromosome” is split into regions of 450bp non-overlapping windows. and the *density* of Hellinger divergences values is taken for each windows.

ns <- names(dmps) DMRs <- getGRegionsStat(GR = dmps, win.size = 150, step.size = 150, stat = "density", column = 9L, scaling = 1) names(DMRs) <- ns as(DMRs, "GRangesList")

Hierarchical clustering (HC) is an unsupervised machine learning approach. HC can provide an initial estimation of the number of possible groups and information on their members. However, the effectivity of HC will depend on the experimental dataset, the metric used, and the agglomeration algorithm applied. For an unknown reason (and based on the personal experience of the author working in numerical taxonomy), Ward’s agglomeration algorithm performs much better on biological experimental datasets than the rest of the available algorithms like UPGMA, UPGMC, etc.

dmgm <- uniqueGRanges(DMRs, verbose = FALSE) dmgm <- t(as.matrix(mcols(dmgm))) rownames(dmgm) <- ns sampleNames <- ns hc = hclust(dist(data.frame( dmgm ))^2, method = 'ward.D') hc.rsq = hc hc.rsq$height <- sqrt( hc$height )

colors = sampleNames colors[grep("C", colors)] <- "green4" colors[grep("T[6-9]{1}", colors)] <- "red" colors[grep("T10", colors)] <- "red" colors[grep("T[1-5]", colors)] <- "blue" # rgb(red, green, blue, alpha, names = NULL, maxColorValue = 1) clusters.color = c(rgb(0, 0.7, 0, 0.1), rgb(0, 0, 1, 0.1), rgb(1, 0.2, 0, 0.1)) par(font.lab=2,font=3,font.axis=2, mar=c(0,3,2,0), family="serif" , lwd = 0.4) plot(as.phylo(hc.rsq), tip.color = colors, label.offset = 0.5, font = 2, cex = 0.9, edge.width = 0.4, direction = "downwards", no.margin = FALSE, align.tip.label = TRUE, adj = 0) axisPhylo( 2, las = 1, lwd = 0.4, cex.axis = 1.4, hadj = 0.8, tck = -0.01 ) ## cuts = c(xleft, ybottom, xright, ytop) hclust_rect(hc.rsq, k = 3L, border = c("green4", "blue", "red"), color = clusters.color, cuts = c(0.56, 1.4, 0.41, 106))

Here, we have use function as.phylo from the R package ape for better dendrogram visualization and function hclust_rect from MethylIT.utils R package to draw rectangles with background colors around the branches of a dendrogram highlighting the corresponding clusters.

MethylIT function pcaLDA will be used to perform the PCA and PCA + LDA analyses. The function returns a list of two objects: 1) ‘lda’: an object of class ‘lda’ from package ‘MASS’. 2) ‘pca’: an object of class ‘prcomp’ from package ‘stats’. For information on how to use these objects see ?lda and ?prcomp.

Unlike hierarchical clustering (HC), LDA is a supervised machine learning approach. So, we must provide a prior classification of the individuals, which can be derived, for example, from the HC, or from a previous exploratory analysis with PCA.

# A prior classification derived from HC grps <- cutree(hc, k = 3) grps[grep(1, grps)] <- "CT" grps[grep(2, grps)] <- "T1" grps[grep(3, grps)] <- "T2" grps <- factor(grps) ld <- pcaLDA(data = data.frame(dmgm), grouping = grps, n.pc = 3, max.pc = 3, scale = FALSE, center = FALSE, tol = 1e-6) summary(ld$pca)

```
## Importance of first k=3 (out of 15) components:
## PC1 PC2 PC3
## Standard deviation 301.3440 56.94793 55.39186
## Proportion of Variance 0.7817 0.02792 0.02641
## Cumulative Proportion 0.7817 0.80964 0.83606
```

We may retain enough components so that the cumulative percent of variance accounted for at least 70 to 80%. By setting \(scale=TRUE\) and \(center=TRUE\), we could have different results and would improve or not our results. In particular, these settings are essentials if the *N*-dimensional space is integrated by variables from different measurement scales/units, for example, Kg and g, or Kg and Km.

The individual coordinates in the principal components (PCs) are returned by function pcaLDA. In the current case, based on the cumulative proportion of variance, the two firsts PCs carried about the 84% of the total sample variance and could split the sample into meaningful groups.

pca.coord <- ld$pca$x pca.coord

```
## PC1 PC2 PC3
## C1 -63.94820 2.6773997 -2.0772261
## C2 -59.14588 0.3494248 2.9216406
## C3 -55.92878 -1.1138205 -0.4042085
## C4 -72.35580 0.6619679 -0.5541555
## C5 -85.67651 0.5268261 4.4506433
## T1 -330.24164 17.2511606 -16.7093598
## T2 -312.19387 -22.6638714 -21.9071693
## T3 -317.49366 27.0910087 -32.7557444
## T4 -324.52502 6.9421313 -30.1602641
## T5 -325.93819 12.7211961 -18.6529162
## T6 -372.08875 27.8938888 -67.5314185
## T7 -384.96008 -118.5328081 91.6230364
## T8 -377.15782 84.4173789 -46.8876766
## T9 -389.99841 89.0206347 152.9284288
## T10 -384.90662 -117.3005955 -35.9445392
```

Next, the graphic for individual coordinates in the two firsts PCs can be easily visualized now:

dt <- data.frame(pca.coord[, 1:2], subgrp = grps) p0 <- theme( axis.text.x = element_text( face = "bold", size = 18, color="black", # hjust = 0.5, vjust = 0.5, family = "serif", angle = 0, margin = margin(1,0,1,0, unit = "pt" )), axis.text.y = element_text( face = "bold", size = 18, color="black", family = "serif", margin = margin( 0,0.1,0,0, unit = "mm" )), axis.title.x = element_text(face = "bold", family = "serif", size = 18, color="black", vjust = 0 ), axis.title.y = element_text(face = "bold", family = "serif", size = 18, color="black", margin = margin( 0,2,0,0, unit = "mm" ) ), legend.title=element_blank(), legend.text = element_text(size = 20, face = "bold", family = "serif"), legend.position = c(0.88, 0.14), panel.border = element_rect(fill=NA, colour = "black",size=0.07), panel.grid.minor = element_line(color= "white",size = 0.2), axis.ticks = element_line(size = 0.1), axis.ticks.length = unit(0.5, "mm"), plot.margin = unit(c(1,1,0,0), "lines")) ggplot(dt, aes(x = PC1, y = PC2, colour = grps)) + geom_vline(xintercept = 0, color = "white", size = 1) + geom_hline(yintercept = 0, color = "white", size = 1) + geom_point(size = 6) + scale_color_manual(values = c("green4","blue","brown1")) + stat_ellipse(aes(x = PC1, y = PC2, fill = subgrp), data = dt, type = "norm", geom = "polygon", level = 0.5, alpha=0.2, show.legend = FALSE) + scale_fill_manual(values = c("green4","blue","brown1")) + p0

In the current case, better resolution is obtained with the linear discriminant functions, which is based on the three firsts PCs. Notice that the number principal components used the LDA step must be lower than the number of individuals (\(N\)) divided by 3: \(N/3\).

ind.coord <- predict(ld, newdata = data.frame(dmgm), type = "scores") dt <- data.frame(ind.coord, subgrp = grps) p0 <- theme( axis.text.x = element_text( face = "bold", size = 18, color="black", # hjust = 0.5, vjust = 0.5, family = "serif", angle = 0, margin = margin(1,0,1,0, unit = "pt" )), axis.text.y = element_text( face = "bold", size = 18, color="black", family = "serif", margin = margin( 0,0.1,0,0, unit = "mm" )), axis.title.x = element_text(face = "bold", family = "serif", size = 18, color="black", vjust = 0 ), axis.title.y = element_text(face = "bold", family = "serif", size = 18, color="black", margin = margin( 0,2,0,0, unit = "mm" ) ), legend.title=element_blank(), legend.text = element_text(size = 20, face = "bold", family = "serif"), legend.position = c(0.15, 0.83), panel.border = element_rect(fill=NA, colour = "black",size=0.07), panel.grid.minor = element_line(color= "white",size = 0.2), axis.ticks = element_line(size = 0.1), axis.ticks.length = unit(0.5, "mm"), plot.margin = unit(c(1,1,0,0), "lines")) ggplot(dt, aes(x = LD1, y = LD2, colour = grps)) + geom_vline(xintercept = 0, color = "white", size = 1) + geom_hline(yintercept = 0, color = "white", size = 1) + geom_point(size = 6) + scale_color_manual(values = c("green4","blue","brown1")) + stat_ellipse(aes(x = LD1, y = LD2, fill = subgrp), data = dt, type = "norm", geom = "polygon", level = 0.5, alpha=0.2, show.legend = FALSE) + scale_fill_manual(values = c("green4","blue","brown1")) + p0

Liese, Friedrich, and Igor Vajda. 2006. “On divergences and informations in statistics and information theory.” *IEEE Transactions on Information Theory* 52 (10): 4394–4412. https://doi.org/10.1109/TIT.2006.881731.

Stevens, James P. 2009. *Applied Multivariate Statistics for the Social Sciences*. Fifth Edit. Routledge Academic.

Here is the output of `sessionInfo()`

on the system on which this document was compiled running pandoc 2.7.3:

```
## R version 4.0.0 (2020-04-24)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.4 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] ape_5.4 ggplot2_3.3.1 MethylIT.utils_0.1
## [4] MethylIT_0.3.2.1 rtracklayer_1.48.0 GenomicRanges_1.40.0
## [7] GenomeInfoDb_1.24.0 IRanges_2.22.2 S4Vectors_0.26.1
## [10] BiocGenerics_0.34.0
##
## loaded via a namespace (and not attached):
## [1] colorspace_1.4-1 ellipsis_0.3.1
## [3] class_7.3-17 modeltools_0.2-23
## [5] mclust_5.4.6 rprojroot_1.3-2
## [7] XVector_0.28.0 ArgumentCheck_0.10.2
## [9] fs_1.4.1 rstudioapi_0.11
## [11] farver_2.0.3 flexmix_2.3-15
## [13] bit64_0.9-7 AnnotationDbi_1.50.0
## [15] prodlim_2019.11.13 lubridate_1.7.9
## [17] codetools_0.2-16 splines_4.0.0
## [19] knitr_1.28 Formula_1.2-3
## [21] pROC_1.16.2 Rsamtools_2.4.0
## [23] caret_6.0-86 annotate_1.66.0
## [25] kernlab_0.9-29 compiler_4.0.0
## [27] backports_1.1.7 assertthat_0.2.1
## [29] Matrix_1.2-18 htmltools_0.4.0
## [31] tools_4.0.0 gtable_0.3.0
## [33] glue_1.4.1 GenomeInfoDbData_1.2.3
## [35] reshape2_1.4.4 dplyr_1.0.0
## [37] Rcpp_1.0.4.6 Biobase_2.48.0
## [39] pkgdown_1.5.1 vctrs_0.3.1
## [41] Biostrings_2.56.0 nlme_3.1-148
## [43] iterators_1.0.12 betareg_3.1-3
## [45] lmtest_0.9-37 timeDate_3043.102
## [47] gower_0.2.1 xfun_0.14
## [49] stringr_1.4.0 lifecycle_0.2.0
## [51] XML_3.99-0.3 zlibbioc_1.34.0
## [53] MASS_7.3-51.6 zoo_1.8-8
## [55] scales_1.1.1 ipred_0.9-9
## [57] SummarizedExperiment_1.18.1 sandwich_2.5-1
## [59] yaml_2.2.1 memoise_1.1.0
## [61] rpart_4.1-15 segmented_1.1-0
## [63] stringi_1.4.6 RSQLite_2.2.0
## [65] genefilter_1.70.0 desc_1.2.0
## [67] foreach_1.5.0 e1071_1.7-3
## [69] BiocParallel_1.22.0 lava_1.6.7
## [71] rlang_0.4.6 pkgconfig_2.0.3
## [73] matrixStats_0.56.0 bitops_1.0-6
## [75] evaluate_0.14 lattice_0.20-41
## [77] purrr_0.3.4 labeling_0.3
## [79] GenomicAlignments_1.24.0 recipes_0.1.12
## [81] bit_1.1-15.2 tidyselect_1.1.0
## [83] plyr_1.8.6 magrittr_1.5
## [85] nls2_0.2 R6_2.4.1
## [87] generics_0.0.2 DelayedArray_0.14.0
## [89] DBI_1.1.0 pillar_1.4.4
## [91] withr_2.2.0 mixtools_1.2.0
## [93] survival_3.1-12 RCurl_1.98-1.2
## [95] nnet_7.3-14 tibble_3.0.1
## [97] crayon_1.3.4 rmarkdown_2.2
## [99] grid_4.0.0 minpack.lm_1.2-1
## [101] data.table_1.12.8 blob_1.2.1
## [103] ModelMetrics_1.2.2.2 digest_0.6.25
## [105] xtable_1.8-4 munsell_0.5.0
```