We have developed a novel methylome analysis procedure, Methyl-IT, based on signal detection and machine learning. Methyl-IT treats methylation analysis as a signal detection problem, and the method was designed to discriminate the methylation regulatory signal from background noise induced by Brownian motion and thermal fluctuations. Our group has proposed an information thermodynamics approach to investigate genome-wide methylation patterning based on the statistical mechanical effect of methylation on DNA molecules (1-3). The information thermodynamics-based approach is postulated to provide greater sensitivity for resolving true signal from the thermodynamic background within the methylome (1). This theory identifies the family of probability distributions that best fit the methylation signals when they are expressed in terms of information divergences of methylation levels. Because the biological signal created within the dynamic methylome environment characteristic of organisms is not free from background noise, the approach, designated Methyl-IT, includes an application of signal detection theory.
It is important to highlight that, to date, the MethylIT approach is the only methylation analysis approach that rests on well-established and rigorous statistical physics (and thermodynamics) grounds. The study (1) implies that traditional (ad hoc) statistical approaches must not be applied to the analysis of the methylation process, since these approaches ignore the continuous action of the Second Law of Thermodynamics on living organisms, and their assumptions are not valid in the thermal bath of the cell.
A basic requirement for the application of signal detection is knowledge of the probability distribution of the background noise. A generalized gamma (GG) probability distribution model can be deduced on a statistical mechanical/thermodynamic basis for DNA methylation induced by thermal fluctuations (1), and particular members of the GG family arise as special cases. For example, assuming that this background methylation variation is consistent with a Poisson process, it can be distinguished from variation associated with the methylation regulatory machinery, which is non-independent for all genomic regions (1). An information-theoretic divergence that expresses the variation in methylation induced by background thermal fluctuations will follow a probability distribution that is a member of the GG family, provided that it is proportional to the minimum energy dissipated per bit of information from methylation change. The information thermodynamics model was previously verified with more than 150 Arabidopsis and more than 90 human methylome datasets (3).
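The signal detection idea described above can be illustrated with a minimal base-R sketch. It is not MethylIT's own implementation: the data are simulated, the background is assumed to follow a plain gamma distribution (one special case of the GG family), the parameters are estimated by the method of moments, and all variable names are hypothetical.

```r
# Minimal sketch with simulated data; not MethylIT's own implementation.
set.seed(1)

# Hypothetical background divergences, drawn from a gamma distribution
# (assumed here as a special case of the generalized gamma family).
background <- rgamma(5000, shape = 0.7, scale = 0.2)

# Method-of-moments estimates of the gamma parameters.
m <- mean(background)
v <- var(background)
shape_hat <- m^2 / v
scale_hat <- v / m

# Critical value: a divergence above it has probability below alpha
# of arising from the fitted background distribution alone.
alpha <- 0.05
critical_value <- qgamma(1 - alpha, shape = shape_hat, scale = scale_hat)

# Hypothetical observed divergences at six cytosine sites.
observed <- c(0.01, 0.05, 0.30, 0.80, 1.20, 0.02)
is_candidate_signal <- observed > critical_value
```

Sites flagged in `is_candidate_signal` are only candidates under the assumed background model; a full analysis would fit the GG model itself rather than this simplified gamma stand-in.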
The Methyl-IT R package provides the functions from the R scripts used in the manuscript (1), as well as additional functions that will be used in further studies. The application of the information thermodynamics of cytosine DNA methylation is not limited to the current methylome analysis, which is only a particular application. The theory permits the study of plant and animal methylomes in the framework of a communication system (3), where cytosine DNA methylation has the dual roles of stabilizing the DNA molecule and carrying the regulatory signals.
The application of the Methyl-IT signal detection-machine learning approach to methylation analysis of whole genome bisulfite sequencing (WGBS) data permits a high level of methylation signal resolution in cancer-associated genes and pathways (4), as well as in treatment-associated genes and pathways in plants (5).
Notice that the MethylIT pipeline is not limited to bisulfite sequencing experimental data. Any experimental data set summarized in GRanges-class or GRangesList-class objects carrying the counts of methylated (mC) and unmethylated (uC) cytosines in their metacolumns can be analyzed. As a matter of fact, since the cells in a tissue are not synchronized but are at different stages of ontogenetic development, the methylation status at each cytosine site varies dynamically from cell to cell. Therefore, at the tissue level, the methylation status can be quantitatively represented by a probability, i.e., a number between 0 and 1, also known as the methylation level.
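As an illustration of the input format described above, the following sketch builds a small GRanges object with `mC` and `uC` metacolumns and derives the methylation level at each site as mC / (mC + uC). It assumes the Bioconductor package GenomicRanges is installed; the coordinates, the counts, and the metacolumn name `p` are hypothetical and are not taken from MethylIT.

```r
# Assumes the Bioconductor package GenomicRanges is installed.
library(GenomicRanges)

# Hypothetical counts at four cytosine sites on chromosome 1.
gr <- GRanges(
    seqnames = "chr1",
    ranges = IRanges(start = c(101, 250, 333, 480), width = 1),
    strand = "+",
    mC = c(8L, 0L, 15L, 3L),  # methylated read counts
    uC = c(2L, 12L, 5L, 9L)   # unmethylated read counts
)

# Methylation level per site: p = mC / (mC + uC), a value in [0, 1].
site_coverage <- gr$mC + gr$uC
gr$p <- gr$mC / site_coverage
```

Sites with zero coverage would give `NaN` here, so in practice such sites would need to be filtered out or handled before computing the levels.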
Currently, the package is actively used in methylation analyses. Nevertheless, improvements are regularly introduced. Watch this repo or check for updates. THE PACKAGE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. The beta version is frequently updated. We are using the version notation: 0.3.2.x.y, where digit ‘y’ is frequently changed.
Be sure that both R and the Bioconductor packages are up to date. To get the latest version of Bioconductor, start R and enter the commands:
if (!requireNamespace("BiocManager")) install.packages("BiocManager")
BiocManager::install()
Install OS packages:
sudo yum -y install libcurl-devel.x86_64 openssl-devel.x86_64 libxml2-devel mariadb-devel.x86_64
sudo apt install libmysqlclient-dev libcurl4-openssl-dev libssl-dev libxml2-dev git build-essential
Install R dependencies:
BiocManager::install(c("caret", "Epi", "e1071", "minpack.lm", "nls2", "caTools", "RCurl", "GenomicRanges", "BiocParallel", "biovizBase", "genefilter", "data.table", "dplyr", "GenomeInfoDb", "randomForest"), dependencies = TRUE)
You can install MethylIT from GitHub:
# Master stable version (0.3.2.7)
BiocManager::install("genomaths/MethylIT")
The Beta (developing) version:
BiocManager::install('genomaths/MethylIT', ref = "MethylIT_0.3.2_beta")
Full documentation, including the manual and several examples, is available at the Methyl-IT website.
You are free to copy, distribute and transmit MethylIT for non-commercial purposes. Any use of MethylIT for a commercial purpose is subject to and requires a special license.
For questions about the MethylIT project, contact email@example.com
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.