Given the methylation levels from two individuals at a given
cytosine site, this function computes the J information divergence
(*JD*) between those methylation levels. The motivation to introduce
*JD* in Methyl-IT rests on the following:

- It is a symmetrised form of the Kullback–Leibler divergence (\(D_{KL}\)). Kullback and Leibler themselves actually defined the divergence as \(D_{KL}(P || Q) + D_{KL}(Q || P)\), which is symmetric and nonnegative, where the probability distributions *P* and *Q* are defined on the same probability space (see reference (1) and Wikipedia).

- In general, *JD* is highly correlated with the Hellinger divergence, which is the main divergence currently used in Methyl-IT (see the examples for function `estimateDivergence`).

- By construction, the unit of measurement of *JD* is the bit of information, which sets the basis for further information-thermodynamics analyses.

```
estimateJDiv(
p,
logbase = 2,
stat = FALSE,
n = NULL,
output = c("single", "all")
)
```

- p
A numeric vector of the methylation levels \(p = (p_1, p_2)\) from individuals 1 and 2.

- logbase
Logarithm base used to compute *JD*. Logarithm base 2 is used by default. Use logbase = exp(1) for the natural logarithm.

- stat
logical(1). Whether to compute the statistic with asymptotic chi-squared distribution with one degree of freedom (see Details).

- n
An integer vector \(n = (n_1, n_2)\) carrying the coverages used in the computation of the methylation levels \(p = (p_1, p_2)\). Default is NULL. If provided, the coverages are used to compute the J divergence statistic as reported in reference (3), which follows a chi-squared probability distribution.

- output
Optional. When stat = TRUE, whether to return only one value ('single') or both \(JD\) and its asymptotic chi-squared statistic ('all').

The J divergence value for the given methylation levels is returned.

The methylation levels \(p_{ij}\), at a given cytosine site
*i* from an individual *j*, lead to the probability vectors
\(p^{ij} = (p_{ij}, 1 - p_{ij})\). Then, the J information divergence
between \(p^{ij}\) and a reference \(q^i = (q_i, 1 - q_i)\)
is given by (1-2):

$$JD(p^{ij}, q^i) = \frac{1}{2}\left[(p_{ij} - q_i)\log\!\left(\frac{p_{ij}}{q_i}\right) + (q_i - p_{ij})\log\!\left(\frac{1 - p_{ij}}{1 - q_i}\right)\right]$$

Missing methylation levels, reported as NA or NaN, are replaced with zero.
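As an illustration, the formula above (including the NA-to-zero replacement) can be sketched in plain R; `jdiv` is a hypothetical helper name used here for clarity, not the Methyl-IT implementation:

```r
# Hypothetical helper illustrating the JD formula; not the Methyl-IT code.
jdiv <- function(p1, p2, logbase = 2) {
    # Missing methylation levels (NA or NaN) are replaced with zero
    if (is.na(p1)) p1 <- 0
    if (is.na(p2)) p2 <- 0
    ((p1 - p2) * log(p1 / p2, base = logbase) +
         (p2 - p1) * log((1 - p1) / (1 - p2), base = logbase)) / 2
}

jdiv(0.25, 0.75)  # symmetric: equals jdiv(0.75, 0.25)
```

With the default logbase = 2 the result is expressed in bits; for instance, `jdiv(0.25, 0.75)` works out to \(\log_2(3)/2 \approx 0.79\) bits.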

The statistic with asymptotic chi-squared distribution is based on the one suggested by Kupperman (1957) (2) for \(D_{KL}\) and discussed in references (3-4). That is,

$$\frac{2\,(n_1 + 1)(n_2 + 1)\,JD(p^{ij}, q^i)}{n_1 + n_2 + 2}$$

where \(n_1\) and \(n_2\) are the total counts (the coverages, in the case of methylation) used to compute the probabilities \(p_{ij}\) and \(q_i\). A basic Bayesian correction is added to prevent zero counts.
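A minimal sketch of this statistic in plain R, assuming *JD* is computed with the natural logarithm (the base under which chi-squared asymptotics are usually stated); `jdiv_stat` is a hypothetical name, not the Methyl-IT implementation:

```r
# Hypothetical helper: chi-squared statistic for JD; not the Methyl-IT code.
jdiv_stat <- function(p1, p2, n1, n2) {
    # JD with natural logarithm
    jd <- ((p1 - p2) * log(p1 / p2) +
               (p2 - p1) * log((1 - p1) / (1 - p2))) / 2
    # The "+ 1" terms act as a basic Bayesian correction against zero counts
    2 * (n1 + 1) * (n2 + 1) * jd / (n1 + n2 + 2)
}

x <- jdiv_stat(0.2, 0.8, n1 = 30, n2 = 25)
pchisq(x, df = 1, lower.tail = FALSE)  # asymptotic p-value, 1 degree of freedom
```

The statistic can then be referred to a chi-squared distribution with one degree of freedom via `pchisq`, as shown in the last line.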

1. Kullback S, Leibler RA. On information and sufficiency. Ann Math Stat. 1951;22: 79–86. doi:10.1214/aoms/1177729694.

2. Kupperman M. Further applications of information theory to multivariate analysis and statistical inference. Ph.D. Dissertation, George Washington University; 1957.

3. Salicrú M, Morales D, Menéndez ML, Pardo L. On the applications of divergence type measures in testing statistical hypotheses. J Multivar Anal. 1994;51: 372–391. doi:10.1006/jmva.1994.1068.

4. Basu A, Mandal A, Pardo L. Hypothesis testing for two discrete populations based on the Hellinger distance. Stat Probab Lett. 2010;80: 206–214. doi:10.1016/j.spl.2009.10.008.

5. Chung JK, Kannappan PL, Ng CT, Sahoo PK. Measures of distance between probability distributions. J Math Anal Appl. 1989;138: 280–292.

6. Lin J. Divergence measures based on the Shannon entropy. IEEE Trans Inf Theory. 1991;37: 145–151.

7. Sanchez R, Mackenzie SA. Information thermodynamics of cytosine DNA methylation. PLoS One. 2016;11: e0150427.

See https://en.wikipedia.org/wiki/Kullback-Leibler_divergence for more details, and `estimateDivergence` for a usage example.