Hellinger divergence of methylation levels: estimateHellingerDiv (MethylIT)

Given the methylation levels of two individuals, the function computes the Hellinger divergence, an information divergence, between them.

estimateHellingerDiv(p, n = NULL)

Arguments

p: A numeric vector of the methylation levels p = c(p1, p2) of individuals 1 and 2.
n: Optional. If supplied, an integer vector n = c(n1, n2) of the coverages used to estimate the methylation levels of individuals 1 and 2.

Value

The Hellinger divergence between the given methylation levels.

Details

The methylation level $p_{ij}$ of individual $i$ at cytosine site $j$ corresponds to the probability vector $p^{ij} = (p_{ij}, 1 - p_{ij})$. The information divergence between the methylation levels of individuals 1 and 2 at site $j$ is then the divergence between the vectors $p^{1j} = (p_{1j}, 1 - p_{1j})$ and $p^{2j} = (p_{2j}, 1 - p_{2j})$. If the coverages $n_1$ and $n_2$ are supplied, the divergence is estimated according to the formula:

$$hdiv = \frac{2(n_1 + 1)(n_2 + 1)}{n_1 + n_2 + 2}\left[\left(\sqrt{p_{1j}} - \sqrt{p_{2j}}\right)^2 + \left(\sqrt{1 - p_{1j}} - \sqrt{1 - p_{2j}}\right)^2\right]$$

This formula corresponds to the Hellinger divergence given in the first formula of Theorem 1 in reference 1. If the coverages are not supplied, the divergence is:

$$hdiv = \left(\sqrt{p_{1j}} - \sqrt{p_{2j}}\right)^2 + \left(\sqrt{1 - p_{1j}} - \sqrt{1 - p_{2j}}\right)^2$$

Missing methylation levels, reported as NA or NaN, are replaced with zero.
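The two formulas and the handling of missing values can be sketched in a few lines of R. The helper `hellinger_div_sketch` below is hypothetical and is not part of MethylIT; it only restates the formulas above, assuming `p = c(p1, p2)` and an optional `n = c(n1, n2)`, and may differ from the package implementation in input checking and edge cases.

```r
# Hypothetical helper, NOT the MethylIT implementation: a minimal sketch of
# the two formulas in Details, assuming p = c(p1, p2) and optional n = c(n1, n2)
hellinger_div_sketch <- function(p, n = NULL) {
  # Missing levels (NA or NaN) are replaced with zero; is.na() is TRUE for both
  p[is.na(p)] <- 0
  # Squared differences of square roots over both components (p, 1 - p)
  h <- (sqrt(p[1]) - sqrt(p[2]))^2 + (sqrt(1 - p[1]) - sqrt(1 - p[2]))^2
  if (!is.null(n)) {
    # Coverage-weighted form (first formula of Theorem 1 in reference 1)
    h <- 2 * (n[1] + 1) * (n[2] + 1) * h / (n[1] + n[2] + 2)
  }
  unname(h)
}

hellinger_div_sketch(c(0.5, 0.5))
#> [1] 0
hellinger_div_sketch(c(0.8, 0.2))
#> [1] 0.4
hellinger_div_sketch(c(0.8, 0.2), n = c(10, 10))
#> [1] 4.4
```

With equal coverages $n_1 = n_2 = 10$ the weight reduces to $2 \cdot 11 \cdot 11 / 22 = 11$, so the coverage-weighted value is simply 11 times the unweighted one.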

References

1. Basu A, Mandal A, Pardo L (2010). Hypothesis testing for two discrete populations based on the Hellinger distance. Stat Probab Lett 80: 206-214.

Examples

    p <- c(0.5, 0.5)
    estimateHellingerDiv(p)
#> [1] 0
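For a nonzero case, the formulas in Details can be checked by hand by building the probability vectors $p^{1j}$ and $p^{2j}$ explicitly. The levels 0.8 and 0.2 and the coverages of 10 below are illustrative assumptions, and the values shown are what the formulas give, not output captured from `estimateHellingerDiv` itself.

```r
# Hand check of the Details formulas with assumed, illustrative values
# Probability vectors p^{1j} = (p1, 1 - p1) and p^{2j} = (p2, 1 - p2)
p1_vec <- c(0.8, 1 - 0.8)
p2_vec <- c(0.2, 1 - 0.2)

# Unweighted form: sum of squared differences of square roots
hdiv <- sum((sqrt(p1_vec) - sqrt(p2_vec))^2)
hdiv
#> [1] 0.4

# Coverage-weighted form with assumed coverages n1 = n2 = 10
n1 <- 10
n2 <- 10
hdiv_n <- 2 * (n1 + 1) * (n2 + 1) * hdiv / (n1 + n2 + 2)
hdiv_n
#> [1] 4.4
```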