Given the methylation levels of two individuals, the function computes the Hellinger information divergence between them.
estimateHellingerDiv(p, n = NULL)
A numerical vector of the methylation levels p = c(p1, p2) of individuals 1 and 2.
If supplied, a vector of integers n = c(n1, n2) denoting the coverages used in the estimation of the methylation levels of individuals 1 and 2.
The Hellinger divergence value for the given methylation levels is returned.
The methylation level \(p_{ij}\) for an individual \(i\) at cytosine site \(j\) corresponds to a probability vector \(p^{ij} = (p_{ij}, 1 - p_{ij})\). The information divergence between the methylation levels of individuals 1 and 2 at site \(j\) is therefore the divergence between the vectors \(p^{1j} = (p_{1j}, 1 - p_{1j})\) and \(p^{2j} = (p_{2j}, 1 - p_{2j})\). If the vector of coverages is supplied, with \(n_1\) and \(n_2\) the coverages for individuals 1 and 2, then the information divergence is estimated according to the formula:
$$hdiv = \frac{2 (n_1 + 1)(n_2 + 1)}{n_1 + n_2 + 2} \left[ \left(\sqrt{p_{1j}} - \sqrt{p_{2j}}\right)^2 + \left(\sqrt{1 - p_{1j}} - \sqrt{1 - p_{2j}}\right)^2 \right]$$
This formula corresponds to the Hellinger divergence given in the first formula of Theorem 1 in reference 1. Otherwise, when no coverage is supplied:
$$hdiv = \left(\sqrt{p_{1j}} - \sqrt{p_{2j}}\right)^2 + \left(\sqrt{1 - p_{1j}} - \sqrt{1 - p_{2j}}\right)^2$$
Missing methylation levels, reported as NA or NaN, are replaced with zero.
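The two formulas and the missing-value rule can be sketched in plain R as follows. This is only an illustration of the arithmetic described here, not the package source: the helper name `hellinger_sketch` is hypothetical, and it assumes `p` and `n` are both of length two.

```r
# Hypothetical helper for illustration; not the implementation of
# estimateHellingerDiv().
hellinger_sketch <- function(p, n = NULL) {
  # Missing methylation levels (NA or NaN) are replaced with zero.
  p[is.na(p)] <- 0
  # Unweighted term shared by both formulas: squared differences of the
  # square roots of (p1, 1 - p1) and (p2, 1 - p2).
  d <- (sqrt(p[1]) - sqrt(p[2]))^2 + (sqrt(1 - p[1]) - sqrt(1 - p[2]))^2
  if (is.null(n)) {
    return(d)
  }
  # Coverage-weighted form used when n = c(n1, n2) is supplied.
  2 * (n[1] + 1) * (n[2] + 1) * d / (n[1] + n[2] + 2)
}

hellinger_sketch(c(0.5, 0.5))           # identical levels give 0
hellinger_sketch(c(0, 1))               # (0 - 1)^2 + (1 - 0)^2 = 2
hellinger_sketch(c(0, 1), n = c(9, 9))  # 2 * 10 * 10 * 2 / 20 = 20
```

With equal coverages the weight reduces to \(n + 1\), so in this sketch deeper coverage scales the same difference in methylation levels to a larger divergence.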
1. Basu A., Mandal A., Pardo L. (2010) Hypothesis testing for two discrete populations based on the Hellinger distance. Stat Probab Lett 80: 206-214.
p <- c(0.5, 0.5)
estimateHellingerDiv(p)
#> [1] 0