Given the methylation levels of two individuals, the function computes the Hellinger divergence, an information divergence, between their methylation levels.

estimateHellingerDiv(p, n = NULL)

Arguments

p

A numerical vector of the methylation levels p = c(p1, p2) of individuals 1 and 2.

n

Optional. If supplied, a vector of integers n = c(n1, n2) giving the coverages used to estimate the methylation levels of individuals 1 and 2.

Value

The Hellinger divergence between the given methylation levels.

Details

The methylation level \(p_{ij}\) of individual \(i\) at cytosine site \(j\) corresponds to the probability vector \(p^{ij} = (p_{ij}, 1 - p_{ij})\). The information divergence between the methylation levels of individuals 1 and 2 at site \(j\) is therefore the divergence between the vectors \(p^{1j} = (p_{1j}, 1 - p_{1j})\) and \(p^{2j} = (p_{2j}, 1 - p_{2j})\). If the coverage vector n is supplied, the divergence is estimated as:

$$hdiv = \frac{2(n_1 + 1)(n_2 + 1)}{n_1 + n_2 + 2}\left[\left(\sqrt{p_{1j}} - \sqrt{p_{2j}}\right)^2 + \left(\sqrt{1 - p_{1j}} - \sqrt{1 - p_{2j}}\right)^2\right]$$
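The coverage factor \(2(n_1 + 1)(n_2 + 1)/(n_1 + n_2 + 2)\) is the harmonic mean of \(n_1 + 1\) and \(n_2 + 1\). The short R sketch below checks this for illustrative coverages n = c(10, 20); these values are assumptions chosen for the example, not package output.

```r
# Hypothetical coverages for individuals 1 and 2 (illustrative values)
n <- c(10, 20)
a <- n[1] + 1
b <- n[2] + 1

# Coverage weight as written in the formula: 2 (n_1 + 1)(n_2 + 1) / (n_1 + n_2 + 2)
weight <- 2 * a * b / (a + b)

# The same quantity written as the harmonic mean of (n_1 + 1) and (n_2 + 1)
harmonic_mean <- 2 / (1 / a + 1 / b)

weight                            # 14.4375
all.equal(weight, harmonic_mean)  # TRUE
```

Because a harmonic mean is dominated by its smaller argument, the weight grows mainly when the lower of the two coverages grows.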

This formula corresponds to the Hellinger divergence given by the first formula of Theorem 1 in reference 1. If n is not supplied, the divergence is computed as:

$$hdiv = \left(\sqrt{p_{1j}} - \sqrt{p_{2j}}\right)^2 + \left(\sqrt{1 - p_{1j}} - \sqrt{1 - p_{2j}}\right)^2$$
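Expanding the squares shows that the uncorrected divergence equals \(2(1 - BC)\), where \(BC = \sqrt{p_{1j} p_{2j}} + \sqrt{(1 - p_{1j})(1 - p_{2j})}\) is the Bhattacharyya coefficient, so its value lies between 0 (identical levels) and 2 (e.g. levels 0 and 1). The R sketch below checks this identity for the illustrative levels p = c(0.2, 0.8), which are assumed values.

```r
# Illustrative methylation levels for individuals 1 and 2 (assumed values)
p <- c(0.2, 0.8)

# Uncorrected divergence, as in the formula without coverages
hdiv <- (sqrt(p[1]) - sqrt(p[2]))^2 + (sqrt(1 - p[1]) - sqrt(1 - p[2]))^2

# Equivalent form 2 * (1 - BC), with BC the Bhattacharyya coefficient
bc <- sqrt(p[1] * p[2]) + sqrt((1 - p[1]) * (1 - p[2]))
hdiv_alt <- 2 * (1 - bc)

hdiv                        # 0.4
all.equal(hdiv, hdiv_alt)   # TRUE
```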

Missing methylation levels, reported as NA or NaN, are replaced with zero.
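The two formulas and the missing-value rule can be combined into one function. The sketch below is a hypothetical re-implementation named `hellinger_div_sketch` for illustration only; it follows the formulas documented above and is not the package's source code.

```r
# Hypothetical sketch for illustration; not the package implementation.
hellinger_div_sketch <- function(p, n = NULL) {
  # Replace missing methylation levels with zero;
  # is.na() returns TRUE for both NA and NaN
  p[is.na(p)] <- 0

  # Squared differences of square roots over both components
  # of (p_1j, 1 - p_1j) and (p_2j, 1 - p_2j)
  sq_diff <- (sqrt(p[1]) - sqrt(p[2]))^2 +
    (sqrt(1 - p[1]) - sqrt(1 - p[2]))^2

  if (is.null(n)) {
    return(sq_diff)  # uncorrected divergence
  }

  # Coverage-weighted divergence
  2 * (n[1] + 1) * (n[2] + 1) * sq_diff / (n[1] + n[2] + 2)
}

hellinger_div_sketch(c(0.5, 0.5))             # 0
hellinger_div_sketch(c(NA, 1))                # 2, the NA is treated as 0
hellinger_div_sketch(c(0.2, 0.8), c(10, 20))  # 5.775
```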

References

1. Basu A, Mandal A, Pardo L (2010) Hypothesis testing for two discrete populations based on the Hellinger distance. Stat Probab Lett 80: 206-214.

Examples

    p <- c(0.5, 0.5)
    estimateHellingerDiv(p)
#> [1] 0