Given the methylation levels of two individuals, the function computes the information divergence between them.

estimateHellingerDiv(p, n = NULL)

## Arguments

p

A numeric vector of the methylation levels, p = c(p1, p2), of individuals 1 and 2.

n

If supplied, a vector of integers giving the coverages used to estimate the methylation levels.

## Value

The Hellinger divergence value for the given methylation levels.

## Details

The methylation level $$p_{ij}$$ for an individual $$i$$ at cytosine site $$j$$ corresponds to a probability vector $$p^{ij} = (p_{ij}, 1 - p_{ij})$$. The information divergence between the methylation levels of individuals 1 and 2 at site $$j$$ is then the divergence between the vectors $$p^{1j} = (p_{1j}, 1 - p_{1j})$$ and $$p^{2j} = (p_{2j}, 1 - p_{2j})$$. If the vector of coverages $$n = (n_1, n_2)$$ is supplied, the information divergence is estimated according to the formula:

$$hdiv = \frac{2 (n_1 + 1)(n_2 + 1)}{n_1 + n_2 + 2} \left[ \left(\sqrt{p_{1j}} - \sqrt{p_{2j}}\right)^2 + \left(\sqrt{1 - p_{1j}} - \sqrt{1 - p_{2j}}\right)^2 \right]$$

This formula corresponds to the Hellinger divergence given by the first formula of Theorem 1 in reference 1. If the coverages are not supplied, the divergence is computed without the coverage weight:

$$hdiv = \left(\sqrt{p_{1j}} - \sqrt{p_{2j}}\right)^2 + \left(\sqrt{1 - p_{1j}} - \sqrt{1 - p_{2j}}\right)^2$$

Missing methylation levels, reported as NA or NaN, are replaced with zero.
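The two formulas above can be sketched in a few lines of R. This is an illustrative re-implementation written from the description in this section, not the package's source code; the name `hellinger_div_sketch` is hypothetical, and the input checks are an assumption added for the example.

```r
# Hypothetical sketch of the formulas in this section; not the package's code.
hellinger_div_sketch <- function(p, n = NULL) {
    stopifnot(is.numeric(p), length(p) == 2)
    # Missing levels (NA or NaN) are replaced with zero, as described above.
    p[is.na(p)] <- 0
    # Probability vectors (p_ij, 1 - p_ij) for individuals 1 and 2.
    p1 <- c(p[1], 1 - p[1])
    p2 <- c(p[2], 1 - p[2])
    # Unweighted form: sum of squared differences of the square roots.
    hdiv <- sum((sqrt(p1) - sqrt(p2))^2)
    if (is.null(n)) {
        return(hdiv)
    }
    stopifnot(length(n) == 2)
    # Coverage weight 2 (n1 + 1)(n2 + 1) / (n1 + n2 + 2).
    w <- 2 * (n[1] + 1) * (n[2] + 1) / (n[1] + n[2] + 2)
    w * hdiv
}

hellinger_div_sketch(c(0.5, 0.5))            # identical levels give 0
hellinger_div_sketch(c(0, 1))                # largest unweighted value, 2
hellinger_div_sketch(c(0, 1), n = c(9, 9))   # weight 2 * 10 * 10 / 20 = 10, so 20
```

In this sketch the unweighted value ranges from 0 (identical levels) to 2 (one site fully methylated, the other fully unmethylated), and the coverage weight grows with both coverages, so the same difference in levels yields a larger divergence when it is supported by more reads.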

## References

1. Basu A., Mandal A., Pardo L. (2010) Hypothesis testing for two discrete populations based on the Hellinger distance. Stat Probab Lett 80: 206-214.

## Examples

    p <- c(0.5, 0.5)
    estimateHellingerDiv(p)
    #> [1] 0