Dirichlet Distribution — dirichlet • usefr

Probability density function (PDF), cumulative density function (CDF), and random generation for the Dirichlet distribution with parameters \(\lapha = \alpha_1, ... , \alpha_n\), all positive reals, in short, \(Dir(\alpha)\). Dirichlet distribution is a family of continuous multivariate probability distributions, a multivariate generalization of the Beta distribution. The PDF is a multidimensional generalization of the beta distribution.

ddirichlet(x, alpha, log = FALSE)

pdirichlet(q, alpha, lower.tail = TRUE, log.p = FALSE, log.base = exp(1), ...)

rdirichlet(n, alpha)

Arguments

x: A numerical matrix \(x_ij >= 0\) or vector \(x_i >= 0\) of observed counts or relative frequencies \(\sum x_i = 1\).
alpha: Dirichlet distribution's parameters. A numerical parameter vector, strictly positive (default 1): \(\lapha = \alpha_1, ... , \alpha_n\).
q: numeric vector.
lower.tail: logical; if TRUE (default), probabilities are \(P(X<=x)\), otherwise, \(P(X > x)\)
log.p: logical; if TRUE, probabilities/densities p are returned as log(p).
log.base: (Optional) The logarithm's base if log.p = TRUE.
...: Additional named or unnamed arguments to be passed to function hcubature used in functions 'pdirichlet' and 'qdirichlet'.
n: number of observations.
p: vector of probabilities.
interval: The interval whether to search for the quantile (see Details).
tol: The desired accuracy (convergence tolerance) for the quantile estimation.

Value

GG PDF values (3-parameters or 4-parameters) for ddirichlet, GG probability for pdirichlet, quantiles or GG random generated values for rdirichlet.

Details

The computation of the function 'pdirichlet' is accomplished using Monte Carlo integration of the density 'dirichlet'. The estimation are based on the fact that if a variable \(x = (x_1, x_2, ...x_n)\) follows Dirichlet Distribution with parameters \(\alpha = \alpha_1, ... , \alpha_n\) (all positive reals), in short, \(x ~ Dir(\alpha)\), then \(x_i ~ Beta(\alpha_i, \alpha_0 - \alpha_i)\), where Beta(.) stands for the Beta distribution and \(\alpha_0 = \sum \alpha_i\).

The density is computed directly from the logarithm of Dirichlet PDF \(log(Dir(\alpha))\).

if \(x = (x_1, x_2, ...x_n)\) are observed counts, then they are transformed estimated probability \(p_i\) of the observation \(x_i\) is estimated as: \(p_i = (x_i + \alpha_i)/(\sum x_i + sum \alpa_i)\).

References

1. Sjolander K, Karplus K, Brown M, Hughey R, Krogh A, Saira Mian I, et al. Dirichlet mixtures: A method for improved detection of weak but significant protein sequence homology. Bioinformatics. 1996;12: 327–345. doi:10.1093/bioinformatics/12.4.327.

Author

Robersy Sanchez (https://genomaths.com).

Examples

## A random generation of numerical vectors
x <- rdirichlet(n = 1e3, alpha = rbind(
    c(2.1, 3.1, 1.2),
    c(2.5, 1.9, 1.6)
))
head(x[[1]])
#>             a1        a2         a3
#> [1,] 0.2890342 0.5430619 0.16790386
#> [2,] 0.0636819 0.6853852 0.25093295
#> [3,] 0.2706472 0.7128996 0.01645324
#> [4,] 0.2522002 0.7219474 0.02585242
#> [5,] 0.3963101 0.5590376 0.04465233
#> [6,] 0.3901223 0.3815010 0.22837675
lapply(x, estimateDirichDist)
#> $alpha1
#> $alpha
#> [1] 2.152039 3.124078 1.216973
#> 
#> $marginals
#> $marginals$a1
#>        Estimate  Std.Error t_value Pr(>|t|) Adj.R.Square       rho R.Cross.val
#> shape1 2.152039 0.01178206 182.654   <1e-16    0.9999982 0.9999982   0.9996341
#> shape2 4.310442 0.03163860 136.240   <1e-16           NA        NA          NA
#>              AIC       BIC    n
#> shape1 -6838.737 -6824.014 1000
#> shape2        NA        NA   NA
#> 
#> $marginals$a2
#>        Estimate  Std.Error  t_value Pr(>|t|) Adj.R.Square      rho R.Cross.val
#> shape1 3.124078 0.01632555 191.3613   <1e-16     0.999999 0.999999   0.9998268
#> shape2 3.365139 0.03163860 106.3618   <1e-16           NA       NA          NA
#>              AIC       BIC    n
#> shape1 -7592.255 -7577.532 1000
#> shape2        NA        NA   NA
#> 
#> $marginals$a3
#>        Estimate   Std.Error  t_value Pr(>|t|) Adj.R.Square       rho
#> shape1 1.216973 0.007329763 166.0317   <1e-16    0.9999993 0.9999993
#> shape2 5.268830 0.031638600 166.5317   <1e-16           NA        NA
#>        R.Cross.val      AIC       BIC    n
#> shape1   0.9997937 -7382.74 -7368.017 1000
#> shape2          NA       NA        NA   NA
#> 
#> 
#> attr(,"class")
#> [1] "DirchModel" "list"      
#> 
#> $alpha2
#> $alpha
#> [1] 2.472685 2.047425 1.608069
#> 
#> $marginals
#> $marginals$a1
#>        Estimate  Std.Error  t_value Pr(>|t|) Adj.R.Square       rho R.Cross.val
#> shape1 2.472685 0.01459795 169.3857   <1e-16    0.9999991 0.9999991   0.9998464
#> shape2 3.376989 0.03163860 106.7364   <1e-16           NA        NA          NA
#>              AIC       BIC    n
#> shape1 -7707.521 -7692.798 1000
#> shape2        NA        NA   NA
#> 
#> $marginals$a2
#>        Estimate  Std.Error  t_value Pr(>|t|) Adj.R.Square       rho R.Cross.val
#> shape1 2.047425 0.01133689 180.5986   <1e-16    0.9999989 0.9999989   0.9997778
#> shape2 4.526918 0.03163860 143.0821   <1e-16           NA        NA          NA
#>              AIC       BIC    n
#> shape1 -7334.931 -7320.208 1000
#> shape2        NA        NA   NA
#> 
#> $marginals$a3
#>        Estimate   Std.Error  t_value Pr(>|t|) Adj.R.Square      rho R.Cross.val
#> shape1 1.608069 0.009876014 162.8257   <1e-16     0.999999 0.999999   0.9997978
#> shape2 4.495452 0.031638600 142.0876   <1e-16           NA       NA          NA
#>              AIC       BIC    n
#> shape1 -7424.286 -7409.563 1000
#> shape2        NA        NA   NA
#> 
#> 
#> attr(,"class")
#> [1] "DirchModel" "list"      
#> 

### Density computation
alpha <- cbind(1:10, 5, 10:1)
x <- rdirichlet(10, c(5, 5, 10))
ddirichlet(x = x, alpha = alpha)
#>  [1]  3.411141186  1.917849181  7.238696827 11.337708897 11.577649457
#>  [6]  9.021077966  0.081503238  0.014539829  0.004065136  0.001103315

### Or just one vector alpha
x <- rdirichlet(n = 10, alpha = c(2.1, 3.1, 1.2))
ddirichlet(x = x, alpha = c(2.1, 3.1, 1.2))
#>  [1] 4.9390887 0.5844373 5.4481662 5.5648085 2.7981867 5.8982327 1.3307699
#>  [8] 2.5807305 2.2030758 6.1053033

## Compute Dirichlet probability
set.seed(123)
q <- rdirichlet(10, alpha = c(2.1, 3.2, 3.3))
pdirichlet(q, alpha = c(2.1, 3.2, 3.3))
#>  [1] 0.376861070 0.472692168 0.033818798 0.310649407 0.428081189 0.257884996
#>  [7] 0.010027824 0.002683565 0.043098143 0.235638945