Probability density function (PDF), cumulative density function (CDF), and random generation for the Dirichlet distribution with parameters \(\lapha = \alpha_1, ... , \alpha_n\), all positive reals, in short, \(Dir(\alpha)\). Dirichlet distribution is a family of continuous multivariate probability distributions, a multivariate generalization of the Beta distribution. The PDF is a multidimensional generalization of the beta distribution.
ddirichlet(x, alpha, log = FALSE)
pdirichlet(q, alpha, lower.tail = TRUE, log.p = FALSE, log.base = exp(1), ...)
rdirichlet(n, alpha)
A numerical matrix \(x_ij >= 0\) or vector \(x_i >= 0\) of observed counts or relative frequencies \(\sum x_i = 1\).
Dirichlet distribution's parameters. A numerical parameter vector, strictly positive (default 1): \(\lapha = \alpha_1, ... , \alpha_n\).
numeric vector.
logical; if TRUE (default), probabilities are \(P(X<=x)\), otherwise, \(P(X > x)\)
logical; if TRUE, probabilities/densities p are returned as log(p).
(Optional) The logarithm's base if log.p = TRUE.
Additional named or unnamed arguments to be passed to function
hcubature
used in functions 'pdirichlet' and
'qdirichlet'.
number of observations.
vector of probabilities.
The interval whether to search for the quantile (see Details).
The desired accuracy (convergence tolerance) for the quantile estimation.
GG PDF values (3-parameters or 4-parameters) for ddirichlet, GG probability for pdirichlet, quantiles or GG random generated values for rdirichlet.
The computation of the function 'pdirichlet' is accomplished using Monte Carlo integration of the density 'dirichlet'. The estimation are based on the fact that if a variable \(x = (x_1, x_2, ...x_n)\) follows Dirichlet Distribution with parameters \(\alpha = \alpha_1, ... , \alpha_n\) (all positive reals), in short, \(x ~ Dir(\alpha)\), then \(x_i ~ Beta(\alpha_i, \alpha_0 - \alpha_i)\), where Beta(.) stands for the Beta distribution and \(\alpha_0 = \sum \alpha_i\).
The density is computed directly from the logarithm of Dirichlet PDF \(log(Dir(\alpha))\).
if \(x = (x_1, x_2, ...x_n)\) are observed counts, then they are transformed estimated probability \(p_i\) of the observation \(x_i\) is estimated as: \(p_i = (x_i + \alpha_i)/(\sum x_i + sum \alpa_i)\).
1. Sjolander K, Karplus K, Brown M, Hughey R, Krogh A, Saira Mian I, et al. Dirichlet mixtures: A method for improved detection of weak but significant protein sequence homology. Bioinformatics. 1996;12: 327–345. doi:10.1093/bioinformatics/12.4.327.
## A random generation of numerical vectors
x <- rdirichlet(n = 1e3, alpha = rbind(
c(2.1, 3.1, 1.2),
c(2.5, 1.9, 1.6)
))
head(x[[1]])
#> a1 a2 a3
#> [1,] 0.2890342 0.5430619 0.16790386
#> [2,] 0.0636819 0.6853852 0.25093295
#> [3,] 0.2706472 0.7128996 0.01645324
#> [4,] 0.2522002 0.7219474 0.02585242
#> [5,] 0.3963101 0.5590376 0.04465233
#> [6,] 0.3901223 0.3815010 0.22837675
lapply(x, estimateDirichDist)
#> $alpha1
#> $alpha
#> [1] 2.152039 3.124078 1.216973
#>
#> $marginals
#> $marginals$a1
#> Estimate Std.Error t_value Pr(>|t|) Adj.R.Square rho R.Cross.val
#> shape1 2.152039 0.01178206 182.654 <1e-16 0.9999982 0.9999982 0.9996341
#> shape2 4.310442 0.03163860 136.240 <1e-16 NA NA NA
#> AIC BIC n
#> shape1 -6838.737 -6824.014 1000
#> shape2 NA NA NA
#>
#> $marginals$a2
#> Estimate Std.Error t_value Pr(>|t|) Adj.R.Square rho R.Cross.val
#> shape1 3.124078 0.01632555 191.3613 <1e-16 0.999999 0.999999 0.9998268
#> shape2 3.365139 0.03163860 106.3618 <1e-16 NA NA NA
#> AIC BIC n
#> shape1 -7592.255 -7577.532 1000
#> shape2 NA NA NA
#>
#> $marginals$a3
#> Estimate Std.Error t_value Pr(>|t|) Adj.R.Square rho
#> shape1 1.216973 0.007329763 166.0317 <1e-16 0.9999993 0.9999993
#> shape2 5.268830 0.031638600 166.5317 <1e-16 NA NA
#> R.Cross.val AIC BIC n
#> shape1 0.9997937 -7382.74 -7368.017 1000
#> shape2 NA NA NA NA
#>
#>
#> attr(,"class")
#> [1] "DirchModel" "list"
#>
#> $alpha2
#> $alpha
#> [1] 2.472685 2.047425 1.608069
#>
#> $marginals
#> $marginals$a1
#> Estimate Std.Error t_value Pr(>|t|) Adj.R.Square rho R.Cross.val
#> shape1 2.472685 0.01459795 169.3857 <1e-16 0.9999991 0.9999991 0.9998464
#> shape2 3.376989 0.03163860 106.7364 <1e-16 NA NA NA
#> AIC BIC n
#> shape1 -7707.521 -7692.798 1000
#> shape2 NA NA NA
#>
#> $marginals$a2
#> Estimate Std.Error t_value Pr(>|t|) Adj.R.Square rho R.Cross.val
#> shape1 2.047425 0.01133689 180.5986 <1e-16 0.9999989 0.9999989 0.9997778
#> shape2 4.526918 0.03163860 143.0821 <1e-16 NA NA NA
#> AIC BIC n
#> shape1 -7334.931 -7320.208 1000
#> shape2 NA NA NA
#>
#> $marginals$a3
#> Estimate Std.Error t_value Pr(>|t|) Adj.R.Square rho R.Cross.val
#> shape1 1.608069 0.009876014 162.8257 <1e-16 0.999999 0.999999 0.9997978
#> shape2 4.495452 0.031638600 142.0876 <1e-16 NA NA NA
#> AIC BIC n
#> shape1 -7424.286 -7409.563 1000
#> shape2 NA NA NA
#>
#>
#> attr(,"class")
#> [1] "DirchModel" "list"
#>
### Density computation
alpha <- cbind(1:10, 5, 10:1)
x <- rdirichlet(10, c(5, 5, 10))
ddirichlet(x = x, alpha = alpha)
#> [1] 3.411141186 1.917849181 7.238696827 11.337708897 11.577649457
#> [6] 9.021077966 0.081503238 0.014539829 0.004065136 0.001103315
### Or just one vector alpha
x <- rdirichlet(n = 10, alpha = c(2.1, 3.1, 1.2))
ddirichlet(x = x, alpha = c(2.1, 3.1, 1.2))
#> [1] 4.9390887 0.5844373 5.4481662 5.5648085 2.7981867 5.8982327 1.3307699
#> [8] 2.5807305 2.2030758 6.1053033
## Compute Dirichlet probability
set.seed(123)
q <- rdirichlet(10, alpha = c(2.1, 3.2, 3.3))
pdirichlet(q, alpha = c(2.1, 3.2, 3.3))
#> [1] 0.376861070 0.472692168 0.033818798 0.310649407 0.428081189 0.257884996
#> [7] 0.010027824 0.002683565 0.043098143 0.235638945