Mutual Information Based on Multivariate Distributions Constructed from Copulas

Computes the mutual information for each pair of x and y marginal values, based on the multivariate distribution constructed from a copula.

mutualinf(
  x,
  y,
  copula = NULL,
  margins = NULL,
  paramMargins = NULL,
  method = "ml",
  ties.method = "max"
)

Arguments

x, y: marginal variates
copula: A copula object of class Mvdc, or a string giving the name of a copula from the copula package.
margins: A character vector specifying the parametric marginal distributions. See details below.
paramMargins: A list in which each component is a list (or numeric vector) of named components giving the parameter values of the corresponding marginal distribution. See details below.
method: A character string specifying the method used to estimate the dependence parameter(s), if the copula needs to be estimated; see fitCopula.
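The structure expected by paramMargins can be illustrated with a short base-R sketch. The coefficient values below are hypothetical; the names shape, scale, and mu are taken from the rweibull3p() call in the Examples. It shows why as.list() rather than list() is applied to a named coefficient vector there: as.list() yields one named component per parameter, whereas list() wraps the whole vector in a single unnamed component.

```r
## Hypothetical coefficients, named like the rweibull3p() parameters
coefs <- c(shape = 2, scale = 0.85, mu = 1)

## as.list() splits the vector into one named component per parameter
named_pars <- as.list(coefs)
str(named_pars)

## list() keeps the vector whole as a single unnamed component
wrapped_pars <- list(coefs)
str(wrapped_pars)
```

Only the first form provides one named parameter value per component, which is the shape described for paramMargins above.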

Value

A list with a data frame carrying the estimated mutual information for each (x, y) pair together with the joint and marginal probabilities (element stat in the Examples), and the "mvdc" copula object (element copula in the Examples).

Details

The mutual information of a pair of x and y marginal values is defined as:

$$I(x, y) = \log P(x, y) - \left(\log P_1(x) + \log P_2(y)\right)$$

where $P(x, y)$ is the joint CDF of the multivariate distribution constructed from the copula, and $P_1(x)$ and $P_2(y)$ are the marginal CDFs. The values printed in the Examples are consistent with base-2 logarithms.
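The quantities in this definition can be sketched in base R. The sketch assumes a hand-written Clayton copula with a hypothetical parameter theta = 2, hypothetical normal and exponential margins, and base-2 logarithms. It is an illustration only, not the code used by mutualinf().

```r
## Hand-written Clayton copula CDF (illustration only; requires theta > 0)
clayton_cdf <- function(u, v, theta) {
    (u^(-theta) + v^(-theta) - 1)^(-1 / theta)
}

## Hypothetical point and margins
x <- 1.2
y <- 0.4
p1 <- pnorm(x, mean = 1, sd = 0.2) # marginal CDF P_1(x)
p2 <- pexp(y, rate = 2)            # marginal CDF P_2(y)

## Joint CDF built from the copula: P(x, y) = C(P_1(x), P_2(y))
joint <- clayton_cdf(p1, p2, theta = 2)

## Mutual information at (x, y); the log base is an assumption
mi_dep <- log2(joint) - (log2(p1) + log2(p2))

## Under independence, P(x, y) = P_1(x) * P_2(y), so the value is zero
mi_ind <- log2(p1 * p2) - (log2(p1) + log2(p2))

c(dependent = mi_dep, independent = mi_ind)
```

With a positively dependent copula the joint CDF exceeds the product of the marginal CDFs, so the first value is positive, while the independence case gives zero up to floating-point error.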

The value $I(x, y)$ measures the relative dependence or independence of x and y at the specified point.

Notice that the above definition expresses the difference between two uncertainty variations. For $I(x, y) > 0$, we say that at point (x, y) there is a gain of information for the association of the underlying stochastic processes generating x and y with respect to the independent processes. For $I(x, y) < 0$, we say that at point (x, y) there is a loss of information for the association with respect to the independent processes or, equivalently, a gain of information for the independent processes with respect to their association.
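As a worked check of the definition and its sign reading, the sketch below recomputes the mutual information for the first two rows of the head(mutual.Inf$stat) output printed in the Examples. The base-2 logarithm is an assumption inferred from those printed values, and the column named reading is a hypothetical label added for illustration.

```r
## Rows 1 and 2 copied from the head(mutual.Inf$stat) output in the Examples
rows <- data.frame(
    jprob = c(0.012212519, 0.935912349),
    p1    = c(0.06936092, 0.94262173),
    p2    = c(0.03654972, 0.99623362)
)

## Recompute the mutual information, assuming base-2 logarithms
rows$mInf <- with(rows, log2(jprob) - (log2(p1) + log2(p2)))

## Positive values read as a gain of information, negative as a loss
rows$reading <- ifelse(rows$mInf > 0, "gain", "loss")
rows
```

The recomputed values agree with the printed mInf column to the precision of the copied inputs: the first row is positive (a gain) and the second is slightly negative (a loss).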

Examples

require(stats)
library(usefr) # assumed to provide rweibull3p, fitCDF, mcgoftest, and mutualinf
set.seed(12) # set a seed for random number generation
## Random generation of a normally distributed marginal variate
X <- rnorm(2000, mean = 1, sd = 0.2)

## Random generation of a Weibull-3P distributed marginal variate
Y <- X + rweibull3p(2000, shape = 2, scale = 0.85, mu = 1)

## Correlation test
cor.test(X, Y, method = "spearman")
#> 
#> 	Spearman's rank correlation rho
#> 
#> data:  X and Y
#> S = 742966548, p-value < 2.2e-16
#> alternative hypothesis: true rho is not equal to 0
#> sample estimates:
#>       rho 
#> 0.4427749 
#> 

## Non-linear model fit for 'Y' distribution values
fitY <- fitCDF(Y, distNames = 12) # 3P Weibull distribution model
#> 
#> *** Fitting 3P Weibull distribution ... 
#> .Fitting Done.
#> ** Done ***
coefs <- coef(fitY$bestfit) # model coefficients

## Goodness-of-fit test for the Weibull-3P distribution model
mcgoftest(
    varobj = Y, distr = "weibull3p", pars = coefs, num.sampl = 99,
    sample.size = 1999, stat = "chisq", num.cores = 4, breaks = 200,
    seed = 123
)
#> *** Permutation GoF testing based on Pearson's Chi-squared statistic ( parametric approach )  ...
#> 
#>       Chisq  mc_p.value sample.size   num.sampl 
#>    246.6798      0.1000   1999.0000     99.0000 

## Settings to estimate the mutual information
margins <- c("norm", "weibull3p")
parMargins <- list(
    list(mean = 1, sd = 0.2),
    as.list(coefs)
) # Notice "as.list" is used here, not "list"

## Finally, estimate the mutual information
mutual.Inf <- mutualinf(
    x = X, y = Y, copula = "normalCopula",
    margins = margins, paramMargins = parMargins
)
head(mutual.Inf$stat)
#>         jprob         p1         p2         x        y         mInf
#> 1 0.012212519 0.06936092 0.03654972 0.7038865 2.063813  2.268233724
#> 2 0.935912349 0.94262173 0.99623362 1.3154339 4.097047 -0.004861521
#> 3 0.026703146 0.16934812 0.05253993 0.8086511 2.106610  1.585531548
#> 4 0.065589060 0.17878501 0.15761051 0.8159990 2.297089  1.218865734
#> 5 0.006073589 0.02287774 0.04545271 0.6004716 2.088608  2.546166746
#> 6 0.321124429 0.39269720 0.65771400 0.9455408 2.902408  0.314182830
## The fitted copula is also returned, so it can be used in downstream
## analyses
mutual.Inf$copula@copula
#> Normal copula, dim. d = 2 
#> Dimension:  2 
#> Parameters:
#>   rho.1   = 0.4647337
