Goodness-of-Fit Bootstrap Test for Bivariate Distributions Constructed from Copulas with Known Margins

Performs goodness-of-fit (GOF) tests for a two-dimensional copula based, by default, on knowledge of the marginal probability distributions. Several tools from the copula package are integrated to perform GOF tests for copulas that include specific margin parameter settings. In the vocabulary of the copula package, these are GOF tests for copula objects of class mvdc (also called non-free copulas).

bicopulaGOF(
  x,
  y,
  copula = NULL,
  margins = NULL,
  paramMargins = NULL,
  sample.size = NULL,
  nboots = 10,
  approach = c("adchisq", "adgamma", "chisq", "rmse", "Sn", "SnB", "SnC"),
  Rosenblatt = FALSE,
  breaks = 12,
  method = "ml",
  num.cores = 1L,
  tasks = 0,
  seed = 123,
  verbose = TRUE,
  ...
)

Arguments

x: Numerical vector with the observations from the first marginal distribution.
y: Numerical vector with the observations from the second marginal distribution.
copula: A copula object of class mvdc, or a string specifying the name of a copula from the copula package.
margins: A character vector specifying all the parametric marginal distributions. See details below.
paramMargins: A list in which each component is a list (or numeric vector) of named components giving the parameter values of the corresponding marginal distribution. See details below.
sample.size: The size of the samples used for each sampling. It is not required for the approaches: "Sn", "SnB", and "SnC"; see below.
nboots: The number of bootstrap resamplings to perform.
approach: A character string specifying the goodness-of-fit test statistic to be used, which has to be one (or a unique abbreviation) of the following: "adchisq", "adgamma", "Sn", "SnB", "SnC", "chisq", and "rmse". With the exception of "chisq" and "rmse", all the statistics are the same as in functions gofTstat and gofCopula. The test using "chisq" implements the approach described in reference [1].
Rosenblatt: The Anderson–Darling statistic approach using the Rosenblatt transformation is normally used for GOF testing in function gofCopula from the copula package. Since the current function applies a parametric bootstrap approach that generates random variates from the analytic expressions of the marginal CDFs, the test does not depend on the theoretical distribution of the Anderson–Darling statistic. Simulations so far suggest that applying the Rosenblatt transformation may not be needed in this case. Hence, the decision on whether to apply it (computationally expensive for big datasets) is left to the user. Function cCopula is used to compute the Rosenblatt transform, so its application is limited to those copulas for which cCopula is implemented.
breaks: A single number giving the number of bins used to compute Pearson's chi-squared statistic, as suggested in reference [1]. Basically, it is used to split the unit square [0, 1]^2 into bins/regions.
method: A character string specifying the method used to estimate the dependence parameter(s), if the copula needs to be estimated; see fitCopula.
num.cores, tasks: Parameters for parallel computation using the BiocParallel package: the number of cores to use, i.e., at most how many child processes will be run simultaneously (see bplapply), and the number of tasks per job (only for Linux OS).
seed: An integer used to set a 'seed' for random number generation.
verbose: If TRUE, comments and a progress bar will be printed.
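To make the Rosenblatt argument more concrete, here is a minimal base-R sketch of the Rosenblatt transform in the one case where it has a simple closed form, the bivariate Gaussian copula. The function name rosenblatt_gauss and the fixed value of rho are assumptions made for illustration; the function documented here relies on cCopula instead, and this sketch is not its implementation.

```r
## Sketch only: closed-form Rosenblatt transform for a bivariate Gaussian
## copula with correlation rho. 'rosenblatt_gauss' is a hypothetical helper,
## not part of any package.
rosenblatt_gauss <- function(u, rho) {
    z1 <- qnorm(u[, 1])
    z2 <- qnorm(u[, 2])
    ## Conditional CDF C(u2 | u1) of the Gaussian copula
    e2 <- pnorm((z2 - rho * z1) / sqrt(1 - rho^2))
    cbind(u[, 1], e2)
}

set.seed(1)
rho <- 0.6   # assumed known for this illustration
n <- 2000
z1 <- rnorm(n)
z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)
u <- cbind(pnorm(z1), pnorm(z2))   # sample from the Gaussian copula

e <- rosenblatt_gauss(u, rho)
cor(u[, 1], u[, 2])   # clearly positive: the columns of u are dependent
cor(e[, 1], e[, 2])   # close to zero: the transformed columns are not
```

If the hypothesized copula is correct, the transformed columns behave like independent uniform variables, which is what statistics computed after the transform check.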

Value

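The statistic value estimated for the observations and the estimated bootstrap p-value. As the output in the examples below shows, these are returned in element gof of a list, together with the tested copula object in element copula.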
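A bootstrap (Monte Carlo) p-value of this kind can be sketched in a few lines. The helper mc_pvalue and the toy statistic values are hypothetical, and the "+ 1" convention is an assumption rather than a confirmed detail of this function; it is, however, consistent with the p-values in the examples (0.94 and 0.85 with nboots = 99), which are multiples of 1/100.

```r
## Sketch only: Monte Carlo p-value from bootstrap replicates of a statistic.
## 'mc_pvalue' is a hypothetical helper, not the package's code.
mc_pvalue <- function(stat_obs, stat_boot) {
    ## Proportion of replicates at least as extreme as the observed value,
    ## counting the observed statistic itself as one replicate
    (1 + sum(stat_boot >= stat_obs)) / (length(stat_boot) + 1)
}

## Toy bootstrap replicates (made-up numbers for illustration)
stat_boot <- c(0.10, 0.25, 0.40, 0.55, 0.70, 0.85, 1.00, 1.15, 1.30)
mc_pvalue(0.60, stat_boot)   # (1 + 5) / (9 + 1) = 0.6
```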

Details

Notice that the copula package already has the function gofCopula to perform GOF tests. However, that function does not support bivariate distributions constructed from a copula with known margins. In addition, its use can be computationally expensive for big datasets.
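The practical difference between known and unknown margins can be illustrated in base R. With known margins, the observations are mapped to the unit square through the stated marginal CDFs; without them, rank-based pseudo-observations are used, computed here as rank / (n + 1), the default scaling of pobs in the copula package. This is a sketch under the assumption of normal margins with mean 0 and sd 10, as in the examples; it is not taken from the function's source.

```r
## Sketch only: two ways of mapping data to the unit square.
set.seed(12)
x <- rnorm(500, mean = 0, sd = 10)
y <- rnorm(500, mean = 0, sd = 10)

## Known margins: apply the stated marginal CDFs directly
u_known <- cbind(pnorm(x, mean = 0, sd = 10),
                 pnorm(y, mean = 0, sd = 10))

## Unknown margins: rank-based pseudo-observations, rank / (n + 1)
u_rank <- cbind(rank(x), rank(y)) / (length(x) + 1)

## The two agree only approximately; the gap shrinks as n grows
max(abs(u_known - u_rank))
```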

References

[1] Jaworski, P. Copulae in Mathematical and Quantitative Finance. 213, d (2013).
[2] Wang, Y. et al. Multivariate analysis of joint probability of different rainfall frequencies based on copulas. Water (Switzerland) 9, (2017).

Author

Robersy Sanchez (https://genomaths.com).

Examples

require(stats)

set.seed(12)
margins <- c("norm", "norm")
## Random variates from normal distributions
X <- rnorm(2 * 1e3, mean = 0, sd = 10)
Y <- rnorm(2 * 1e3, mean = 0, sd = 10)

parMargins <- list(
    list(mean = 0, sd = 10),
    list(mean = 0, sd = 10)
)

bicopulaGOF(
    x = X, y = Y, copula = "normalCopula", sample.size = 1e2,
    margins = margins, paramMargins = parMargins, nboots = 99,
    Rosenblatt = TRUE, approach = "adgamma", num.cores = 1L,
    verbose = FALSE
)
#> $gof
#>     AD.stat  mc_p.value sample.size   num.sampl 
#>   0.2074162   0.9400000 100.0000000  99.0000000 
#> 
#> $copula
#> Multivariate Distribution Copula based ("mvdc")
#>  @ copula:
#> Normal copula, dim. d = 2 
#> Dimension:  2 
#> Parameters:
#>   rho.1   = 0.01905212
#>  @ margins:
#> [1] "norm" "norm"
#>    with 2 (not identical)  margins; with parameters (@ paramMargins) 
#> List of 2
#>  $ :List of 2
#>   ..$ mean: num 0
#>   ..$ sd  : num 10
#>  $ :List of 2
#>   ..$ mean: num 0
#>   ..$ sd  : num 10
#> 

bicopulaGOF(
    x = X, y = Y, copula = "normalCopula", sample.size = 1e2,
    margins = margins, paramMargins = parMargins, nboots = 99,
    Rosenblatt = FALSE, approach = "adgamma", num.cores = 1L,
    verbose = FALSE
)
#> $gof
#>     AD.stat  mc_p.value sample.size   num.sampl 
#>   0.2383179   0.8500000 100.0000000  99.0000000 
#> 
#> $copula
#> Multivariate Distribution Copula based ("mvdc")
#>  @ copula:
#> Normal copula, dim. d = 2 
#> Dimension:  2 
#> Parameters:
#>   rho.1   = 0.01905212
#>  @ margins:
#> [1] "norm" "norm"
#>    with 2 (not identical)  margins; with parameters (@ paramMargins) 
#> List of 2
#>  $ :List of 2
#>   ..$ mean: num 0
#>   ..$ sd  : num 10
#>  $ :List of 2
#>   ..$ mean: num 0
#>   ..$ sd  : num 10
#> 

## ---- Non-parallel expensive computation ----
if (FALSE) {
    library(copula) # provides pobs, fitCopula, cCopula, and gofCopula
    U <- pobs(cbind(X, Y)) # Compute the pseudo-observations
    fit <- fitCopula(normalCopula(), U, method = 'ml')
    U <- cCopula(u = U, copula = fit@copula) ## Rosenblatt transformation

    parMargins <- list(
        list(mean = 0, sd = 10),
        list(mean = 0, sd = 10)
    )

    ptm <- proc.time()
    gof <- gofCopula(copula = fit@copula, x = U, N = 99, method = "Sn",
                        simulation = "pb")
    (proc.time() - ptm)[3]/60 # in min
    gof

    ## ---- Parallel computation with 2 cores ----
    ## Same algorithm as in 'gofCopula' adapted for parallel computation
    ptm <- proc.time()
    system.time(
        gof <- bicopulaGOF(x = X, y = Y, copula = "normalCopula",
                    margins = margins, paramMargins = parMargins,
                    nboots = 99, approach = "Sn", seed = 12,
                    num.cores = 2L,
                    verbose = FALSE)
    )
    (proc.time() - ptm)[3]/60 # in min
    gof
}
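
The role of the breaks argument can also be illustrated with a short base-R sketch that splits the unit square into breaks x breaks cells and computes a Pearson chi-squared statistic. For simplicity, the sketch assumes the independence copula, under which every cell has the same expected count; for a fitted copula the expected counts would have to come from the model. It is an illustration of the binning idea, not the package's implementation.

```r
## Sketch only: Pearson chi-squared statistic on a grid over [0, 1]^2,
## assuming the independence copula as the hypothesized model.
set.seed(12)
n <- 2000
breaks <- 12
u <- cbind(runif(n), runif(n))   # points on the unit square

## Assign each coordinate to one of 'breaks' equal-width bins
edges <- seq(0, 1, length.out = breaks + 1)
i <- cut(u[, 1], edges, include.lowest = TRUE)
j <- cut(u[, 2], edges, include.lowest = TRUE)
observed <- table(i, j)          # breaks x breaks table of counts

## Independence copula: each cell has probability 1 / breaks^2
expected <- n / breaks^2
chisq <- sum((observed - expected)^2 / expected)
chisq
```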