mcgoftest {usefr}    R Documentation

Bootstrap test for Goodness of fit (GoF)

Description

Different optimization algorithms can be used to accomplish the nonlinear fit of a probability distribution function (PDF), and each algorithm may return a different set of estimated parameter values. In this case, AIC and BIC are not useful for deciding which set of parameter values is best. Goodness-of-fit (GoF) tests can help make that decision.

Usage

mcgoftest(varobj, distr, pars, num.sampl = 999, sample.size,
  stat = c("ks", "ad", "rmst", "chisq"), breaks = NULL,
  parametric = TRUE, seed = 1, num.cores = 1, tasks = 0)

Arguments

varobj

A vector containing observations: the variable for which the CDF parameters were estimated.

distr

The name of the cumulative distribution function (CDF), or a concrete CDF, from which the cumulative probabilities are estimated. The distribution distr must be defined in the namespace of some package or in an environment defined by the user.

pars

CDF model parameters. A list of parameters to evaluate the CDF.

num.sampl

Number of resamplings.

sample.size

Size of the samples used for each sampling.

stat

A string denoting the statistic to be used in the test: "ks": Kolmogorov–Smirnov, "ad": Anderson–Darling, "chisq": Pearson's Chi-squared, and "rmst": Root Mean Square statistic.

breaks

Default is NULL. Basically, it is the same as in function hist. If breaks = NULL, then function 'nclass.FD' (see nclass) is applied to estimate the breaks.

parametric

Logical object. If TRUE, then samples are drawn from the theoretical population described by distr. Default: TRUE.

seed

An integer used to set a 'seed' for random number generation.

num.cores, tasks

Parameters for parallel computation using the BiocParallel package: the number of cores to use, i.e., at most how many child processes will be run simultaneously (see bplapply), and the number of tasks per job (only for Linux OS).

Details

The test is intended for continuous distributions. If the sampling size is smaller than the size of the sample, then the test becomes a Monte Carlo test. The test is based on the use of goodness-of-fit measures (statistics). The following statistics are available: Kolmogorov–Smirnov ("ks"), Anderson–Darling ("ad"), Pearson's Chi-squared ("chisq"), and Root Mean Square ("rmst").
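
The general idea of a parametric Monte Carlo goodness-of-fit test can be sketched in base R as follows. This is only an illustration under stated assumptions, not the implementation used by mcgoftest: the helper name mc_ks_sketch and its rfun argument are hypothetical, only the Kolmogorov–Smirnov statistic is covered, and the observed statistic is assumed to be computed on a single subsample of size sample.size.

```r
# Hypothetical helper (not part of usefr): parametric Monte Carlo KS sketch.
mc_ks_sketch <- function(x, cdf, rfun, pars, num.sampl = 199,
                         sample.size = length(x), seed = 1) {
  set.seed(seed)
  # KS distance between the empirical CDF of a sample and the theoretical CDF
  ks_stat <- function(s) {
    s <- sort(s)
    n <- length(s)
    p <- do.call(cdf, c(list(s), as.list(pars)))
    max(pmax(seq_len(n) / n - p, p - (seq_len(n) - 1) / n))
  }
  # Observed statistic, computed on a subsample of the data (assumption)
  obs <- ks_stat(sample(x, sample.size))
  # Null distribution: statistics of samples drawn from the theoretical model
  null <- replicate(num.sampl, {
    ks_stat(do.call(rfun, c(list(sample.size), as.list(pars))))
  })
  # Proportion of simulated statistics at least as extreme as the observed one,
  # with the usual +1 correction so the estimate is never exactly zero
  p.value <- (sum(null >= obs) + 1) / (num.sampl + 1)
  c(KS.stat = obs, mc_p.value = p.value)
}

set.seed(1)
x <- rweibull(2000, shape = 0.5, scale = 1.2)
res <- mc_ks_sketch(x, cdf = pweibull, rfun = rweibull,
                    pars = c(0.5, 1.2), num.sampl = 199, sample.size = 500)
res
```

With parametric = FALSE, one would presumably replace the draws from rfun with resamples of the observed data; that variant is not shown here.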

Value

A numeric vector with the following data:

  1. Statistic value.

  2. mc_p.value: the probability of finding the observed, or more extreme, results when the null hypothesis H_0 is true, estimated by the Monte Carlo resampling approach.
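
For orientation, a commonly used Monte Carlo estimator of such a p-value is given below. It is stated here as an assumption about the general technique, not as the exact formula implemented in mcgoftest. The symbols are introduced only for illustration: T_obs is the observed statistic, T_b is the statistic computed on the b-th resample, and B is the number of resamplings (num.sampl).

```latex
\hat{p}_{\mathrm{MC}} \;=\; \frac{1 + \sum_{b=1}^{B} \mathbf{1}\{T_b \ge T_{\mathrm{obs}}\}}{B + 1}
```

With B = 999 the denominator is 1000, so the smallest attainable value of this estimator is 0.001.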

Author(s)

Robersy Sanchez (https://genomaths.com).

References

  1. Feller, W. On the Kolmogorov-Smirnov Limit Theorems for Empirical Distributions. Ann. Math. Stat. 19, 177–189 (1948).

  2. Anderson, T. W. & Darling, D. A. A Test Of Goodness Of Fit. J. Am. Stat. Assoc. 49, 765–769 (1954).

  3. Watson, G. S. On Chi-Square Goodness-Of-Fit Tests for Continuous Distributions. J. R. Stat. Soc. Ser. B Stat. Methodol. 20, 44–72 (1958).

See Also

Distribution fitting: fitMixDist, fitdistr, fitCDF.

Examples

# Example 1
# Let us generate a random sample from a specified Weibull distribution:
# Set a seed
set.seed( 1 )
# Random sample from Weibull( x | shape = 0.5, scale = 1.2 )
x = rweibull(10000, shape = 0.5, scale = 1.2)

# The MC KS test accepts the null hypothesis that variable x comes
# from Weibull(x | shape = 0.5, scale = 1.2), while the standard
# Kolmogorov-Smirnov test rejects the null hypothesis.
mcgoftest(x, distr = pweibull, pars = c( 0.5, 1.2 ), num.sampl = 500,
        sample.size = 1000, num.cores = 4)

# Example 2
# Let us generate a random sample from a specified Normal
# distribution:
# Set a seed
set.seed( 1 )
x = rnorm(10000, mean = 1.5, sd = 2)

# The MC KS test accepts the null hypothesis that variable x comes
# from N(x | mean = 1.5, sd = 2), while the standard
# Kolmogorov-Smirnov test rejects the null hypothesis.
mcgoftest(x, distr = pnorm, pars = c(1.5, 2), num.sampl = 500,
          sample.size = 1000, num.cores = 1)

[Package usefr version 0.1.0 ]