Parametric Bootstrap of \(n x m\) Contingency independence test. It is assumed that the provided contingency table summarizes the results where two or more groups are compared and the outcome is a categorical variable, i.e., disease vs. no disease, pass vs. fail, treated patient vs. control group, etc.
The goodness of fit statistic is the root-mean-square statistic (RMST) or Hellinger divergence, as proposed by Perkins et al. [1, 2]. Hellinger divergence (HD) is computed as proposed in [3].
tableBoots(
x,
stat = c("rmst", "hd", "chisq", "all"),
out.stat = FALSE,
num.permut = 100
)
A numerical matrix corresponding to cross tabulation \(n x m\) table (contingency table).
Statistic to be used in the testing: 'rmst','hdiv', or 'all'.
logical(1). Whether to return the values of the statistics used: the bootstrap mean and the original value estimated.
Number of permutations.
A p-value probability
For goodness-of-fit the following null hypothesis is tested \(H_\theta: p = p(\theta)\) To conduct a single simulation, we perform the following three-step procedure [1,2]:
To generate m i.i.d. draws according to the model distribution \(p(\theta)\), where \(\theta'\) is the estimate calculated from the experimental data,
To estimate the parameter \(\theta\) from the data generated in Step 1, obtaining a new estimate \(\theta\)est.
To calculate the statistic under consideration (HD, RMST), using the data generated in Step 1 and taking the model distribution to be \(\theta\)est, where \(\theta\)est is the estimate calculated in Step 2 from the data generated in Step 1.
After conducting many such simulations, the confidence level for rejecting the null hypothesis is the fraction of the statistics calculated in step 3 that are less than the statistic calculated from the empirical data. The significance level \(\alpha\) is the same as a confidence level of \(1-\alpha\).
Perkins W, Tygert M, Ward R. Chi^2 and Classical Exact Tests Often Wildly Misreport Significance; the Remedy Lies in Computers [Internet]. Uploaded to ArXiv. 2011. Report No.: arXiv:1108.4126v2.
Perkins, W., Tygert, M. & Ward, R. Computing the confidence levels or a root-mean square test of goodness-of-fit. 217, 9072-9084 (2011).
Basu, A., Mandal, A. & Pardo, L. Hypothesis testing for two discrete populations based on the Hellinger distance. Stat. Probab. Lett. 80, 206-214 (2010).
set.seed(123)
## The numbers of methylated and unmethylted "read-counts" targeting DNA
## cytosine sites are the typical raw data resulting from bisulfate
## sequencing experiments (BiSeq). Methylation analysis requires for the
## application of some statistical tests to identify differentially
## methylated sites/ positions (DMSs or DMPs). BiSeq experiments on humans
## are relatively expensive (still in 2020) and some researcher is willing
## to accept a site coverage (total) of 4 read-counts as threshold for a
## site to be included in the analysis. Let's suppose that at a given site
## we have the following contingency table:
site_res <- matrix(c(3, 1, 1, 3),
nrow = 2,
dimnames = list(
status = c("ctrl", "treat"),
treat = c("meth", "unmeth")
)
)
site_res
#> treat
#> status meth unmeth
#> ctrl 3 1
#> treat 1 3
## The application of classical test variants yield the following results
fisher.test(site_res)$p.value
#> [1] 0.4857143
fisher.test(site_res, alternative = "greater")$p.value
#> [1] 0.2428571
fisher.test(site_res, simulate.p.value = TRUE)$p.value
#> [1] 0.4857143
chisq.test(site_res)$p.value
#> [1] 0.4795001
chisq.test(site_res, simulate.p.value = TRUE)$p.value
#> [1] 0.4832584
## That is, these test did not detect differences between the read-counts
## for control and the treated patients. 'tableBoots' function suggests
## that the site is borderline.
tableBoots(site_res, stat = "all", num.permut = 1e3)
#> rmst.p.value hdiv.p.value chisq.p.value
#> 0.09390609 0.04095904 0.14585415
## A slight change in the read counts identify a meaningful difference
## not detected by the classicla tests
site_res <- matrix(c(3, 0, 1, 4),
nrow = 2,
dimnames = list(
status = c("ctrl", "treat"),
treat = c("meth", "unmeth")
)
)
site_res
#> treat
#> status meth unmeth
#> ctrl 3 1
#> treat 0 4
set.seed(23)
tableBoots(site_res, stat = "all", num.permut = 1e3)
#> rmst.p.value hdiv.p.value chisq.p.value
#> 0.007992008 0.008991009 0.010989011
fisher.test(site_res)$p.value
#> [1] 0.1428571
fisher.test(site_res, alternative = "greater")$p.value
#> [1] 0.07142857
fisher.test(site_res, simulate.p.value = TRUE)$p.value
#> [1] 0.1428571
chisq.test(site_res)$p.value
#> [1] 0.144127
chisq.test(site_res, simulate.p.value = TRUE)$p.value
#> [1] 0.1324338
## Now, let's suppose that we want to test whether methylation status from
## two DNA sites the same differentially methylated region (DMR)
## are independent, i.e. whether the treatment affected the methylation
## status of the two sites. In this case, the counts grouped into four
## categories.
set.seed(1)
site_res <- matrix(c(3, 1, 1, 3, 3, 0, 1, 4),
nrow = 2, byrow = FALSE,
dimnames = list(
status = c("ctrl", "treat"),
treat = c(
"meth.1", "unmeth.1",
"meth.2", "unmeth.2"
)
)
)
site_res
#> treat
#> status meth.1 unmeth.1 meth.2 unmeth.2
#> ctrl 3 1 3 1
#> treat 1 3 0 4
## Chi-squared from the R package 'stats'
chisq.test(site_res)$p.bvalue
#> NULL
chisq.test(site_res, simulate.p.value = TRUE, B = 2e3)$p.value
#> [1] 0.1244378
tableBoots(site_res, stat = "all", num.permut = 999)
#> rmst.p.value hdiv.p.value chisq.p.value
#> 0.065 0.087 0.068
## Results above are in border. If we include, third site,
## then sinces would different.
site_res <- matrix(c(3, 1, 1, 3, 3, 0, 1, 4, 4, 0, 1, 4),
nrow = 2, byrow = FALSE,
dimnames = list(
status = c("ctrl", "treat"),
treat = c(
"meth.1", "unmeth.1",
"meth.2", "unmeth.2",
"meth.3", "unmeth.3"
)
)
)
site_res
#> treat
#> status meth.1 unmeth.1 meth.2 unmeth.2 meth.3 unmeth.3
#> ctrl 3 1 3 1 4 1
#> treat 1 3 0 4 0 4
## That is, we have not reason to believe that the observed methylation
## levels in the treatment are not independent
chisq.test(site_res)$p.value
#> [1] 0.02764776
chisq.test(site_res, simulate.p.value = TRUE, B = 2e3)$p.value
#> [1] 0.017991
tableBoots(site_res, stat = "all", num.permut = 999)
#> rmst.p.value hdiv.p.value chisq.p.value
#> 0.014 0.016 0.010