vignettes/fisher-exact-test-failure-can-lead-to-biassed-results.Rmd
Abstract
Fisher’s exact test is a statistical significance test used in the analysis of contingency tables. Although this test is routinely used, it has been the subject of controversy for over 80 years. Herein, its application is scrutinized with specific examples.
The statistical significance of the difference in methylation between bisulfite sequencing data from control and treatment groups at each CG site can be evaluated with Fisher’s exact test. This is a statistical test used to determine whether there is a nonrandom association between two categorical variables.
Let there be two such (categorical) variables \(X\) and \(Y\), where \(X\) stands for the two groups of individuals, control and treatment, and \(Y\) is a two-state variable denoting the methylation status, carrying the number of times that a cytosine site is found methylated (\(^{m}CG\)) and non-methylated (\(CG\)), respectively.
This information can be summarized in a \(2 \times 2\) table, that is, a \(2 \times 2\) matrix in which the entries \(a_{ij}\) represent the number of observations in which \(X=i\) and \(Y=j\). Calculate the row and column sums \(R_i\) and \(C_j\), respectively, and the total sum:
\[N=\sum_iR_i=\sum_jC_j\]
of the matrix:
| | \(Y = {}^{m}CG\) | \(Y = CG\) | \(R_i\) |
|---|---|---|---|
| Control | \(a_{11}\) | \(a_{12}\) | \(a_{11}+a_{12}\) |
| Treatment | \(a_{21}\) | \(a_{22}\) | \(a_{21}+a_{22}\) |
| \(C_j\) | \(a_{11}+a_{21}\) | \(a_{12}+a_{22}\) | \(a_{11}+a_{12}+a_{21}+a_{22} = N\) |
Then the conditional probability of getting the actual matrix, given the particular row and column sums, is given by the formula:
\[P_{cutoff}=\frac{R_1!\,R_2!\,C_1!\,C_2!}{N!\prod_{i,j}a_{ij}!}\]
Let’s consider the following hypothetical case of methylation at a given cytosine site, found in the comparison of control and treatment groups:
case1 <- matrix(c(5, 14, 15, 12), nrow = 2,
                dimnames = list(Group = c("Ctrl", "Treat"),
                                Meth.status = c("mCG", "CG")))
case1
## Meth.status
## Group mCG CG
## Ctrl 5 15
## Treat 14 12
That is, the cytosine site was found methylated 5 times out of 20 counts in the control group and 14 times out of 26 in the treatment group. This corresponds to methylation levels of 0.25 and about 0.54 in the control and treatment groups, respectively, that is, a difference in methylation level of about 0.29.
## Proportions
case1/rowSums(case1)
## Meth.status
## Group mCG CG
## Ctrl 0.2500000 0.7500000
## Treat 0.5384615 0.4615385
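As a quick check of the arithmetic, the difference in methylation levels can be computed directly from the counts. This is a minimal sketch; the names `meth_levels` and `meth_diff` are introduced here for illustration only.

```r
# Counts from the hypothetical case above
case1 <- matrix(c(5, 14, 15, 12), nrow = 2,
                dimnames = list(Group = c("Ctrl", "Treat"),
                                Meth.status = c("mCG", "CG")))

# Methylation level per group: methylated counts over total counts
meth_levels <- case1[, "mCG"] / rowSums(case1)

# Difference in methylation level: treatment minus control
meth_diff <- unname(meth_levels["Treat"] - meth_levels["Ctrl"])
meth_diff
```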
Fisher’s exact test finds no statistically significant difference between these groups at a significance level of \(\alpha = 0.05\):
fisher.test(case1)
##
## Fisher's Exact Test for Count Data
##
## data: case1
## p-value = 0.07158
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 0.06352714 1.18168682
## sample estimates:
## odds ratio
## 0.293905
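The two-sided p-value reported above can be traced back to the formula for \(P_{cutoff}\): it is the total probability of all tables with the same row and column sums whose probability does not exceed that of the observed table. The following is a minimal base R sketch of that calculation. The variable names are illustrative, and the small relative tolerance is an assumption added to absorb floating-point ties.

```r
case1 <- matrix(c(5, 14, 15, 12), nrow = 2,
                dimnames = list(Group = c("Ctrl", "Treat"),
                                Meth.status = c("mCG", "CG")))

R <- unname(rowSums(case1))   # row sums R_1, R_2
C <- unname(colSums(case1))   # column sums C_1, C_2
N <- sum(case1)               # total count

# P_cutoff from the factorial formula, on the log scale to avoid overflow
log_p <- sum(lfactorial(R)) + sum(lfactorial(C)) -
    lfactorial(N) - sum(lfactorial(case1))
p_cutoff <- exp(log_p)

# The same probability from the hypergeometric density
p_hyper <- dhyper(case1[1, 1], m = C[1], n = C[2], k = R[1])

# All admissible values of a_11 given the margins, with their probabilities
a11 <- max(0, R[1] - C[2]):min(R[1], C[1])
probs <- dhyper(a11, m = C[1], n = C[2], k = R[1])

# Two-sided p-value: sum over tables no more probable than the observed one
# (the relative tolerance 1e-7 is an assumption to absorb rounding error)
p_two_sided <- sum(probs[probs <= p_cutoff * (1 + 1e-7)])

all.equal(p_cutoff, p_hyper)
all.equal(p_two_sided, fisher.test(case1)$p.value)
```

Both comparisons are expected to return `TRUE`, so the value printed by `fisher.test` is nothing more than a sum of \(P_{cutoff}\)-type terms over the tables compatible with the margins.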
A one-sided test, which takes the direction of the change into account, seems to be more sensitive to the magnitude of the methylation change: the difference is statistically significant at a significance level of \(\alpha = 0.05\).
fisher.test(case1, alternative = "less")
##
## Fisher's Exact Test for Count Data
##
## data: case1
## p-value = 0.04666
## alternative hypothesis: true odds ratio is less than 1
## 95 percent confidence interval:
## 0.0000000 0.9804613
## sample estimates:
## odds ratio
## 0.293905
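With `alternative = "less"`, the p-value is the lower tail of the hypergeometric distribution defined by the margins: the probability of observing \(a_{11}\) or fewer methylated counts in the control group. A minimal sketch with base R's `phyper` follows; the variable names are illustrative.

```r
case1 <- matrix(c(5, 14, 15, 12), nrow = 2,
                dimnames = list(Group = c("Ctrl", "Treat"),
                                Meth.status = c("mCG", "CG")))

R <- unname(rowSums(case1))   # row sums
C <- unname(colSums(case1))   # column sums

# P(a_11 <= observed value) under the null, with fixed margins
p_less <- phyper(case1[1, 1], m = C[1], n = C[2], k = R[1])

all.equal(p_less, fisher.test(case1, alternative = "less")$p.value)
```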
To appreciate how difficult the interpretation of this result would be in a concrete scenario, consider a basketball team whose top player finished a game making 14 field goals out of 26 attempts. Based on Fisher’s exact test, does it make sense to say that another player who made 5 field goals out of 20 attempts performed as well as the best player?
Alternative tests are possible by means of bootstrap resampling from the set (population) of all the matrices with the same row and column sums \(R_i\) and \(C_j\). The analyses are carried out with the function tableBoots from the R package usefr.
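The idea of resampling from the matrices that share the observed row and column sums can be sketched with base R alone. The block below is not the implementation used by usefr: it swaps in `r2dtable` to draw random tables with fixed margins and uses the Pearson chi-squared statistic as an example. The helper `chisq_stat`, the seed, and the number of replicates `B` are assumptions made for illustration.

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible

case1 <- matrix(c(5, 14, 15, 12), nrow = 2,
                dimnames = list(Group = c("Ctrl", "Treat"),
                                Meth.status = c("mCG", "CG")))

# Pearson chi-squared statistic of a two-way table (illustrative helper)
chisq_stat <- function(tab) {
    expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
    sum((tab - expected)^2 / expected)
}

B <- 1999
# Random tables with the same row and column sums as case1
sims <- r2dtable(B, rowSums(case1), colSums(case1))

stat_obs <- chisq_stat(case1)
stat_sim <- vapply(sims, chisq_stat, numeric(1))

# Monte Carlo p-value; the factor slightly below 1 guards against
# floating-point ties between simulated and observed statistics
p_mc <- (1 + sum(stat_sim >= stat_obs * (1 - 1e-8))) / (B + 1)
p_mc
```

Any other statistic computed on a table could be substituted for `chisq_stat`; only the resampling step depends on the fixed margins.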
The Hellinger test statistic has been proposed to compare discrete probability distributions (1). Basu et al. have shown that the Hellinger divergence, as reported in (1), asymptotically has a chi-square distribution with one degree of freedom under the null hypothesis. This is a good property that makes the Hellinger chi-squared statistic suitable for a bootstrap test of two discrete populations.
library(usefr)
tableBoots(x = case1, stat = "hd", num.permut = 1999)
## [1] 0.0495
The Root-Mean-Square statistic (RMST) has also been proposed to test differences between two discrete probability distributions (2):
tableBoots(x = case1, stat = "rmst", num.permut = 1999)
## [1] 0.0415
With the \(\chi^2\) statistic, the bootstrap test fails to reject the null hypothesis at \(\alpha = 0.05\):
tableBoots(x = case1, stat = "chisq", num.permut = 1999)
## [1] 0.054
Assume that the expected discrete probability distribution (DPD) is the one observed in the control group:
p <- case1[1,]/sum(case1[1,])
p
## mCG CG
## 0.25 0.75
Then, we can test whether the treatment departs from the expected DPD:
chisq.test(x = case1[2, ], p = p,
           simulate.p.value = TRUE, B = 2e3)
##
## Chi-squared test for given probabilities with simulated p-value (based
## on 2000 replicates)
##
## data: case1[2, ]
## X-squared = 11.538, df = NA, p-value = 0.001999
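The statistic reported above can be verified by hand: the expected counts are the treatment total multiplied by the control proportions, and the statistic is the sum of squared deviations divided by the expected counts. A minimal sketch follows; the names `observed`, `expected`, and `x_squared` are illustrative.

```r
observed <- c(mCG = 14, CG = 12)   # treatment counts from case1
p <- c(mCG = 0.25, CG = 0.75)      # control proportions used as expected DPD

# Expected treatment counts under the control proportions: 6.5 and 19.5
expected <- sum(observed) * p

# Pearson chi-squared statistic for given probabilities
x_squared <- sum((observed - expected)^2 / expected)
x_squared

# Agrees with the statistic returned by chisq.test for the same inputs
all.equal(x_squared, unname(chisq.test(x = observed, p = p)$statistic))
```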
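The failure of Fisher’s exact test is evident in the comparisons given below, where the control counts are held fixed at 5 methylated and 15 non-methylated, while the treatment counts (\(k+1\) methylated and \(k\) non-methylated) grow with \(k\):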
res <- lapply(seq(1, 78, 1), function(k) {
    # Control row is fixed (5 mCG, 15 CG); treatment row is (k + 1, k)
    datos <- matrix(c(5, k + 1, 15, k), nrow = 2,
                    dimnames = list(Group = c("Ctrl", "Treat"),
                                    Meth.status = c("mCG", "CG")))
    # Difference in methylation level: treatment minus control
    x <- datos / rowSums(datos)
    x <- x[2, 1] - x[1, 1]
    ft <- fisher.test(datos)
    # Bootstrap tests from usefr with three different statistics
    hd <- tableBoots(x = datos, stat = "hd", num.permut = 1999)
    rmst <- tableBoots(x = datos, stat = "rmst", num.permut = 1999)
    chisq <- tableBoots(x = datos, stat = "chisq", num.permut = 1999)
    # Goodness-of-fit test of the treatment counts against the control proportions
    chisq.p <- chisq.test(x = datos[2, ], p = datos[1, ] / sum(datos[1, ]),
                          simulate.p.value = TRUE, B = 2e3)$p.value
    c(meth.diff = x,
      FT.pvalue = ft$p.value,
      HD.pvalue = hd,
      RMST.pvalue = rmst,
      chisq.pvalue = chisq,
      chisq_test.pvalue = chisq.p)
})
# One row per value of k
do.call(rbind, res)
## meth.diff FT.pvalue HD.pvalue RMST.pvalue chisq.pvalue chisq_test.pvalue
## [1,] 0.4166667 0.20948617 0.0855 0.1105 0.1295 0.1649175412
## [2,] 0.3500000 0.28326746 0.1180 0.1155 0.1465 0.0999500250
## [3,] 0.3214286 0.17506841 0.1205 0.1045 0.1310 0.0629685157
## [4,] 0.3055556 0.20467538 0.1175 0.0900 0.1100 0.0609695152
## [5,] 0.2954545 0.13181103 0.1125 0.0970 0.0940 0.0294852574
## [6,] 0.2884615 0.14213458 0.1010 0.0960 0.0900 0.0229885057
## [7,] 0.2833333 0.15672314 0.0890 0.0820 0.0875 0.0219890055
## [8,] 0.2794118 0.10138414 0.0915 0.0750 0.0800 0.0099950025
## [9,] 0.2763158 0.10534027 0.0745 0.0705 0.0810 0.0119940030
## [10,] 0.2738095 0.11089814 0.0740 0.0570 0.0700 0.0064967516
## [11,] 0.2717391 0.11750154 0.0755 0.0590 0.0675 0.0049975012
## [12,] 0.2700000 0.07769381 0.0755 0.0660 0.0665 0.0084957521
## [13,] 0.2685185 0.07887317 0.0660 0.0630 0.0680 0.0034982509
## [14,] 0.2672414 0.08061039 0.0670 0.0570 0.0625 0.0019990005
## [15,] 0.2661290 0.08276810 0.0580 0.0500 0.0620 0.0024987506
## [16,] 0.2651515 0.08523730 0.0645 0.0480 0.0565 0.0009995002
## [17,] 0.2642857 0.08793188 0.0495 0.0505 0.0605 0.0004997501
## [18,] 0.2635135 0.09078398 0.0495 0.0420 0.0575 0.0009995002
## [19,] 0.2628205 0.09374027 0.0420 0.0475 0.0540 0.0004997501
## [20,] 0.2621951 0.06031279 0.0565 0.0470 0.0495 0.0004997501
## [21,] 0.2616279 0.06080833 0.0465 0.0405 0.0530 0.0004997501
## [22,] 0.2611111 0.06141565 0.0455 0.0475 0.0525 0.0004997501
## [23,] 0.2606383 0.06211375 0.0500 0.0450 0.0620 0.0004997501
## [24,] 0.2602041 0.06288506 0.0515 0.0445 0.0550 0.0004997501
## [25,] 0.2598039 0.06371480 0.0460 0.0555 0.0385 0.0004997501
## [26,] 0.2594340 0.06459058 0.0420 0.0415 0.0410 0.0009995002
## [27,] 0.2590909 0.06550200 0.0495 0.0420 0.0465 0.0004997501
## [28,] 0.2587719 0.06644033 0.0540 0.0440 0.0470 0.0004997501
## [29,] 0.2584746 0.06739822 0.0450 0.0390 0.0500 0.0004997501
## [30,] 0.2581967 0.06836952 0.0420 0.0375 0.0380 0.0004997501
## [31,] 0.2579365 0.06934906 0.0390 0.0525 0.0425 0.0004997501
## [32,] 0.2576923 0.07033254 0.0480 0.0315 0.0405 0.0004997501
## [33,] 0.2574627 0.07131633 0.0425 0.0470 0.0380 0.0004997501
## [34,] 0.2572464 0.07229742 0.0410 0.0370 0.0470 0.0004997501
## [35,] 0.2570423 0.04633836 0.0385 0.0400 0.0465 0.0004997501
## [36,] 0.2568493 0.04646593 0.0415 0.0425 0.0350 0.0004997501
## [37,] 0.2566667 0.04660925 0.0355 0.0390 0.0400 0.0004997501
## [38,] 0.2564935 0.04676622 0.0360 0.0365 0.0375 0.0004997501
## [39,] 0.2563291 0.04693498 0.0370 0.0405 0.0415 0.0004997501
## [40,] 0.2561728 0.04711391 0.0320 0.0405 0.0390 0.0004997501
## [41,] 0.2560241 0.04730155 0.0360 0.0350 0.0400 0.0004997501
## [42,] 0.2558824 0.04749662 0.0280 0.0420 0.0390 0.0004997501
## [43,] 0.2557471 0.04769797 0.0420 0.0375 0.0335 0.0004997501
## [44,] 0.2556180 0.04790460 0.0390 0.0335 0.0445 0.0004997501
## [45,] 0.2554945 0.04811561 0.0360 0.0405 0.0415 0.0004997501
## [46,] 0.2553763 0.04833021 0.0325 0.0435 0.0460 0.0004997501
## [47,] 0.2552632 0.04854768 0.0485 0.0365 0.0410 0.0004997501
## [48,] 0.2551546 0.04876741 0.0315 0.0340 0.0415 0.0004997501
## [49,] 0.2550505 0.04898884 0.0350 0.0415 0.0335 0.0004997501
## [50,] 0.2549505 0.04921147 0.0350 0.0350 0.0380 0.0004997501
## [51,] 0.2548544 0.04943486 0.0340 0.0330 0.0415 0.0004997501
## [52,] 0.2547619 0.04965863 0.0320 0.0355 0.0315 0.0004997501
## [53,] 0.2546729 0.04988242 0.0320 0.0310 0.0300 0.0004997501
## [54,] 0.2545872 0.05010595 0.0355 0.0390 0.0395 0.0004997501
## [55,] 0.2545045 0.05032893 0.0300 0.0365 0.0430 0.0004997501
## [56,] 0.2544248 0.05055114 0.0305 0.0345 0.0330 0.0004997501
## [57,] 0.2543478 0.05077235 0.0360 0.0315 0.0360 0.0004997501
## [58,] 0.2542735 0.05099239 0.0335 0.0355 0.0360 0.0004997501
## [59,] 0.2542017 0.05121109 0.0405 0.0340 0.0385 0.0004997501
## [60,] 0.2541322 0.05142831 0.0370 0.0310 0.0375 0.0004997501
## [61,] 0.2540650 0.05164393 0.0295 0.0385 0.0380 0.0004997501
## [62,] 0.2540000 0.05185784 0.0310 0.0370 0.0360 0.0004997501
## [63,] 0.2539370 0.05206995 0.0345 0.0320 0.0355 0.0004997501
## [64,] 0.2538760 0.05228018 0.0335 0.0305 0.0365 0.0004997501
## [65,] 0.2538168 0.05248845 0.0300 0.0340 0.0360 0.0004997501
## [66,] 0.2537594 0.05269472 0.0325 0.0385 0.0315 0.0004997501
## [67,] 0.2537037 0.05289893 0.0410 0.0360 0.0420 0.0004997501
## [68,] 0.2536496 0.05310105 0.0350 0.0350 0.0325 0.0004997501
## [69,] 0.2535971 0.05330104 0.0365 0.0335 0.0355 0.0004997501
## [70,] 0.2535461 0.05349888 0.0355 0.0335 0.0355 0.0004997501
## [71,] 0.2534965 0.05369454 0.0335 0.0405 0.0350 0.0004997501
## [72,] 0.2534483 0.05388802 0.0335 0.0405 0.0295 0.0004997501
## [73,] 0.2534014 0.05407930 0.0315 0.0345 0.0305 0.0004997501
## [74,] 0.2533557 0.05426839 0.0325 0.0355 0.0260 0.0004997501
## [75,] 0.2533113 0.05445527 0.0280 0.0280 0.0340 0.0004997501
## [76,] 0.2532680 0.05463995 0.0385 0.0245 0.0315 0.0004997501
## [77,] 0.2532258 0.05482244 0.0350 0.0275 0.0350 0.0004997501
## [78,] 0.2531847 0.03529458 0.0335 0.0260 0.0320 0.0004997501
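The FT.pvalue column is deterministic, so it can be reproduced with base R alone, without the bootstrap functions. The sketch below recomputes it and lists the rows where Fisher’s exact test is significant at \(\alpha = 0.05\); the names `fisher_p` and `significant` are illustrative.

```r
# Two-sided Fisher p-value for control (5, 15) versus treatment (k + 1, k)
fisher_p <- vapply(1:78, function(k) {
    tab <- matrix(c(5, k + 1, 15, k), nrow = 2)
    fisher.test(tab)$p.value
}, numeric(1))

# Rows where Fisher's exact test is significant at alpha = 0.05
significant <- which(fisher_p < 0.05)
significant

# Rows after which the p-value increases although the sample size grew
which(diff(fisher_p) > 0)
```

In the table above, this column drops below 0.05 at row 35, climbs back above 0.05 from row 54 to row 77, and drops again at row 78, even though the methylation difference stays between 0.25 and 0.26 over all of these rows.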