
Testing association with Fisher's Exact test
Pamela Warner

Reader in Medical Statistics, Centre for Population Health Sciences, University of Edinburgh, Edinburgh, UK

Correspondence to Dr Pamela Warner, Centre for Population Health Sciences, University of Edinburgh, Medical School, Teviot Place, Edinburgh EH8 9AG, UK; p.warner{at}ed.ac.uk

## Background

Fisher's Exact tests have been used to test association in a paper in this issue of the Journal, namely that by Akintomide et al.1 These notes are intended to provide some supplementary explanation of this method (see Box 1 for a glossary of terms used in this article).

Box 1

## What is Fisher's Exact test?

Undoubtedly the most widely known test of association between two binary variables is the 2×2 Chi-square (χ2) test.2–5 However, many readers will also have learned at some point, most likely in a basic statistics course, that Fisher's Exact test is the advised, or in fact the obligatory, alternative to the 2×2 χ2 test when ‘the sample size is small’.2–5 It might seem surprising, then, that Fisher's Exact tests have been used for all analyses of association in the article by Akintomide et al., even though the n available for analysis is >100 in all analyses reported, and even though the cross-tabulations are not 2×2 but 3×3 or, in one case, 3×4.1

The fact is, Fisher's Exact test of association between two categorical (classification) variables is much more widely applicable than basic statistics courses have led learners to believe. There is an historical reason why it has been so ‘overlooked’: the torturous arithmetic required to carry out Fisher's Exact test for a cross-tabulation with large overall n, and even more so to carry out the analogous test for tables of larger dimension (R×C rather than 2×2). The necessary calculations would be practically impossible using a calculator, and for a long time were seldom available even in statistical software for personal computers. It is only with recent improvements in desktop computing power that the necessary procedures have come to be added to statistical software packages.6

Fisher's Exact test (or an analogous test for tables larger than 2×2) enables, for any cross-classified R×C table, calculation of the exact probability of obtaining a set of cell frequencies at least as extreme as the observed data. Reflection on the size of this calculated probability then allows evaluation of the null hypothesis of no association (or equivalently, of independence) between the two classification variables.
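For the 2×2 case the calculation described above can be sketched in a few lines. The function name `fisher_exact_2x2` is hypothetical, and the sketch assumes one common convention for ‘at least as extreme’: with both sets of marginal totals held fixed, it sums the hypergeometric probabilities of every possible table that is no more probable than the observed one. Other conventions exist, so this is an illustration rather than the method of any particular package.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided exact p-value for the 2x2 table [[a, b], [c, d]].

    Assumed convention: a table counts as 'at least as extreme' if its
    probability, given the fixed margins, does not exceed that of the
    observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)   # smallest feasible top-left count
    highest = min(row1, col1)      # largest feasible top-left count

    # Small relative tolerance guards against floating-point ties
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )


# Illustrative (hypothetical) data: 3 and 1 in the first row, 1 and 3 in the second
print(round(fisher_exact_2x2(3, 1, 1, 3), 3))
```

For this small illustrative table the five possible top-left counts (0 to 4) have probabilities 1/70, 16/70, 36/70, 16/70 and 1/70, so the two-sided p value is 34/70, or about 0.486. The same enumeration idea extends to R×C tables, but the number of possible tables grows very quickly with n and with table dimension.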

## When/why is it useful?

The well-known χ2 test is an asymptotic test (i.e. it depends on a large-sample approximation), so the larger the sample the better it will perform. Of course the reverse is also true, which has consequences for the circumstances in which χ2 is valid (dependable). We have referred above to the well-known caveat that the 2×2 χ2 test is not valid if the sample is small. [Small sample size is variously defined by textbooks, along the lines of ‘in all cases where total n<20, or when n<40 and any expected cell count is <5’.]2–4 The χ2 test is also not valid in any R×C table where more than 20% of the expected cell counts are less than 5, or where any expected cell count is less than 1.2–4 In either of these circumstances (small n, or ‘lop-sided’ classification, i.e. reasonable n but too many small expected counts), Fisher's Exact test is invaluable in enabling a valid test of association to be performed. However, it can also be used in tables where these validity concerns do not apply, and in such circumstances has the advantage that it provides an exact probability for the significance test, rather than an approximation.
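The R×C rule of thumb quoted above is straightforward to check before choosing a test. The sketch below is illustrative only: the function names and the example counts are hypothetical, and it implements just the ‘no more than 20% of expected counts below 5, and none below 1’ rule, not the separate small-n guidance for 2×2 tables. Expected counts are calculated in the usual way, as row total × column total / n.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]


def chi2_approximation_ok(table):
    """Apply the rule of thumb: at most 20% of expected counts <5, none <1."""
    flat = [e for row in expected_counts(table) for e in row]
    share_small = sum(e < 5 for e in flat) / len(flat)
    return share_small <= 0.20 and min(flat) >= 1


# Hypothetical 3x3 table of counts (not data from any study)
sparse_table = [
    [2, 10, 20],
    [3, 25, 30],
    [1, 4, 5],
]
print(chi2_approximation_ok(sparse_table))  # prints False
```

In this hypothetical table four of the nine expected counts are below 5 (44%), and one is below 1, so the χ2 approximation would not be considered dependable and an exact test would be the safer choice.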

It should also be pointed out that although the Fisher Exact test for R×C tables is now included in many statistical software packages, it remains very demanding of computing power and time, and for some table arrangements the calculation would not be computationally feasible even on a personal computer. Therefore an alternative calculation method is often provided: a ‘Monte Carlo’ estimate of the exact probability. The software adopts this method if the true ‘exact’ calculation would not be possible. For our purposes we will not distinguish between the two, but for further explanation see Mehta.6
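To give a feel for what a Monte Carlo estimate involves, here is a minimal sketch under stated assumptions. It is not the algorithm of any particular package. The function name `monte_carlo_fisher` and its parameters are hypothetical. The sketch shuffles the column labels of the individual observations, which generates random tables with the same marginal totals as the observed table, and reports the proportion of simulated tables that are no more probable than the observed one. With margins fixed, the probability of a table is proportional to 1 divided by the product of the cell-count factorials, so comparing sums of log-factorials is sufficient.

```python
import random
from math import lgamma


def monte_carlo_fisher(table, n_sim=10_000, seed=0):
    """Monte Carlo estimate of the exact p-value for an R x C table of counts."""
    rng = random.Random(seed)
    n_rows, n_cols = len(table), len(table[0])

    # One row label and one column label per individual observation
    rows = [i for i, r in enumerate(table) for count in r for _ in range(count)]
    cols = [j for r in table for j, count in enumerate(r) for _ in range(count)]

    def log_factorial_sum(cells):
        # Larger value = less probable table, given the fixed margins
        return sum(lgamma(c + 1) for c in cells)

    observed = log_factorial_sum(c for r in table for c in r)

    hits = 0
    for _ in range(n_sim):
        rng.shuffle(cols)  # break any association, keep both margins
        cells = [0] * (n_rows * n_cols)
        for i, j in zip(rows, cols):
            cells[i * n_cols + j] += 1
        if log_factorial_sum(cells) >= observed - 1e-9:
            hits += 1

    # Adding 1 to numerator and denominator avoids reporting p = 0
    return (hits + 1) / (n_sim + 1)


# Hypothetical 2x2 table, for which the exact two-sided p value is 34/70
print(monte_carlo_fisher([[3, 1], [1, 3]]))
```

The estimate varies slightly with the random seed and becomes more precise as `n_sim` increases, which is why software typically reports a confidence interval alongside a Monte Carlo p value.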

## What precautions are needed?

Three reservations apply to the Fisher Exact test, but note that these also apply to the χ2 test. First, the test is designed for nominal level data (i.e. categorical but with no inherent ordering). This means that the test will be under-powered if the data variables (and the association) are in fact ordinal, as has been pointed out previously regarding χ2.5,7,8 Second, both Fisher and χ2 are solely significance tests, and as such provide no quantification of the size of effect (i.e. the degree/strength of association), which is these days the preferred approach to statistical analysis.2,5,7 The third reservation is too complex to explain here, but hinges on the fact that the tests are theoretically designed for cross-tabulations where the marginal totals are fixed (i.e. set/specified prior to data collection), rather than being whatever counts happen to arise in the sample. Yet tables with both sets of marginal totals fixed are seldom found in health research. There has been considerable debate among statisticians about this issue, and about the consequences for analysis findings in a table where marginal totals are not fixed in advance. The pragmatic view is that although Fisher's Exact test might tend to be on the conservative side in such circumstances, its use for small samples that are unsuited to χ2 is acceptable.4

## Example

To illustrate with an example, Figure 1 shows the data reported in the third section of Table 1 of Akintomide et al.1 for the association between health professionals' tendency to use local anaesthetic (LA) for intrauterine device (IUD) insertions, and the number of insertions performed in the past year. It can be seen that those always/sometimes using LA are more likely to be those who have undertaken more than 50 IUD insertions in the past year. If a standard χ2 test had been performed, despite the fact that the data fail the requirements for χ2 (in that 33% of the expected cell frequencies are less than 5), the p value found would be 0.011. The Fisher Exact probability, as reported, was 0.010, so in this case there is very little disparity (and the Fisher p value is not more conservative than χ2). However, depending on the precise table pattern, disparities can be in the other direction, and/or greater, particularly for smaller n.

Figure 1

Percentage distribution of respondents by annual number of intrauterine device (IUD) insertions, separately for subgroups based on reported frequency of use of local anaesthetic (LA)

*Graph has been created from the third panel of data reported in Table 1 of Akintomide et al.1 †Those performing <13 insertions in past year are not plotted (6%, 2% and 16% across the three columns), but if they had been this would have brought each column up to 100%.

With respect to the points made above: (1) As is usual in health research, this cross-tabulation did not have fixed marginal totals; the marginal totals are simply those that arose in the sample surveyed (e.g. column marginal totals = 36, 59 and 32). Nevertheless, the Fisher Exact test is regarded as an acceptable test to use. (2) Both cross-tabulation variables here are ordinal: degree of use of LA, and number of insertions performed. An alternative analysis approach could have been non-parametric correlation,7 which would have given a Spearman rank order correlation rho of 0.24 (95% confidence interval 0.06–0.42).

## Overview

Fisher Exact tests are preferable to χ2 for hypothesis-testing in small or sparse cross-tabulations, whether 2×2 or R×C tables. They can also be used for larger samples to obtain an exact p value.


## Footnotes

• Competing interests None.

• Provenance and peer review Commissioned; internally peer reviewed.
