Contingency table (chi2 etc.)

These tests expect a frequency table with the numbers of elements in different categories (rows and columns). Rows represent the different states of one nominal variable, columns represent the states of another nominal variable, and each cell contains the integer count of occurrences of that particular combination of states (row, column) of the two variables. The contingency table analysis then gives information on whether the two variables are associated. For example, the test can be used to compare two samples (columns, e.g. localities) with the number of individuals in each taxon organized in the rows; the analysis then tests whether taxon and locality are associated. You should be cautious about this test if any of the cells contain fewer than five individuals (see Fisher's exact test below).

The significance of association between the two variables is given, with p values from the chi-squared distribution and from a permutation test with 9999 replicates.
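
As an illustration only (this is not Past's own code), an equivalent chi-squared test of association and a simple Monte Carlo permutation test can be sketched in Python with NumPy and SciPy. The example table, the random seed and the number of replicates are arbitrary; the permutation scheme shuffles the column (sample) label of each individual, which keeps both the row and column sums constant:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Example table: rows = taxa, columns = samples (arbitrary illustrative counts)
table = np.array([[12,  7],
                  [ 5, 14],
                  [ 9,  8]])

# Asymptotic chi-squared test of association
chi2, p_asymp, dof, expected = chi2_contingency(table, correction=False)

# Permutation test: shuffle the column (sample) label of each individual,
# which preserves both row and column sums
rng = np.random.default_rng(0)
r_idx, c_idx = np.indices(table.shape)
row_lab = np.repeat(r_idx.ravel(), table.ravel())
col_lab = np.repeat(c_idx.ravel(), table.ravel())

n_reps = 9999
hits = 0
for _ in range(n_reps):
    perm = rng.permutation(col_lab)
    perm_table = np.zeros_like(table)
    np.add.at(perm_table, (row_lab, perm), 1)
    if chi2_contingency(perm_table, correction=False)[0] >= chi2:
        hits += 1
p_perm = (hits + 1) / (n_reps + 1)

print(f"chi2 = {chi2:.3f}, dof = {dof}, "
      f"p (asymptotic) = {p_asymp:.4f}, p (permutation) = {p_perm:.4f}")
```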

The "Sample vs. expected" box should be ticked if you have two columns, and your second column consists of counts from a theoretical distribution (expected values) with zero sampling error, possibly non-integer. This is not a small-sample correction. In this case, only the chi-squared test is available. The Monte Carlo permutation test uses the given number of random replicates. For "Sample vs. expected" these replicates are generated by keeping the expected values fixed, while the values in the first column are random with relative probabilities as specified by the expected values, and with constant sum. For two samples, all cells are random but with constant row and column sums.

See e.g. Brown & Rothery (1993) or Davis (1986) for details.

Two further measures of association are given: Cramer's V and the contingency coefficient C. Both are transformations of chi-squared (Press et al. 1992).
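
As a hedged sketch, the standard textbook definitions of these coefficients are V = sqrt(chi² / (n·(min(r,c) − 1))) and C = sqrt(chi² / (chi² + n)), where n is the total count and r, c are the numbers of rows and columns (cf. Press et al. 1992); these usual forms are assumed here rather than taken from Past's source:

```python
import numpy as np
from scipy.stats import chi2_contingency

def cramers_v_and_c(table):
    """Cramer's V and contingency coefficient C from a frequency table
    (standard textbook definitions, assumed; not taken from Past's source)."""
    table = np.asarray(table, dtype=float)
    chi2 = chi2_contingency(table, correction=False)[0]
    n = table.sum()
    r, c = table.shape
    v = np.sqrt(chi2 / (n * (min(r, c) - 1)))
    c_coef = np.sqrt(chi2 / (chi2 + n))
    return v, c_coef
```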

Fisher's exact test

Fisher's exact test is also given (two-tailed). When available, Fisher's exact test may be superior to the chi-squared test. For large tables or large counts the computation time can be prohibitive, and the calculation will time out after one minute; in such cases the parametric (chi-squared) test is probably acceptable anyway. The procedure is complex, being based on the network algorithm of Mehta & Patel (1986).
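
Note that SciPy implements Fisher's exact test only for 2×2 tables (it does not include the Mehta & Patel network algorithm for general r×c tables); a minimal sketch with an arbitrary 2×2 example:

```python
from scipy.stats import fisher_exact

# Arbitrary 2x2 example: two taxa (rows) in two samples (columns)
table_2x2 = [[8, 2],
             [1, 9]]

odds_ratio, p_two_tailed = fisher_exact(table_2x2, alternative='two-sided')
print(f"Fisher's exact test (two-tailed): p = {p_two_tailed:.4f}")
```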

Residuals

If you get a significant association (p<0.05) in the chi-squared test, it may be of interest to see which of the cells contribute most strongly to the departure from the expected values under the null hypothesis of no association (post-hoc analysis). The table of residuals can show the following values for each cell:

  • Raw residuals: O-E, where O is the observed and E the expected value.
  • Standardized residuals: (O-E)/√E, standardizing for the magnitude of the expected value.
  • Adjusted residuals: The adjusted residuals are approximately normally distributed, meaning that values outside the two-sigma interval [-1.96, 1.96] can be considered significant at p<0.05, although the multiple testing problem applies. See the Past manual for the equation (a sketch using a common formulation is given after this list).
  • p values: Two-tailed p values computed from the adjusted residuals using the standard normal distribution, again subject to the multiple testing problem. It is recommended to use the Bonferroni correction. Significant p values are marked in pink.
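
A sketch of all three residuals and the associated two-tailed p values; the adjusted residual here uses the common (Haberman) formula (O−E)/√(E(1 − row total/n)(1 − column total/n)), which is assumed for illustration and may differ in detail from the equation in the Past manual:

```python
import numpy as np
from scipy.stats import chi2_contingency, norm

def residuals(table):
    """Raw, standardized and adjusted residuals with two-tailed normal p values
    (sketch using the common Haberman adjusted residual, assumed here)."""
    O = np.asarray(table, dtype=float)
    E = chi2_contingency(O, correction=False)[3]   # expected values under no association
    n = O.sum()
    row_frac = O.sum(axis=1, keepdims=True) / n
    col_frac = O.sum(axis=0, keepdims=True) / n
    raw = O - E
    standardized = raw / np.sqrt(E)
    adjusted = raw / np.sqrt(E * (1 - row_frac) * (1 - col_frac))
    p = 2 * norm.sf(np.abs(adjusted))              # two-tailed p values
    return raw, standardized, adjusted, p
```

A Bonferroni correction then amounts to comparing each cell's p value with 0.05 divided by the number of cells.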

Missing data not supported.

References

Brown, D. & P. Rothery. 1993. Models in biology: mathematics, statistics and computing. John Wiley & Sons.

Davis, J.C. 1986. Statistics and Data Analysis in Geology. John Wiley & Sons.

Mehta, C.R. & N.R. Patel. 1986. Algorithm 643: FEXACT: a FORTRAN subroutine for Fisher's exact test on unordered r×c contingency tables. ACM Transactions on Mathematical Software 12:154-161.

Press, W.H., S.A. Teukolsky, W.T. Vetterling & B.P. Flannery. 1992. Numerical Recipes in C. Cambridge University Press.
