Outlier tests

A single column of numbers is required. These tests provide objective procedures for detecting outliers in normally distributed data. The “single outlier” tests (Grubbs and Dixon) are designed to detect one outlier only, and should not be applied repeatedly to detect several outliers. The “multiple outlier” test (generalized ESD) attempts to detect multiple outliers, if present.

Grubbs test

The Grubbs test (also known as the Pearson-Hartley test or the Extreme Studentized Deviate test) tests for a single outlier. The basic reference is Grubbs (1950), but the original source of the G statistic, as defined by e.g. Wikipedia and NIST, is difficult to track down. In any case, we define G as G = max|x_i - u|/s, where u is the sample mean and s is the sample standard deviation. See the Past manual for further details on the significance computations. Note that this is a two-sided test, testing for the presence of an outlier at either end of the range (smallest or largest value).
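As an illustration of the definition of G, the following Python sketch computes the statistic and reports which value it refers to. It is a minimal example assuming a plain list of numbers; the function name grubbs_g is hypothetical, and the significance computation (handled by Past as described in the manual) is not included.

```python
import statistics

def grubbs_g(values):
    """Two-sided Grubbs statistic G = max|x_i - mean| / s.

    Returns (G, suspect), where suspect is the value farthest from the mean
    (the smallest or the largest value).
    """
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (N - 1 denominator)
    suspect = max(values, key=lambda x: abs(x - mean))
    return abs(suspect - mean) / s, suspect

g, suspect = grubbs_g([1, 2, 3, 4, 5, 15])  # g is about 1.961, suspect is 15
```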

The Grubbs test is recommended for relatively large sample sizes (N>30). For smaller samples, the Dixon test is preferable, although the two tests usually give similar results.

Dixon’s test

The Dixon test (Dixon 1950) tests for a single outlier. It can only be used for small sample sizes (N<=30), but for such samples it is considered superior to the Grubbs test. The idea is to compare the gap between the smallest (or largest) value and its nearest neighbor to the total range, giving a test statistic Q. The way Q is calculated depends on the sample size, as detailed in the Past manual.
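The sketch below computes the simplest form of the Dixon ratio, often written r10, which Dixon proposed for the smallest samples. It is an illustration under that assumption, not Past's code: Past uses other ratios as N grows, and those variants are not reproduced here. The function name dixon_q is hypothetical. Taking the larger of the low-end and high-end ratios makes the statistic two-sided.

```python
def dixon_q(values):
    """Dixon ratio r10: gap between an extreme value and its neighbor, divided by the range.

    Returns the larger of the low-end and high-end ratios (two-sided).
    """
    x = sorted(values)
    rng = x[-1] - x[0]
    if len(x) < 3 or rng == 0:
        raise ValueError("need at least 3 values that are not all equal")
    q_low = (x[1] - x[0]) / rng     # gap at the smallest value
    q_high = (x[-1] - x[-2]) / rng  # gap at the largest value
    return max(q_low, q_high)

q = dixon_q([1, 2, 3, 4, 5, 15])  # (15 - 5) / (15 - 1), about 0.714
```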

The p value (two-tailed) is estimated by Monte Carlo simulation with 200,000 random, normally distributed samples of size N, so it will vary slightly between runs.
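A Monte Carlo p value of this kind can be sketched as follows, again assuming the r10 form of Q. Because Q is a ratio of differences, it does not depend on the mean or standard deviation of the data, so standard normal samples suffice. The function names and the seed parameter are hypothetical; the default n_sim mirrors the 200,000 samples mentioned above but is slow in pure Python, so a smaller value is convenient for experimenting.

```python
import random

def dixon_q(values):
    """Two-sided r10 Dixon ratio: larger end gap divided by the range."""
    x = sorted(values)
    return max(x[1] - x[0], x[-1] - x[-2]) / (x[-1] - x[0])

def dixon_p_mc(values, n_sim=200_000, seed=None):
    """Estimate P(Q >= Q_obs) under normality by simulating n_sim samples of size N."""
    rnd = random.Random(seed)  # a fixed seed makes the estimate reproducible
    n = len(values)
    q_obs = dixon_q(values)
    hits = 0
    for _ in range(n_sim):
        sample = [rnd.gauss(0.0, 1.0) for _ in range(n)]
        if dixon_q(sample) >= q_obs:
            hits += 1
    return hits / n_sim

p = dixon_p_mc([1, 2, 3, 4, 5, 15], n_sim=20_000, seed=1)
```

Without a seed, repeated calls give slightly different estimates, which is the run-to-run variation mentioned above.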

Generalized ESD (Extreme Studentized Deviate) test

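This procedure can detect more than one outlier. Moreover, because of the so-called “masking” effect, it can detect outliers even when the single-outlier tests above do not report significance. Masking occurs when several outliers inflate the standard deviation so much that none of them appears extreme on its own.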

The procedure starts by testing the most extreme value in the complete sample, giving a test statistic R₁ (= Grubbs G). This most extreme value is then removed from the sample, and the procedure is repeated, giving R₂, R₃ and so on, until 20% of the sample has been tested. The critical value R_crit (for significance at p<0.05) is adjusted at each iteration (Rosner 1983). Past marks the values for which R>R_crit in pink. Past does not calculate a p value explicitly.

Important: All of the most extreme values, up to the last value for which R>R_crit, are to be considered outliers, and are marked as such in Past. Quite often, the first (most extreme) values do not give R>R_crit, but they are still outliers because a value further down the list is significant. This is due to the masking effect. It looks odd, but is not a bug!
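The counting rule above can be written as a small Python function. The function name is hypothetical, and the R and R_crit values in the usage line are made-up numbers chosen only to show masking: the first step is not significant on its own, but the second is, so both of the two most extreme values count as outliers.

```python
def count_esd_outliers(r_values, r_crit):
    """Number of outliers = position of the LAST step with R > R_crit (0 if none).

    Every value tested up to that step is an outlier, even if its own R <= R_crit.
    """
    last = 0
    for i, (r, crit) in enumerate(zip(r_values, r_crit), start=1):
        if r > crit:
            last = i
    return last

# Hypothetical numbers: step 1 is masked (2.60 < 2.75), step 2 is significant.
n = count_esd_outliers([2.60, 2.95, 1.80], [2.75, 2.73, 2.71])  # n == 2
```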

References

Dixon, W.J. 1950. Analysis of extreme values. Annals of Mathematical Statistics 21:488-506.

Grubbs, F. 1950. Sample criteria for testing outlying observations. Annals of Mathematical Statistics 21:27-58.

Rosner, B. 1983. Percentage points for a generalized ESD many-outlier procedure. Technometrics 25:165-172.