A typical use of the intraclass correlation coefficient (ICC) is to quantify rater reliability, i.e. the level of agreement between several ‘raters’ measuring the same objects. It is a standard tool for assessing measurement error. ICC=1 would indicate perfect reliability. Raters (or ‘judges’) go in columns, while the objects measured go in rows.
Past follows the standard reference, Shrout and Fleiss (1979), which provides a number of different coefficients, referred to as ICC(m,k), where m is the model type. If k=1, the coefficient evaluates individual measurements (by a single rater); otherwise it evaluates the average measurement across raters. The model types are
- Model 1: each object is rated by a different set of raters, randomly sampled from a larger population of raters.
- Model 2: the same raters rate all objects, and the raters are a random sample from a larger population of raters.
- Model 3: no assumptions about the raters.
The most commonly used ICC is ICC(2,1), which is therefore marked in red in Past.
The analysis is based on a two-way ANOVA without replication, as described elsewhere in this manual. Confidence intervals are parametric, following the equations of Shrout and Fleiss (1979).
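As a rough illustration of how ICC(2,1) follows from the two-way ANOVA without replication, the sketch below computes it from the mean squares for rows (objects), columns (raters) and residual, using the ICC(2,1) formula of Shrout and Fleiss (1979). This is not Past's own code, just a minimal reimplementation of the coefficient; the function name `icc2_1` is my own.

```python
import numpy as np

def icc2_1(x):
    """ICC(2,1) from a two-way ANOVA without replication.

    x: 2-D array-like, objects in rows, raters in columns
    (same layout as in Past).
    """
    x = np.asarray(x, dtype=float)
    n, k = x.shape                      # n objects, k raters
    grand = x.mean()
    row_means = x.mean(axis=1)
    col_means = x.mean(axis=0)

    # Mean squares from the two-way ANOVA decomposition
    msr = k * np.sum((row_means - grand) ** 2) / (n - 1)   # rows (objects)
    msc = n * np.sum((col_means - grand) ** 2) / (k - 1)   # columns (raters)
    sse = np.sum((x - row_means[:, None] - col_means[None, :] + grand) ** 2)
    mse = sse / ((n - 1) * (k - 1))                        # residual

    # Shrout & Fleiss (1979), ICC(2,1)
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Perfect agreement between raters gives ICC = 1
print(icc2_1([[1, 1], [2, 2], [3, 3]]))
# A constant offset between raters lowers ICC(2,1), since Model 2
# treats rater bias as part of the error
print(icc2_1([[1, 2], [3, 4], [5, 6]]))
```

Note that a systematic difference between raters reduces ICC(2,1) but not ICC(3,1), which is one practical consequence of the choice of model.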
Reference
Shrout, P.E., Fleiss, J.L. 1979. Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin 86:420-428.