Non-metric MDS

Non-metric multidimensional scaling (NMDS) is based on a distance matrix computed with any of 25 supported distance measures, as explained under "Similarity and Distance Indices". The algorithm then attempts to place the data points in a two- or three-dimensional coordinate system such that the rank order of the pairwise distances is preserved. For example, if the original distance between points 4 and 7 is the ninth largest of all distances between any two points, points 4 and 7 will ideally be placed such that their Euclidean distance in the 2D plane or 3D space is still the ninth largest. Non-metric multidimensional scaling intentionally does not take absolute distances into account.
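The rank-preservation idea can be illustrated with a minimal Python sketch. The data, the 2D configuration and the function name distance_ranks are made up for illustration and are not part of PAST; Euclidean distance is assumed in the original space, and ties are not handled.

```python
from itertools import combinations
from math import dist

def distance_ranks(points):
    """Map each pair (i, j) to its rank, with rank 1 for the largest distance."""
    pairs = list(combinations(range(len(points)), 2))
    ordered = sorted(pairs,
                     key=lambda p: dist(points[p[0]], points[p[1]]),
                     reverse=True)
    return {pair: rank for rank, pair in enumerate(ordered, start=1)}

# Hypothetical original data (4 points, 3 variables) ...
original = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 3.0, 0.0), (5.0, 5.0, 5.0)]
# ... and a hypothetical 2D ordination of the same 4 points.
ordination = [(0.0, 0.0), (0.5, 0.0), (0.0, 1.0), (2.0, 2.0)]

ranks_original = distance_ranks(original)
ranks_ordination = distance_ranks(ordination)

# An ideal NMDS result keeps every pair at the same rank,
# even though the absolute distances differ.
rank_order_preserved = ranks_original == ranks_ordination
```

In this example the absolute distances shrink considerably in the ordination, but every pair keeps its rank, which is all that NMDS tries to achieve.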

The program may converge on a different solution in each run, depending upon the initial conditions. Each run is actually a sequence of 11 trials, from which the one with the smallest stress is chosen. One of these trials uses principal coordinates analysis (PCO) as the initial condition; the others start from random configurations. The solution is automatically rotated to the major axes (2D and 3D).
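The "best of 11 trials" strategy can be sketched as follows. This is only an illustration under loud assumptions: the optimizer is a simple random hill climb standing in for the actual algorithm, the stress is a plain sum of squared rank differences rather than the stress computed by PAST, and first_init is a placeholder for the PCO starting configuration. All function names are hypothetical.

```python
import random
from itertools import combinations
from math import dist

def _ranks(values):
    """Rank values from 0 (smallest) upward; ties broken by position."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0] * len(values)
    for rank, index in enumerate(order):
        ranks[index] = rank
    return ranks

def rank_stress(target, config):
    """Assumed stress: squared mismatch between target and obtained distance ranks."""
    pairs = list(combinations(range(len(config)), 2))
    wanted = _ranks([target[i][j] for i, j in pairs])
    got = _ranks([dist(config[i], config[j]) for i, j in pairs])
    return sum((a - b) ** 2 for a, b in zip(wanted, got))

def hill_climb(target, config, rng, steps=300, step_size=0.3):
    """Stand-in optimizer: jitter one point at a time, keep non-worsening moves."""
    best = [tuple(p) for p in config]
    best_stress = rank_stress(target, best)
    for _ in range(steps):
        trial = list(best)
        k = rng.randrange(len(trial))
        trial[k] = tuple(x + rng.gauss(0.0, step_size) for x in trial[k])
        stress = rank_stress(target, trial)
        if stress <= best_stress:
            best, best_stress = trial, stress
    return best, best_stress

def best_of_trials(target, first_init, n_trials=11, seed=0):
    """Run one trial from first_init and the rest from random starts; keep the best."""
    rng = random.Random(seed)
    n_points, n_dims = len(first_init), len(first_init[0])
    inits = [first_init] + [
        [tuple(rng.uniform(-1.0, 1.0) for _ in range(n_dims)) for _ in range(n_points)]
        for _ in range(n_trials - 1)
    ]
    results = [hill_climb(target, init, rng) for init in inits]
    return min(results, key=lambda result: result[1])
```

A different seed may select a different trial as the winner, which mirrors the run-to-run variation described above.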

The algorithm implemented in PAST, which seems to work very well, is based on a new approach developed by Taguchi and Oono (2005).

The minimal spanning tree option is based on the selected similarity or distance index in the original space.
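As a rough illustration, a minimal spanning tree can be computed from the original distance matrix with Prim's algorithm, as in the sketch below. The function name and the small distance matrix are made up, and the choice of Prim's algorithm is an assumption, not a description of the PAST implementation. The point of the example is that the tree is built from the original distances, not from distances in the ordination plot.

```python
def minimal_spanning_tree(distances):
    """Return the tree edges (i, j) found by Prim's algorithm on a square distance matrix."""
    n = len(distances)
    in_tree = {0}          # start from the first point
    edges = []
    while len(in_tree) < n:
        # Shortest edge connecting a point in the tree to a point outside it.
        i, j = min(
            ((a, b) for a in in_tree for b in range(n) if b not in in_tree),
            key=lambda edge: distances[edge[0]][edge[1]],
        )
        edges.append((i, j))
        in_tree.add(j)
    return edges

# Hypothetical distances in the original space (selected index), 4 points.
original_distances = [
    [0, 1, 4, 6],
    [1, 0, 2, 5],
    [4, 2, 0, 3],
    [6, 5, 3, 0],
]
tree = minimal_spanning_tree(original_distances)
```

The resulting edges would then be drawn between the corresponding points in the NMDS plot.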

Environmental variables

It is possible to include one or more initial columns containing additional “environmental” variables for the analysis. These variables are not included in the ordination. The correlation coefficients between each environmental variable and the NMDS scores are presented as vectors from the origin. The lengths of the vectors are arbitrarily scaled to make a readable biplot, so only their directions and relative lengths should be considered.
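The computation of an environmental vector can be sketched as below: one Pearson correlation per ordination axis. The scores, the variable "temperature" and the function names are invented for illustration, and the arbitrary scaling applied before plotting is left out.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def environmental_vector(env, scores):
    """Correlate one environmental variable with the scores on each ordination axis."""
    n_axes = len(scores[0])
    return [pearson(env, [row[axis] for row in scores]) for axis in range(n_axes)]

# Hypothetical NMDS scores for 4 samples on 2 axes.
scores = [(-1.0, 0.5), (0.0, -0.5), (1.0, 0.5), (2.0, -0.5)]
# Hypothetical environmental variable measured at the same 4 samples.
temperature = [10.0, 12.0, 14.0, 16.0]

vector = environmental_vector(temperature, scores)
```

Here the vector points strongly along axis 1 (correlation 1) and weakly in the negative direction of axis 2, which is the kind of direction information the biplot is meant to convey.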

Column scores

The columns can be included in the NMDS plot as weighted averages of the row scores, as in Correspondence Analysis. The weighting uses the raw data values, and therefore does not honour the choice of similarity index. However, it seems to work well for ecological data, for example, where it allows species to be plotted together with samples (sites).
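A minimal sketch of the weighted-average computation, using made-up abundance data and row scores; the function name column_scores and the handling of all-zero columns are assumptions made for this example.

```python
def column_scores(data, row_scores):
    """Place each column at the average of the row scores, weighted by raw data values."""
    n_axes = len(row_scores[0])
    result = []
    for col in range(len(data[0])):
        weights = [row[col] for row in data]
        total = sum(weights)
        if total == 0:
            result.append(None)   # assumption: an all-zero column cannot be placed
            continue
        result.append(tuple(
            sum(w * score[axis] for w, score in zip(weights, row_scores)) / total
            for axis in range(n_axes)
        ))
    return result

# Hypothetical abundances: 3 samples (rows) by 2 species (columns).
data = [
    [2, 0],
    [2, 1],
    [0, 3],
]
# Hypothetical NMDS scores of the 3 samples.
row_scores = [(0.0, 0.0), (2.0, 0.0), (2.0, 4.0)]

species_scores = column_scores(data, row_scores)
```

Each species ends up near the samples in which it is most abundant: the first species between samples 1 and 2, the second close to sample 3.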

Shepard plot

This plot of obtained versus observed (target) ranks indicates the quality of the result. Ideally, all points should be placed on a straight ascending line (x=y). The R² values are the coefficients of determination between distances along each ordination axis and the original distances (perhaps not a very meaningful value, but it is reported by other NMDS programs and so is included for completeness).
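The quantities behind the Shepard plot and the per-axis R² values can be sketched as follows. Function names and data are hypothetical, ties in the ranking are broken by position, and the exact conventions used by PAST may differ.

```python
from itertools import combinations
from math import dist

def _ranks(values):
    """Rank values from 1 (smallest) upward; ties broken by position."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks

def _r_squared(x, y):
    """Squared Pearson correlation between two sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def shepard_points(target, config):
    """Return (observed rank, obtained rank) for every pair of points."""
    pairs = list(combinations(range(len(config)), 2))
    observed = _ranks([target[i][j] for i, j in pairs])
    obtained = _ranks([dist(config[i], config[j]) for i, j in pairs])
    return list(zip(observed, obtained))

def axis_r_squared(target, config, axis):
    """R² between original distances and distances along a single ordination axis."""
    pairs = list(combinations(range(len(config)), 2))
    original = [target[i][j] for i, j in pairs]
    along_axis = [abs(config[i][axis] - config[j][axis]) for i, j in pairs]
    return _r_squared(original, along_axis)

# Hypothetical case of a perfect fit: the target distances are taken
# directly from the 2D configuration itself.
config = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0), (4.0, 4.0)]
target = [[dist(p, q) for q in config] for p in config]

points = shepard_points(target, config)        # all on the line x = y
r2_axis1 = axis_r_squared(target, config, 0)
r2_axis2 = axis_r_squared(target, config, 1)
```

Even with a perfect rank fit, the per-axis R² values are below 1, because a single axis carries only part of the distance information.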

Missing data are handled by pairwise deletion (except for the Raup-Crick, Rho and user-defined indices). For environmental variables, missing values are not included in the computation of correlations.
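Pairwise deletion can be illustrated with the sketch below, which computes a Euclidean distance from only those variables that are present in both rows. This is one plausible reading of pairwise deletion, not a description of the PAST code: the function name, the use of None for missing values and the absence of any rescaling for the number of deleted variables are all assumptions.

```python
from math import sqrt

def pairwise_euclidean(a, b):
    """Euclidean distance using only variables observed in both rows (None = missing)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return None   # assumption: no shared variables means no distance
    return sqrt(sum((x - y) ** 2 for x, y in shared))

# Hypothetical rows with missing values in different columns.
row_a = [1.0, None, 3.0, 4.0]
row_b = [1.0, 2.0, None, 8.0]

# Only columns 1 and 4 are present in both rows.
distance = pairwise_euclidean(row_a, row_b)
```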

Reference

Taguchi, Y.-H., Oono, Y. 2005. Relational patterns of gene expression via non-metric multidimensional scaling analysis. Bioinformatics 21:730-740.