This module computes a number of simple statistics on genetic sequence (DNA or RNA) data. It expects a number of rows, each containing one sequence. The sequences must be aligned and of equal length, including gaps (coded as ‘?’). Some of these statistics are useful for selecting appropriate distance measures elsewhere in Past.
Total length
The total sequence length, including gaps, of one sequence
Average gap
The number of gap positions, averaged over all sequences
Average A, T/U, C, G
The average number of positions containing each nucleotide
Average p distance
The p distance between two sequences, averaged over all pairs of sequences. The p (or Hamming) distance is defined as the proportion of unequal positions
Average Jukes-Cantor d
The Jukes-Cantor distance d between two sequences, averaged over all pairs of sequences. d = -(3/4) ln(1 - 4p/3), where p is the p distance
Maximal Jukes-Cantor d
Maximal Jukes-Cantor distance between any two sequences
Average transitions (P)
Average number of transitions (a/g, c/t, i.e. substitutions within the purines or within the pyrimidines)
Average transversions (Q)
Average number of transversions (a/t, a/c, c/g, t/g, i.e. substitutions between a purine and a pyrimidine)
R=P/Q
The transition/transversion ratio
Missing data: Treated as gaps.
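
As an illustration, the pairwise statistics above can be sketched in Python. This is a minimal sketch under stated assumptions, not Past's implementation: all function names are hypothetical, positions where either sequence has a gap (‘?’) are assumed to be skipped, U is treated as T, transitions and transversions are counted per pair and then averaged over all pairs, and the Jukes-Cantor distance is reported as infinite when p reaches 3/4, where the logarithm is no longer defined.

```python
from itertools import combinations
from math import inf, log, nan

GAP = "?"
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def compare(s1, s2):
    """Return (compared, transitions, transversions) for two aligned sequences."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned and of equal length")
    # Treat RNA and DNA alike by mapping U to T.
    s1 = s1.upper().replace("U", "T")
    s2 = s2.upper().replace("U", "T")
    compared = transitions = transversions = 0
    for a, b in zip(s1, s2):
        if a == GAP or b == GAP:
            continue  # assumption: skip positions with a gap in either sequence
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # a/g or c/t
        else:
            transversions += 1    # a/t, a/c, c/g, t/g
    return compared, transitions, transversions


def p_distance(s1, s2):
    """Proportion of unequal positions among the compared positions."""
    compared, ts, tv = compare(s1, s2)
    return (ts + tv) / compared if compared else nan


def jukes_cantor(p):
    """d = -(3/4) ln(1 - 4p/3); infinite once p reaches 3/4."""
    if p >= 0.75:
        return inf
    return -0.75 * log(1.0 - 4.0 * p / 3.0)


def summarize(seqs):
    """Average the pairwise statistics over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    n = len(pairs)
    ps = [p_distance(a, b) for a, b in pairs]
    ds = [jukes_cantor(p) for p in ps]
    counts = [compare(a, b) for a, b in pairs]
    avg_ts = sum(c[1] for c in counts) / n
    avg_tv = sum(c[2] for c in counts) / n
    return {
        "avg_p": sum(ps) / n,
        "avg_d": sum(ds) / n,
        "max_d": max(ds),
        "avg_transitions": avg_ts,
        "avg_transversions": avg_tv,
        "R": avg_ts / avg_tv if avg_tv else nan,  # undefined without transversions
    }


if __name__ == "__main__":
    for name, value in summarize(["ACGTACGT", "ACGTACGA", "ACGCAC?T"]).items():
        print(name, value)
```

Skipping gapped positions pair by pair keeps as much data as possible for each comparison; removing every column that contains a gap in any sequence is an equally plausible convention and would give different averages.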