Calculating the Secrets of Life: Contributions of the Mathematical Sciences to Molecular Biology (1995)

Chapter: The Duality Between Similarity and Difference Measures

Previous Chapter: Variations in Gap Cost Penalties
Suggested Citation: "The Duality Between Similarity and Difference Measures." National Research Council. 1995. Calculating the Secrets of Life: Contributions of the Mathematical Sciences to Molecular Biology. Washington, DC: The National Academies Press. doi: 10.17226/2121.

Page 72

sequence comparison. It has been postulated that such a model is natural for biological sequences where gap costs would be expected to have a decreasing marginal penalty as a function of length. For this model, investigators have been able to design algorithms that take O(MN(log N + log M)) time or less (Miller and Myers, 1988; Eppstein et al., 1989).

It is also possible to design an algorithm for completely arbitrary gap cost functions. However, such generality comes at a price: the best available algorithm takes O(MN(M + N)) time (Waterman et al., 1976). For this reason and because the more restricted affine and concave models appear adequate to most needs, the general algorithm is rarely used.

The Duality Between Similarity and Difference Measures

Thus far we have considered the comparison problem to be one of exposing the similarity between two sequences and thus have naturally thought in terms of maximizing the score of alignments. Another natural perspective is to think about how a sequence A may have evolved into sequence B over time. In this context, one seeks alignments that reveal the minimum number of mutational events that might have effected the transformation. In this view, an aligned symbol of B is substituted for its counterpart in A, an unaligned symbol in A is deleted, and an unaligned symbol in B is inserted. For example, in the alignment image, the first T in ATTACG is deleted, and the second T in ATATCG is inserted. In the alignment image, T is mutated into A, and A is mutated into T. As before, the scoring scheme d assigns a score to each evolutionary event modeled by a column, but now the interpretation is that d represents the differences rather than the similarities between symbols. Note that for formal purposes it is assumed that an A mutates into an A in the alignments above at no cost; that is, one chooses d(A,A) to be 0.

Given a scoring scheme d reflecting an evolutionary or difference-based model, the goal is to find an alignment of minimal score, that is, one that indicates the minimum-scoring set of changes needed to go from one sequence to a related sequence. Let D(A,B) be the score of a minimal

Suggested Citation: "The Duality Between Similarity and Difference Measures." National Research Council. 1995. Calculating the Secrets of Life: Contributions of the Mathematical Sciences to Molecular Biology. Washington, DC: The National Academies Press. doi: 10.17226/2121.
Page 72
Next Chapter: Aligning More Than Two Sequences at a Time
Subscribe to Emails from the National Academies
Stay up to date on activities, publications, and events by subscribing to email updates.