Suggested Citation: "Alignment Unknown." National Research Council. 1995. Calculating the Secrets of Life: Contributions of the Mathematical Sciences to Molecular Biology. Washington, DC: The National Academies Press. doi: 10.17226/2121.


Alignment Unknown

The situation for matching between two sequences is closely related, although the dependence structure becomes more complex. Suppose that the two sequences $A_1 A_2 \cdots A_n$ and $B_1 B_2 \cdots B_m$ are made up of letters chosen independently and uniformly from a $d$-letter alphabet. When the letters are not chosen uniformly, Theorem 4.2 still holds, but it is not straightforward to apply. In matching DNA, $d = 4$; for protein sequences, $d = 20$. Fix a match length $t$ and let

$$I = \{(i, j) : 1 \le i \le n - t + 1,\ 1 \le j \le m - t + 1\}.$$

Define indicator random variables

$E_{i,j} = 1$ if $A_i = B_j$, and $E_{i,j} = 0$ otherwise.

Let $p = P(E_a = 1) = 1/d$, where $a = (i, j)$.

As in the case of head runs, we need to unclump matches and consider "boundary effects." Let

$$X_{i,j} = E_{i,j}\, E_{i+1,j+1} \cdots E_{i+t-1,j+t-1} \quad \text{if } i = 1 \text{ or } j = 1,$$

and otherwise

$$X_{i,j} = (1 - E_{i-1,j-1})\, E_{i,j}\, E_{i+1,j+1} \cdots E_{i+t-1,j+t-1}.$$
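The unclumping rule can be checked directly on concrete sequences. The following is a minimal Python sketch that computes $W$ by evaluating each $X_{i,j}$ as defined above. The function name `declumped_match_count` and the representation of the sequences as strings are assumptions made for illustration, and indices are shifted to Python's 0-based convention (so the boundary condition $i = 1$ or $j = 1$ becomes `i0 == 0` or `j0 == 0`).

```python
def declumped_match_count(a: str, b: str, t: int) -> int:
    """Return W, the sum of X_{i,j} over the index set I, for match length t.

    X_{i,j} = 1 when A_i..A_{i+t-1} equals B_j..B_{j+t-1} letter by letter
    and, unless i = 1 or j = 1, A_{i-1} != B_{j-1}. Each maximal diagonal
    run of matches of length >= t is therefore counted once, at its start.
    (Hypothetical helper for illustration.)
    """
    n, m = len(a), len(b)
    w = 0
    # 0-based indices: i0 = i - 1 ranges over 0..n-t, j0 = j - 1 over 0..m-t.
    for i0 in range(n - t + 1):
        for j0 in range(m - t + 1):
            # Product E_{i,j} E_{i+1,j+1} ... E_{i+t-1,j+t-1}.
            if not all(a[i0 + k] == b[j0 + k] for k in range(t)):
                continue
            # Boundary positions carry no unclumping factor; interior ones
            # need the factor (1 - E_{i-1,j-1}), i.e. a mismatch just before.
            if i0 == 0 or j0 == 0 or a[i0 - 1] != b[j0 - 1]:
                w += 1
    return w
```

For example, with $t = 3$ the sequences ACGTACGT and TTACGTAA share two diagonal runs of five matching letters; each run is counted once, at its first position, so `declumped_match_count("ACGTACGT", "TTACGTAA", 3)` returns 2 rather than the six length-3 windows those runs contain.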

With $W = \sum_{a \in I} X_a$, calculating $\lambda = E(W)$ yields

$$\lambda = p^t\left[(n + m - 2t + 1) + (n - t)(m - t)(1 - p)\right]. \tag{4.5}$$
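Equation (4.5) follows from linearity of expectation. The indicators in a single product involve distinct letters, so they are independent: for a boundary position ($i = 1$ or $j = 1$) $E(X_a) = p^t$, and for an interior position $E(X_a) = (1 - p)p^t$. The boundary of $I$ contains $(n - t + 1) + (m - t + 1) - 1 = n + m - 2t + 1$ positions, and the interior contains $(n - t)(m - t)$. The Python sketch below evaluates (4.5) and, as an independent check, computes $E(W)$ exactly by enumerating every pair of sequences over a small alphabet. Both function names are hypothetical, and the brute-force routine is only a sanity check, not a method for realistic sequence lengths.

```python
from fractions import Fraction
from itertools import product


def expected_declumped_matches(n: int, m: int, t: int, p):
    """Equation (4.5): lambda = p^t [(n + m - 2t + 1) + (n - t)(m - t)(1 - p)].

    Works with a float p or an exact Fraction p. (Hypothetical helper.)
    """
    return p**t * ((n + m - 2 * t + 1) + (n - t) * (m - t) * (1 - p))


def brute_force_mean(n: int, m: int, t: int, d: int) -> Fraction:
    """Exact E(W) by enumerating all d^(n+m) equally likely sequence pairs.

    Exponential in n + m, so only usable for tiny inputs. (Hypothetical.)
    """
    total = 0
    for a in product(range(d), repeat=n):
        for b in product(range(d), repeat=m):
            # Sum X_{i,j} over I, using 0-based indices as in the text's X.
            for i in range(n - t + 1):
                for j in range(m - t + 1):
                    run = all(a[i + k] == b[j + k] for k in range(t))
                    if run and (i == 0 or j == 0 or a[i - 1] != b[j - 1]):
                        total += 1
    return Fraction(total, d ** (n + m))


# Small-case check with exact arithmetic: both sides equal 5/4 here.
print(brute_force_mean(4, 3, 2, 2), expected_declumped_matches(4, 3, 2, Fraction(1, 2)))
```

Using `fractions.Fraction` turns the small-case comparison into an exact equality rather than a tolerance check. With $n = 76$, $m = 77$, $t = 9$ and $p = 1/4$, the formula gives $3553/262144 \approx 0.01355$.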

In matching two tRNA sequences, one of length 76 and the other of length 77, would a match of length 9 be unusual? For these parameters ($d = 4$, so $p = 1/4$), $\lambda = 0.0136$, and under the model above the event has a probability of approximately
