Previous Chapter: Alignment Given
Suggested Citation: "Alignment Unknown." National Research Council. 1995. Calculating the Secrets of Life: Contributions of the Mathematical Sciences to Molecular Biology. Washington, DC: The National Academies Press. doi: 10.17226/2121.

Page 96

therefore cannot be optimized. We give the statistical distribution of the alignment score for completeness, however. Let s(A,B) be a real valued random variable. Define the score S by

image

and let E(S) denote the expectation of S and Var(S) denote the variance. Clearly, E(S) = nE(s(A,B)) and

Var(S) = nVar(s(A, B)).

Since S is the sum of independent, identically distributed random variables s(A,B), the central limit theorem implies that for large n

S » Normal(nE(s(A, B)), nVar(s(A, B))).

If s(A, B) Î {0, 1}, then S is binomial (n, p), where p = P{s(A, B) = 1}. Even when the letters are not identically distributed, the limiting distribution is normal under mild assumptions (Chung, 1974).

Alignment Unknown

The assumptions of the last section are carried over: AlA2. . . An and B1B2. . . Bm are composed of independent and identically distributed letters and s(A, B) is a real valued random variable on pairs of letters. We extend s(·,·) to s(A,-) and s(-,B) so that deletions are included. We assume that the value of s for all deletion scores is smaller than maxs(A,B). An alignment score S is the maximum over all possible alignments

image

Suggested Citation: "Alignment Unknown." National Research Council. 1995. Calculating the Secrets of Life: Contributions of the Mathematical Sciences to Molecular Biology. Washington, DC: The National Academies Press. doi: 10.17226/2121.

Page 97

The optimization destroys the classical normal distribution of alignment score, but an easy application of a beautiful theorem known as Kingman's subadditive ergodic theorem gives an interesting result:

Theorem 4.1 (Kingman, 1973) For s,t nonnegative integers with 0 £ s £ t, let Xs,t be a collection of random variables that satisfy

(i) Whenever s  t  u, Xs,u £ Xs,t + Xt,u,
(ii The joint distribution of Xs,t} is the same as that of {Xs+l, t+1},
iii The expectation gt = EX0,t exists and satisfies gt ³ - Kt for
some constant

Then the finite limt®¥ X0,t / t = l exists with probability 1 and in the mean.

The essential assumption is (ii), the subadditivity condition. To motivate and illustrate this theorem, recall the strong law of large numbers (SLLN), which treats independent, identically distributed (iid) random variables W1,W2,. . . with µ = E(Wi). The SLLN asserts that

image

with probability 1.

It is easy to see that additivity holds. Set

image

Of course (i) is satisfied:

image

Suggested Citation: "Alignment Unknown." National Research Council. 1995. Calculating the Secrets of Life: Contributions of the Mathematical Sciences to Molecular Biology. Washington, DC: The National Academies Press. doi: 10.17226/2121.

Page 98

Since the Wi are iid, (ii) is evidently true. Finally, g(t) = E(U0,t) = , so that (iii) holds with µ = -K. Therefore the limit

image

exists and is constant with probability 1. Notice that this setup does not allow us to conclude that the limit is µ. This is a price of relaxing the assumption of additivity.

Returning to the statistical distribution of alignment score, recall that an alignment score S is the maximum over all possible alignments

image

Define X,, by

image

Then evidently,

-Xs,u ³ (-Xs,t) + (-Xt,u)

and

Xs,u £ Xs,t + Xt,u.

We have that gt = E(X0,t) exists since the expectation of a single alignment exists and -X0,t is the maximum of a finite number of alignment scores. The final hypothesis to check is gt ³ -Kt for some constant K and all > 1. Clearly,

E(-X0,t) £ t maxs(A, B)

so that

gt ³ -(maxs(A, B))t = -Kt.

Suggested Citation: "Alignment Unknown." National Research Council. 1995. Calculating the Secrets of Life: Contributions of the Mathematical Sciences to Molecular Biology. Washington, DC: The National Academies Press. doi: 10.17226/2121.
Page 96
Suggested Citation: "Alignment Unknown." National Research Council. 1995. Calculating the Secrets of Life: Contributions of the Mathematical Sciences to Molecular Biology. Washington, DC: The National Academies Press. doi: 10.17226/2121.
Page 97
Suggested Citation: "Alignment Unknown." National Research Council. 1995. Calculating the Secrets of Life: Contributions of the Mathematical Sciences to Molecular Biology. Washington, DC: The National Academies Press. doi: 10.17226/2121.
Page 98
Next Chapter: Alignment Given
Subscribe to Emails from the National Academies
Stay up to date on activities, publications, and events by subscribing to email updates.