$$H(A, B; \infty, \infty) \sim b \log n.$$
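With infinite mismatch and indel penalties, any local alignment containing a mismatch or an indel scores minus infinity. Assuming (for this sketch) that each matched letter scores +1, H(A, B; ∞, ∞) therefore reduces to the length of the longest exact common word of A and B. The minimal Python sketch below computes that length by dynamic programming and prints H / log n for uniform random DNA of increasing length n. The helper names `longest_common_word` and `random_dna` are hypothetical, and the sketch does not estimate the constant b.

```python
import math
import random


def longest_common_word(a: str, b: str) -> int:
    """Length of the longest exact common substring of a and b.

    Under the assumed scoring (+1 per match, infinite mismatch and indel
    penalties) this equals the local alignment score H(A, B; inf, inf).
    """
    best = 0
    prev = [0] * (len(b) + 1)  # prev[j]: length of common suffix of a[:i-1], b[:j]
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the diagonal run of matches
                best = max(best, cur[j])
        prev = cur
    return best


def random_dna(n: int, rng: random.Random) -> str:
    """Hypothetical helper: i.i.d. uniform letters over ACGT."""
    return "".join(rng.choice("ACGT") for _ in range(n))


if __name__ == "__main__":
    rng = random.Random(1)
    for n in (100, 200, 400, 800):
        h = longest_common_word(random_dna(n, rng), random_dna(n, rng))
        print(f"n={n} H={h} H/log n={h / math.log(n):.2f}")
```

If the last column settles down as n grows, that is consistent with logarithmic growth; with a single pair per length the ratios are noisy.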
It is natural to ask whether other growth rates occur. The answer is given in Waterman et al. (1987) and Arratia and Waterman (1994), where the following result is proved. Assume both sequences have length n. There is a continuous curve that divides the nonnegative (µ, δ) plane into two components: F₀, the component containing (0, 0), and F∞, the component containing (∞, ∞). When (µ, δ) belongs to F₀, H grows linearly with sequence length; when (µ, δ) belongs to F∞, H grows logarithmically with sequence length. Along any curve crossing from F₀ to F∞, the growth of H(A, B; µ, δ) therefore undergoes a phase transition. This behavior is quite general: Arratia and Waterman (1994) show that it holds for very general penalties for scoring matches, mismatches, and indels. The behavior of H(A, B; µ, δ) when (µ, δ) lies on the curve separating F₀ and F∞ remains an open question.
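The two growth regimes can be explored numerically. The sketch below computes a local alignment score with the Smith-Waterman recursion, assuming +1 per match, -µ per mismatch, and -δ per inserted or deleted letter (a linear indel penalty); the function name `local_score` and both parameter settings are illustrative choices, not values from the text. The setting (0.1, 0.1) lies near (0, 0), while (3, 3) exceeds the database-search setting (0.9, 2.1) in both penalties. In the linear regime H/n should stay roughly constant as n grows; in the logarithmic regime H/log n should.

```python
import math
import random


def local_score(a: str, b: str, mu: float, delta: float) -> float:
    """Smith-Waterman local alignment score H(A, B; mu, delta).

    Assumed scoring: +1 per match, -mu per mismatch, -delta per
    inserted or deleted letter. Infinite penalties are allowed.
    """
    best = 0.0
    prev = [0.0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0.0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (1.0 if a[i - 1] == b[j - 1] else -mu)
            # 0 restarts the alignment: this is what makes the score "local".
            cur[j] = max(0.0, diag, prev[j] - delta, cur[j - 1] - delta)
            best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    rng = random.Random(7)
    for mu, delta in ((0.1, 0.1), (3.0, 3.0)):
        for n in (100, 200, 400):
            a = "".join(rng.choice("ACGT") for _ in range(n))
            b = "".join(rng.choice("ACGT") for _ in range(n))
            h = local_score(a, b, mu, delta)
            print(f"mu={mu} delta={delta} n={n} H={h:.1f} "
                  f"H/n={h / n:.3f} H/log n={h / math.log(n):.2f}")
```

Because the recursion keeps only two rows, memory is linear in the length of B while time is quadratic, which is enough for a few hundred letters in pure Python.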
How do the results of the previous section apply to our comparisons of 16S rRNA with tRNAs? As we have seen, the matchings of Bloch et al. (1983) were obtained with a local algorithm, so we apply the local algorithm H to the data and study the distribution of scores. We first compare the sequences using the statistic H(A, B; µ, δ) with µ = 0.9 and δ = 2.1. These values have been used in several database searches, and they lie in the logarithmic region, so scores from aligning random sequences grow logarithmically with sequence length. Table 4.2 gives the results of applying the algorithm to our data. Because no closed-form Chen-Stein approximation has been obtained for alignments with indels, each score is reported as the number of standard deviations (#σ) above the mean score for comparing two random sequences of similar lengths. (See Waterman and Vingron (1994) for recent work on estimating statistical significance.) The estimated mean, as a function of the tRNA length, is
$$H(A, B; \mu = 0.9, \delta = 2.1) = 5.04 \log n - 30.95,$$
Table 4.2 Scores and Alignment Statistics

| tRNA | Score | #σ | Matches | Mismatches | Indels |
|---|---|---|---|---|---|
| ala-1a | 12.2 | -0.02 | 14 | 2 | 0 |
| ala-1b | 12.2 | -0.1 | 14 | 2 | 0 |
| cys | 21.0 | 6.2 | 40 | 10 | 5 |
| asp-1 | 10.8 | -1.1 | 22 | 8 | 2 |
| glu-1 | 10.9 | -0.8 | 21 | 9 | 1 |
| glu-2 | 12.8 | 0.6 | 22 | 8 | 1 |
| phe | 13.0 | 0.6 | 32 | 10 | 5 |
| gly-1 | 9.4 | -1.4 | 15 | 4 | 1 |
| gly-2 | 9.5 | -1.2 | 35 | 15 | 6 |
| gly-3 | 14.4 | 1.5 | 41 | 14 | 7 |
| his-1 | 13.2 | 1.1 | 28 | 12 | 2 |
| ile-1 | 13.6 | 0.9 | 41 | 26 | 2 |
| ile-2 | 14.0 | 1.3 | 35 | 10 | 6 |
| lys | 10.7 | -0.5 | 23 | 7 | 3 |
| leu-1 | 13.8 | 0.7 | 49 | 28 | 5 |
| leu-2 | 11.7 | -0.7 | 33 | 17 | 3 |
| leu-5 | 13.4 | 0.4 | 36 | 14 | 5 |
| met-f | 12.0 | -0.03 | 44 | 20 | 7 |
| met-m | 11.4 | -0.2 | 21 | 4 | 3 |
| asn | 15.3 | 2.4 | 33 | 13 | 3 |
| gln-1 | 11.8 | 0.1 | 23 | 8 | 2 |
| gln-2 | 12.1 | 0.2 | 26 | 11 | 2 |
| arg-1 | 13.3 | 0.7 | 48 | 23 | 7 |
| arg-2 | 12.8 | 0.3 | 26 | 8 | 3 |
| ser- | 11.1 | -1.3 | 29 | 11 | 4 |
| ser-3 | 13.8 | 0.3 | 42 | 18 | 6 |
| thr-ggt | 10.1 | -1.3 | 15 | 1 | 2 |
| val-1 | 11.9 | -0.2 | 22 | 9 | 1 |
| val-2a | 11.3 | -0.7 | 14 | 3 | 0 |
| val-2b | 11.3 | -0.4 | 14 | 3 | 0 |
| trp | 11.0 | -0.7 | 22 | 10 | 1 |
| tyr-1 | 11.7 | -0.4 | 31 | 17 | 2 |
| tyr-2 | 10.9 | -0.9 | 42 | 19 | 7 |

#σ, the number of standard deviations above the mean score for comparing two random sequences of similar lengths; Indels, insertions/deletions.
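Each #σ entry standardizes an observed score against scores from random sequences. One hedged way to estimate such a value, swapped in here for illustration and different from the fitted mean curve used in the text, is Monte Carlo simulation: align many pairs of independent random sequences with the given lengths, estimate the mean and standard deviation of H, and standardize the observed score. The sketch assumes i.i.d. uniform letters over ACGU and the scoring +1 per match, -µ per mismatch, -δ per indel letter; `sigma_above_random` and its parameters are hypothetical names, and the lengths and score in the usage line are placeholders, not values for any tRNA in Table 4.2.

```python
import random
import statistics


def local_score(a: str, b: str, mu: float, delta: float) -> float:
    """Smith-Waterman local score (assumed: +1 match, -mu mismatch, -delta per indel letter)."""
    best = 0.0
    prev = [0.0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0.0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (1.0 if a[i - 1] == b[j - 1] else -mu)
            cur[j] = max(0.0, diag, prev[j] - delta, cur[j - 1] - delta)
            best = max(best, cur[j])
        prev = cur
    return best


def sigma_above_random(score, len_a, len_b, mu=0.9, delta=2.1,
                       trials=100, seed=0, alphabet="ACGU"):
    """Return (#sigma, mean, sd) of `score` relative to simulated random pairs.

    Hypothetical helper: random sequences are i.i.d. uniform over `alphabet`.
    """
    if trials < 2:
        raise ValueError("need at least two trials to estimate a standard deviation")
    rng = random.Random(seed)  # fixed seed makes the estimate reproducible
    sims = []
    for _ in range(trials):
        a = "".join(rng.choice(alphabet) for _ in range(len_a))
        b = "".join(rng.choice(alphabet) for _ in range(len_b))
        sims.append(local_score(a, b, mu, delta))
    mean = statistics.mean(sims)
    sd = statistics.stdev(sims)  # sample standard deviation
    if sd == 0:
        raise ValueError("simulated scores have zero spread; increase trials or lengths")
    return (score - mean) / sd, mean, sd


if __name__ == "__main__":
    # Placeholder lengths and score, chosen only to keep the run short.
    z, mean, sd = sigma_above_random(15.0, 80, 200, trials=30)
    print(f"mean={mean:.2f} sd={sd:.2f} #sigma={z:.2f}")
```

Uniform letters ignore the base composition of real RNA; shuffling the observed sequences instead of drawing uniform letters is a common alternative that preserves composition.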