$1 - \exp(-\lambda) = 0.135$.
A bound on the error may be calculated in a way similar to that for coin tossing. We note that again $b_2 = 0$, and by breaking the sum for $b_1$ into two sums, one of which is made up of all terms that involve the boundary, we find

$$b_1 < \frac{\lambda^2 (2t + 1)}{(n - t + 1)(m - t + 1)} + 2\lambda p^t.$$

Hence, the probability above is correct to within $8.5 \times 10^{-7}$.
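The bound on b1 stated above is simple to evaluate numerically. The following is a minimal sketch, not the author's code: the function name `chen_stein_b1_bound` and all input values in the usage lines are illustrative assumptions, and λ is passed in directly because its definition is given earlier in the chapter.

```python
def chen_stein_b1_bound(lam: float, n: int, m: int, t: int, p: float) -> float:
    """Evaluate lam^2 (2t + 1) / ((n - t + 1)(m - t + 1)) + 2 lam p^t.

    lam  : Poisson mean used in the approximation (supplied by the caller)
    n, m : lengths of the two sequences
    t    : match length being tested
    p    : probability that two aligned letters match
    """
    if not 0 < t <= min(n, m):
        raise ValueError("t must satisfy 0 < t <= min(n, m)")
    first_term = lam ** 2 * (2 * t + 1) / ((n - t + 1) * (m - t + 1))
    second_term = 2 * lam * p ** t
    return first_term + second_term


# Illustrative inputs only (assumptions, not values taken from the text):
# a 76-letter sequence against a 1542-letter sequence, t = 9, p = 1/4.
example_bound = chen_stein_b1_bound(lam=0.3, n=76, m=1542, t=9, p=0.25)
print(f"bound on b1: {example_bound:.2e}")
```

Because both terms shrink as t grows (the first through the factor λ², the second through p^t), the approximation is tightest for long matches.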
Now we bring these ideas to bear on our RNA evolution problem. We have a set of 33 tRNA molecules and one 16S rRNA molecule from E. coli. In Bloch et al. (1983), matchings between the 16S rRNA and each of the tRNAs were intensely studied. tRNA evolution is a complex topic, and tRNA/tRNA comparisons were not made in this study. Table 4.1 shows the length of the longest exact match $H_n$ between these sequences, along with estimates of significance, or p-values ($1 - e^{-\lambda_n}$), from our Chen-Stein method. There are no exceptionally good matchings in this list, and so this analysis discounts any deep relationship between the sequences. In fact, the p-values seem unusually large: in the 33 comparisons the minimum p-value is 0.26.

Still, we should not give up the search. One estimate puts the origin of these sequences at 3 billion years ago. We should not expect large segments of sequence to be preserved in every position over such vast amounts of time. Instead, mutations such as substitutions, insertions, and deletions will accumulate, greatly complicating our task. It is possible that the hypothesis of common origin is correct and that so much evolutionary change has taken place that no significant similarity remains. The next section, "Two Behaviors Suffice," examines the results of this search for unusual similarity using more subtle sequence comparison algorithms.
Table 4.1 Exact Match P-Values

| tRNA | GenBank | Length | $H_n$ | $1 - e^{-\lambda_n}$ | Bound on $b_1$ |
|---|---|---|---|---|---|
| ala-1a | ECOTRA1A | 76 | 9 | 0.26 | $1.87 \times 10^{-5}$ |
| ala-1b | ECOTRA1B | 76 | 9 | 0.26 | $1.87 \times 10^{-5}$ |
| cys | ECOTRC | 74 | 8 | 0.69 | $2.67 \times 10^{-4}$ |
| asp-1 | ECOTRD1 | 77 | 8 | 0.71 | $2.79 \times 10^{-4}$ |
| glu-1 | ECOTRE1 | 76 | 10 | 0.71 | $1.25 \times 10^{-6}$ |
| glu-2 | ECOTRE2 | 76 | 10 | 0.71 | $1.25 \times 10^{-6}$ |
| phe | ECOTRF | 76 | 9 | 0.26 | $1.87 \times 10^{-5}$ |
| gly-1 | ECOTRG1 | 74 | 7 | 0.99 | $3.90 \times 10^{-3}$ |
| gly-2 | ECOTRG2 | 75 | 6 | 1.00 | $5.70 \times 10^{-2}$ |
| gly-3 | ECOTRG3 | 76 | 9 | 0.26 | $1.87 \times 10^{-5}$ |
| his-1 | ECOTRH1 | 77 | 9 | 0.26 | $1.89 \times 10^{-5}$ |
| ile-1 | ECOTRI1 | 77 | 9 | 0.26 | $1.89 \times 10^{-5}$ |
| ile-2 | ECOTRI2 | 76 | 10 | 0.71 | $1.25 \times 10^{-6}$ |
| lys | ECOTRK | 76 | 6 | 1.00 | $5.78 \times 10^{-2}$ |
| leu-1 | ECOTRL1 | 87 | 8 | 0.76 | $3.19 \times 10^{-4}$ |
| leu-2 | ECOTRL2 | 87 | 8 | 0.76 | $3.19 \times 10^{-4}$ |
| leu-5 | ECOTRL5 | 87 | 9 | 0.29 | $2.16 \times 10^{-5}$ |
| met-f | ECOTRMF | 77 | 9 | 0.26 | $1.89 \times 10^{-5}$ |
| met-m | ECOTRMM | 77 | 8 | 0.71 | $2.79 \times 10^{-4}$ |
| asn | ECOTRN | 76 | 7 | 0.99 | $4.01 \times 10^{-3}$ |
| gln-1 | ECOTRQ1 | 75 | 8 | 0.70 | $2.71 \times 10^{-4}$ |
| gln-2 | ECOTRQ2 | 75 | 8 | 0.70 | $2.71 \times 10^{-4}$ |
| arg-1 | ECOTRR1 | 76 | 7 | 0.99 | $4.01 \times 10^{-3}$ |
| arg-2 | ECOTRR2 | 77 | 7 | 0.99 | $4.07 \times 10^{-3}$ |
| ser-1 | ECOTRS1 | 88 | 8 | 0.76 | $3.23 \times 10^{-4}$ |
| ser-3 | ECOTRS3 | 93 | 9 | 0.31 | $2.33 \times 10^{-5}$ |
| thr-ggt | ECOTRTACU | 76 | 7 | 0.99 | $4.01 \times 10^{-3}$ |
| val-1 | ECOTRV1 | 76 | 8 | 0.70 | $2.75 \times 10^{-4}$ |
| val-2a | ECOTRV2A | 77 | 8 | 0.71 | $2.79 \times 10^{-4}$ |
| val-2b | ECOTRV2B | 77 | 9 | 0.26 | $1.89 \times 10^{-5}$ |
| trp | ECOTRW | 76 | 7 | 0.99 | $4.01 \times 10^{-3}$ |
| tyr-1 | ECOTRY1 | 85 | 8 | 0.75 | $3.11 \times 10^{-4}$ |
| tyr-2 | ECOTRY2 | 85 | 8 | 0.75 | $3.11 \times 10^{-4}$ |

$H_n$, the length of the longest exact match between the listed tRNA molecule and the 16S rRNA molecule; $1 - e^{-\lambda_n}$, the p-value (estimate of significance) for the nth tRNA molecule; the last column gives the calculated bound on $b_1$.
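The quantities tabulated above can be sketched in a few lines of code. This is a hedged illustration rather than the procedure used to build Table 4.1. The function names are hypothetical, the toy sequences are made up, and the formula for λ is an assumed leading-order form, λ ≈ (n − t + 1)(m − t + 1)(1 − p)p^t, which ignores any boundary corrections in the chapter's own definition. The 16S rRNA length of 1542 and the match probability p = 1/4 are likewise assumptions not stated in the text.

```python
import math


def longest_exact_match(a: str, b: str) -> int:
    """Length of the longest substring common to a and b.

    Row-by-row dynamic programming: curr[j] holds the length of the
    common run ending at the current letter of a and at b[j - 1].
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def approx_p_value(n: int, m: int, t: int, p: float = 0.25) -> float:
    """1 - exp(-lam), using the ASSUMED leading-order lam described above."""
    lam = (n - t + 1) * (m - t + 1) * (1 - p) * p ** t
    return 1.0 - math.exp(-lam)


# Made-up toy sequences; their longest shared run is "UACGGA".
toy_h = longest_exact_match("ACGUACGGA", "UUACGGAUC")
print("longest exact match:", toy_h)

# Assumed inputs: tRNA length 76, 16S rRNA length 1542, longest match 9.
print("approximate p-value:", round(approx_p_value(76, 1542, 9), 2))
```

Under these assumptions, a 76-letter tRNA with a longest match of 9 gives a p-value of roughly 0.26, in line with the corresponding table rows. Agreement for other rows is not guaranteed, since the exact λ may differ from the simplified form used here.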