Suggested Citation: "THE COALESCENT AND MUTATION." National Research Council. 1995. Calculating the Secrets of Life: Contributions of the Mathematical Sciences to Molecular Biology. Washington, DC: The National Academies Press. doi: 10.17226/2121.

records the allelic partition of the sample, leads to the sampling theory of the infinitely-many-alleles model initiated by Ewens (1972). The Ewens sampling formula is then described, followed by a brief digression into the simulation structure of mutations in the coalescent, both in top-down and bottom-up form. Next, the infinitely-many-sites model is introduced as a simple description of the detailed structure of the segregating sites in the sample. Finally, we return to classical population genetics theory, albeit from a coalescent point of view, to discuss the structure of K-allele models. This in turn develops into the study of the finitely-many-sites models, which play a crucial role in the study of sequence variability when back substitutions are prevalent.

In the next section we digress to present a mathematical vignette in the area of random combinatorial structures. The Ewens sampling formula was derived as a means to analyze allozyme frequency data that became prevalent in the late 1960s. Current population genetic data is more sequence oriented and requires more detailed models for its analysis. Nonetheless, the combinatorial structure of the Ewens sampling formula has recently emerged as a useful approximation to the component counting process of a wide range of combinatorial objects, among them random permutations, random mapping functions, and factorization of polynomials over a finite field. We show how a result of central importance in the development of statistical inference for molecular data has a new lease on life in an area of discrete mathematics.

The final section briefly discusses some of the outstanding problems in the area, with particular emphasis on likelihood methods for coalescent processes. Some aspects of the mathematical theory, for example, measure-valued diffusions, are also mentioned, together with applications to other, more complicated, genetic mechanisms.

The Coalescent and Mutation

The genealogy of a sample of $n$ genes (that is, stretches of DNA sequence) drawn at random from a large population of approximately constant size may be described in terms of independent exponential random variables $T_n, T_{n-1}, \ldots, T_2$ as follows. The time $T_n$ during which the sample has $n$ distinct ancestors has an exponential distribution with parameter $n(n-1)/2$, at which time two of the lines are chosen at random to coalesce, giving the sample $n-1$ distinct ancestors. The time $T_{n-1}$ during which the sample has $n-1$ such ancestors is exponentially distributed with parameter $(n-1)(n-2)/2$, at which point two more ancestors are chosen at random to coalesce. This process of coalescing continues until the sample has two distinct ancestors. From that point, it takes an exponential amount of time $T_2$ with parameter 1 to trace back to the sample's common ancestor. For our purposes, time is measured in units of $N$ generations, where $N$ is the (effective) size of the population from which the sample was drawn. This structure, made explicit by Kingman (1982a,b), arises as an approximation for large $N$ to many models of reproduction, among them the Wright-Fisher and Moran models. A sample path of a coalescent with $n = 5$ is shown in Figure 5.1.

Figure 5.1
Sample path of the coalescent for $n = 5$. $T_j$ denotes the time during which the sample has $j$ distinct ancestors. $T_j$ has an exponential distribution with mean $2/[j(j-1)]$.
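The genealogy described above can be sketched in a few lines of Python. This is a minimal illustration, not the chapter's own algorithm: the function name `simulate_coalescent`, the representation of lineages as sets of sample indices, and the optional `rng` argument are all assumptions made for the example. Times are in units of N generations.

```python
import random

def simulate_coalescent(n, rng=random):
    """Trace a sample of n genes back to its common ancestor.

    Returns a list of (T_j, merged_lineage) pairs for j = n, n-1, ..., 2,
    with times in units of N generations.
    """
    lineages = [frozenset([i]) for i in range(n)]  # one lineage per sampled gene
    history = []
    while len(lineages) > 1:
        j = len(lineages)
        # Waiting time while j ancestors remain: exponential, parameter j(j-1)/2.
        t_j = rng.expovariate(j * (j - 1) / 2)
        # Two of the j lines are chosen at random to coalesce.
        a, b = rng.sample(range(j), 2)
        merged = lineages[a] | lineages[b]
        lineages = [x for i, x in enumerate(lineages) if i not in (a, b)]
        lineages.append(merged)
        history.append((t_j, merged))
    return history

if __name__ == "__main__":
    # One sample path for n = 5, analogous to the one drawn in Figure 5.1.
    for t_j, merged in simulate_coalescent(5, random.Random(1)):
        print(f"{t_j:.3f}", sorted(merged))
```

Each call yields one random sample path. The last merged lineage always contains every sampled gene, since it represents the common ancestor.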

From the description of the genealogy, it is clear that the time $t_n$ back to the common ancestor has mean

$$E(t_n) = \sum_{j=2}^{n} E(T_j) = \sum_{j=2}^{n} \frac{2}{j(j-1)} = 2\left(1 - \frac{1}{n}\right),$$

or approximately $2N$ generations for large sample sizes. Further aspects of the structure of the ancestral process may be found in Tavaré (1984). Rather than focus further on such issues, we describe how the genealogy may be used to study the genetic composition of the sample.
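As a quick check on the mean time back to the common ancestor, the means 2/[j(j-1)] of the individual coalescence times can be summed exactly. The sketch below assumes only those means; the function name `expected_time_to_common_ancestor` is introduced for this example.

```python
from fractions import Fraction

def expected_time_to_common_ancestor(n):
    """Sum of E(T_j) = 2 / (j(j-1)) for j = 2, ..., n, in units of N generations."""
    return sum(Fraction(2, j * (j - 1)) for j in range(2, n + 1))

for n in (2, 5, 50):
    print(n, expected_time_to_common_ancestor(n))
# prints:
# 2 1
# 5 8/5
# 50 49/25
```

The sum telescopes to 2(1 - 1/n), so the values rise toward 2 (that is, 2N generations) as the sample size grows. A sample of only two genes already accounts for half of that limit.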

To this end, assume that in the population from which the sample was drawn there is a probability $u$ that any gene mutates in a given generation, mutation acting independently for different individuals. In looking back $r$ generations through the ancestry of a randomly chosen gene, the number of mutations along that line is a binomial random variable with parameters $r$ and $u$. If we measure time in units of $N$ generations, so that $r = \lfloor Nt \rfloor$ (that is, $r$ is $Nt$ rounded down to the next lower integer), and assume that $2Nu \to \theta$ as $N \to \infty$, then the Poisson approximation to the binomial distribution shows that the number of mutations in time $t$ has in the limit a Poisson distribution with mean $\theta t/2$. This argument can be extended to show that the mutations that arise on different branches of the coalescent tree follow independent Poisson processes, each of rate $\theta/2$. For example, the total number of mutations $\mu_n$ that occur in the history of our sample back to its common ancestor has a mixed Poisson distribution: given $T_n, T_{n-1}, \ldots, T_2$, $\mu_n$ has a Poisson distribution with mean $\frac{\theta}{2}\sum_{j=2}^{n} j T_j$. The mean and variance of the number of mutations are given by Watterson (1975):

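$$E(\mu_n) = \theta \sum_{j=1}^{n-1} \frac{1}{j} \qquad (5.1)$$

and

$$\mathrm{Var}(\mu_n) = \theta \sum_{j=1}^{n-1} \frac{1}{j} + \theta^2 \sum_{j=1}^{n-1} \frac{1}{j^2} \qquad (5.2)$$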
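The mixed Poisson description of the total number of mutations lends itself to a small Monte Carlo check. The sketch below rests on stated assumptions: the scaled mutation parameter is written `theta` (the limit of 2Nu), the total branch length is taken as the sum of j times T_j over j = 2, ..., n, and the closed-form moments coded in `watterson_mean` and `watterson_variance` are the harmonic-sum expressions that follow from that description. All function names are hypothetical. The Poisson sampler counts unit-rate exponential arrivals because the Python standard library has no built-in Poisson generator.

```python
import random

def poisson(mean, rng):
    """Draw a Poisson variate by counting rate-1 arrivals in [0, mean]."""
    count, t = 0, rng.expovariate(1.0)
    while t <= mean:
        count += 1
        t += rng.expovariate(1.0)
    return count

def total_mutations(n, theta, rng):
    """One draw of the number of mutations in the history of a sample of n genes."""
    # Total branch length: j lineages persist for time T_j ~ Exp(j(j-1)/2).
    length = sum(j * rng.expovariate(j * (j - 1) / 2) for j in range(n, 1, -1))
    # Given the tree, mutations are Poisson with mean theta * length / 2.
    return poisson(theta * length / 2, rng)

def watterson_mean(n, theta):
    return theta * sum(1 / j for j in range(1, n))

def watterson_variance(n, theta):
    return watterson_mean(n, theta) + theta**2 * sum(1 / j**2 for j in range(1, n))

if __name__ == "__main__":
    rng = random.Random(2024)
    n, theta, reps = 10, 2.0, 20000
    draws = [total_mutations(n, theta, rng) for _ in range(reps)]
    m = sum(draws) / reps
    v = sum((x - m) ** 2 for x in draws) / (reps - 1)
    print("simulated mean, variance:", round(m, 3), round(v, 3))
    print("formula   mean, variance:", watterson_mean(n, theta), watterson_variance(n, theta))
```

With enough replicates the simulated mean and variance should land close to the formula values. The variance exceeds the mean because the random tree length adds variability on top of the Poisson noise.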

We are now in a position to describe the effect that mutation has on the individuals in the sample.
