Previous Chapter: The Finitely-Many-Sites Models
Suggested Citation: "Approximations for the Ewens Sampling Formula." National Research Council. 1995. Calculating the Secrets of Life: Contributions of the Mathematical Sciences to Molecular Biology. Washington, DC: The National Academies Press. doi: 10.17226/2121.

Page 136

Mathematical Vignette:
Approximating Combinatorial Structures

Our mathematical vignette takes us from the world of population genetics to that of probabilistic combinatorics. We show how the Ewens sampling formula (ESF), whose origins in population genetics were described above, plays a central role in approximating the probabilistic structure of a class of combinatorial models. This brief account follows Arratia and Tavaré (1994), to which the interested reader is referred for further results. Our first task is to describe the combinatorial content of the ESF itself.

Approximations for the Ewens Sampling Formula

First, we recall Cauchy's formula for the number N(a) º N(a1,a2,. . .,an) of permutations of n objects that have a1 cycles of length 1, a2 cycles of length 2, . . ., an cycles of length n.

image                             (5.21)

1(A) denoting the indicator of the event A. If each of the n! permutations is assumed to be equally likely, then a random permutation has cycle index a with probability

image                              (5.22)

image

where Cj º Ci(n) is the number of cycles of size j in the permutation. Comparison with (5.4) shows that (C1,C2,. . .,Cn) has the ESF with

Suggested Citation: "Approximations for the Ewens Sampling Formula." National Research Council. 1995. Calculating the Secrets of Life: Contributions of the Mathematical Sciences to Molecular Biology. Washington, DC: The National Academies Press. doi: 10.17226/2121.

Page 137

parameter q = 1. To give the permutation representation of the ESF for arbitrary q, we need only suppose that for some q > 0,

P(p) = cq |n|, Sn,                           (5.23)

where |p| denotes the number of cycles in the permutation Sn, the set of permutations of n objects. The parameter c is a normalizing constant, which may be evaluated as follows. The number of permutations in Sn with k cycles is image, the absolute value of the Stirling number of the first kind. Hence

image

so that c-1 = q(n). It follows that under this model,

image                          (5.24)

In summary, we have shown that q-biasing a random permutation gives the ESF.

The next ingredient in our story is the observation that the law in (5.24) can be represented as the joint law of independent Poisson random variables Z1,Z2,. . .,Zn, having E[Zj] = q / j, conditional on image

image                               (5.25)

where image denotes the law. This follows because

Suggested Citation: "Approximations for the Ewens Sampling Formula." National Research Council. 1995. Calculating the Secrets of Life: Contributions of the Mathematical Sciences to Molecular Biology. Washington, DC: The National Academies Press. doi: 10.17226/2121.

Page 138

image

which agrees with (5.24) apart from a norming constant that does not vary with a1,a2,. . .,an; since both formulas are probabilities, the norming constants must be equal.

Equation (5.25) suggests that we might usefully approximate the dependent random variables C1,C2,. . .,Cn, by the independent random variables Z1, Z2,. . .,Zn,. This turns out to be too ambitious, but we can get away with just a little less. For any b Î [n] º {1,2,. . .,n}, we can approximate the joint laws of Cb º Cb(n) º (Cl,C2,. . .,Cn) by those of Zb º (Z1,Z2,. . .,Zb), with an error that tends to 0 as n ® ¥as long as b = o(n), that is, b / n ® 0.

As our measure of how well such an approximation might be expected to work, we use total variation distance dTV as a metric on the space of (discrete) probability measures. Three equivalent definitions of the total variation distance db(n) between (the law of) Cb and (the law of) Zb are given below:

image                     (5.26)

In (5.26), the infimum is taken over all couplings of Cb and Zb on a common probability space, and No º {0,1,2,. . .}. Arratia et al. (1992) use a particular coupling to show that there is a universal constant c = cq with c(1) = 2 such that

Suggested Citation: "Approximations for the Ewens Sampling Formula." National Research Council. 1995. Calculating the Secrets of Life: Contributions of the Mathematical Sciences to Molecular Biology. Washington, DC: The National Academies Press. doi: 10.17226/2121.
Page 136
Suggested Citation: "Approximations for the Ewens Sampling Formula." National Research Council. 1995. Calculating the Secrets of Life: Contributions of the Mathematical Sciences to Molecular Biology. Washington, DC: The National Academies Press. doi: 10.17226/2121.
Page 137
Suggested Citation: "Approximations for the Ewens Sampling Formula." National Research Council. 1995. Calculating the Secrets of Life: Contributions of the Mathematical Sciences to Molecular Biology. Washington, DC: The National Academies Press. doi: 10.17226/2121.
Page 138
Next Chapter: Combinatorial Assemblies
Subscribe to Emails from the National Academies
Stay up to date on activities, publications, and events by subscribing to email updates.