Returning to the problem of colon cancer, we applied this analysis to the mouse genome, which has C ≈ 20 chromosomes and genetic length G ≈ 16. By genetic mapping, we found a striking region on mouse chromosome 4 for which Zmax = 4.3. The nominal significance level of the statistic is p = 1.7 × 10⁻⁵. After correcting for searching over an entire genome (by multiplying by 2G(Zmax)²), the significance level is p ≈ 0.01. This suggests that there is indeed a modifying gene in this region of chromosome 4.
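The genome-wide correction quoted above is simple arithmetic and can be checked directly. The following is a minimal sketch that uses only the values given in the text; the function name `genome_wide_p` is a hypothetical helper, and the correction is applied exactly as stated (the nominal p-value multiplied by 2G times the square of Zmax).

```python
def genome_wide_p(nominal_p, genetic_length, z_max):
    """Multiply the nominal p-value by 2 * G * Zmax**2, as stated in the text."""
    return nominal_p * 2 * genetic_length * z_max ** 2


# Values quoted in the text for the region on mouse chromosome 4.
corrected = genome_wide_p(nominal_p=1.7e-5, genetic_length=16, z_max=4.3)

print(f"correction factor: {2 * 16 * 4.3 ** 2:.1f}")  # correction factor: 591.7
print(f"corrected p-value: {corrected:.4f}")          # corrected p-value: 0.0101
```

The factor of roughly 590 is what turns a nominal level of 1.7 × 10⁻⁵ into a genome-wide level of about 0.01.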
On the strength of this analysis, several additional crosses were arranged to confirm this result. With more than 300 animals analyzed, the results are now unambiguous: the corrected significance level is now < 10⁻¹⁰, and it appears that a single copy of the suppressing form of the gene can decrease tumor number at least twofold. Experiments are now under way to clone the gene, in order to learn its role in reducing colon cancer in genetically predisposed mice. With luck, it may suggest ways to do the same in humans.
Genetic mapping is only the first step toward positional cloning of a gene. Once a gene has been determined to lie between two genetic markers, the geneticist must produce a physical map consisting of overlapping clones spanning the chromosomal region between the two flanking markers. Traditionally, physical maps have been produced by the process of chromosomal walking: one starts with clone C1 containing one of the genetic markers, uses C1 as a probe to find an overlapping clone C2, uses C2 as a probe to find C3, and so on until the region has been spanned (Figure 2.6). Chromosomal walking is an inherently serial procedure, and each step may take several weeks (due to the laboratory procedures involved in making and using a probe).

Figure 2.6
Schematic diagram illustrating chromosome walking. One starts by isolating a clone C1 containing the initial starting point. C1 is then used as a probe to isolate overlapping clones, such as C2. The process is iterated to obtain successive steps in the walk. Although at each step one isolates clones extending in either direction, only those clones extending the walk to the right are shown in the diagram.

This tedious process could be eliminated if one simply constructed a complete physical map of overlapping clones spanning the entire genome. The idea is more practical than it may seem at first glance. Whereas chromosomal walking proceeds serially, a physical map of an entire genome can be constructed in parallel. The idea is to describe each clone C by an easily determined fingerprint F(C), which can be thought of as a set of "attributes" of C. If two clones have substantial overlap, their fingerprints should be similar. Conversely, if two clones have very similar fingerprints, they are likely to overlap. In principle, one should be able to construct a physical map by fingerprinting a large collection of clones and using computer analysis to compare the fingerprints and recognize the overlaps.
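The comparison step can be illustrated with a small sketch. It treats each fingerprint F(C) as a set of attributes and flags pairs of clones whose sets are similar. The similarity measure (Jaccard similarity), the threshold of 0.3, the function names, and the toy data are all assumptions made for illustration; the text does not prescribe a particular measure or cutoff.

```python
from itertools import combinations


def similarity(fp_a, fp_b):
    """Jaccard similarity of two fingerprints, each a set of attributes."""
    if not fp_a and not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)


def likely_overlaps(fingerprints, threshold=0.3):
    """Return clone pairs whose fingerprint similarity meets the threshold."""
    pairs = []
    for a, b in combinations(sorted(fingerprints), 2):
        if similarity(fingerprints[a], fingerprints[b]) >= threshold:
            pairs.append((a, b))
    return pairs


# Toy data: the attribute labels are arbitrary placeholders.
fingerprints = {
    "C1": {"a", "b", "c", "d"},
    "C2": {"c", "d", "e", "f"},
    "C3": {"e", "f", "g", "h"},
    "C4": {"x", "y", "z"},
}

overlaps = likely_overlaps(fingerprints)
print(overlaps)  # [('C1', 'C2'), ('C2', 'C3')]
```

Because every pair of clones is compared independently, the work can be done for the whole collection at once, in contrast to the step-by-step nature of chromosomal walking.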
The choice of a fingerprinting method depends principally on laboratory considerations; certain types of clones are more amenable to certain types of analysis. Given a large collection of random subclones taken from a genome G, possible fingerprints include the following:
· Complete DNA sequence. For very small genomes such as those of viruses, it is practical to reassemble the genome from very short subclones of length ~300 to 500 base pairs. For such short subclones, the best fingerprint is the complete DNA sequence of the subclone. It turns out to be relatively easy to sequence such short subclones in one laboratory step, and the resulting sequence provides the most complete possible fingerprint of the clone. Using this information, one can attempt to find the overlaps and piece together the sequence. In fact, this is a widely used technique, referred to as "shotgun" sequencing (Figure 2.7). However, the method is effective only for genomes of length < 100,000 base pairs. For larger genomes (such as the genome of even the simplest bacterium), it is difficult to analyze enough subclones to ensure that the entire genome is covered (see the discussion of the coverage problem below). Moreover, the ability to reassemble the sequence is stymied by the frequent occurrence of repeat sequences, which hamper the recognition of overlaps. Nonetheless, shotgun sequencing of small subclones is the method of choice for sequencing moderate-sized DNA fragments.

Figure 2.7
Schematic diagram illustrating "shotgun" DNA sequencing assembly. To obtain the sequence of a larger piece of DNA, one determines the sequence of random subclones and pieces together the complete sequence based on the overlaps. In practice, the subclones are considerably larger than those shown (typically 300 to 500 base pairs) and the overlaps used in assembling the sequence are much larger.
· Restriction map. Larger genomes must be analyzed by studying larger subclones. Such subclones are typically too large to be conveniently sequenced. Instead, restriction maps can provide a useful fingerprint. Restriction maps show the positions of recognition sites at which particular restriction enzymes cut. For example, the restriction enzyme EcoRI cleaves at the sequence GAATTC. In effect, a restriction map is an ordered list of the restriction fragments in a clone. To make a restriction map, one can use the method of partial digestion: one radioactively labels one end of a clone, adds a restriction enzyme briefly so that only a random selection of the sites are cut, and measures the lengths of the resulting fragments (Figure 2.8). Restriction maps can be efficiently constructed for clones of moderate size (up to about 50,000 base pairs), although the procedure can be tedious and exacting. If two clones have restriction maps that share several consecutive fragments, it is a good bet that they overlap. With this strategy, Kohara and colleagues (1987) constructed a complete physical map of the bacterium Escherichia coli with a genome of 4.6 million base pairs using phage clones containing fragments of about 15,000 base pairs.

Figure 2.8
Schematic diagram illustrating restriction mapping of a DNA fragment by partial digestion. The DNA fragment at the top has several sites (denoted by E) that can be cleaved by the restriction enzyme EcoRI. A large collection of molecules of this DNA fragment is radioactively labeled at one end (denoted by a star) and then exposed briefly to the restriction enzyme. The period of exposure is sufficiently brief that the enzyme can cleave only about one site per molecule, resulting in a collection of radioactively labeled fragments terminating at the various E sites. The length of these fragments (and thus the positions of the E sites) can be determined by gel electrophoresis of the fragments and subsequent exposure of the gel to x-ray film.

· Restriction fragment sizes. Rather than constructing an ordered list of the restriction fragments, one can construct an unordered list. This turns out to be technically simpler, because one need not carefully control the rate of cutting as in partial digestion. Clones can instead be digested to completion and the fragment lengths measured. Although the unordered list contains less information, it can still provide an adequate fingerprint. For example, Olson and colleagues (1986) used this approach to construct a physical map of the yeast Saccharomyces cerevisiae with a genome of 13 million base pairs.
· Content of sequence tagged sites. For very large genomes such as the human genome with 3 × 10⁹ base pairs, it is necessary to work with large subclones of length > 100,000 base pairs. For such large subclones, a different fingerprinting strategy has gained favor in recent years. The method is based on sequence tagged sites (STSs), which are very short unique sequences taken from the genome which can be easily assayed by the polymerase chain reaction (PCR). The fingerprint of a clone is the list of STSs contained within it; the data form an incidence matrix of clones by STSs (Figure 2.9). Clones containing even a single unique STS in common should overlap. As an aside, the determination of which clones contain a given STS is typically made using a combinatorial pool scheme that avoids having to test each STS against each clone (Green and Olson, 1990). Using this approach, Foote et al. (1992) and Chumakov et al. (1992) constructed the first complete maps of human chromosomes (Y and 21, respectively).
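To make the STS idea concrete, the sketch below takes a toy incidence matrix (stored as a mapping from each clone to the set of STSs found in it) and groups the clones into sets of mutually connected clones, using the rule stated above that clones sharing an STS should overlap. The clone and STS names, the function name `contigs_from_sts`, and the use of connected components are assumptions made for illustration. The sketch also assumes error-free data and does not determine the order of clones within a group.

```python
from collections import defaultdict

# Hypothetical incidence data: clone -> set of STSs detected in it.
clone_sts = {
    "cloneA": {"sts1", "sts2"},
    "cloneB": {"sts2", "sts3"},
    "cloneC": {"sts3"},
    "cloneD": {"sts5", "sts6"},
    "cloneE": {"sts6"},
}


def contigs_from_sts(clone_sts):
    """Group clones so that clones sharing an STS land in the same group."""
    # Invert the incidence matrix: STS -> clones that contain it.
    sts_clones = defaultdict(set)
    for clone, stss in clone_sts.items():
        for sts in stss:
            sts_clones[sts].add(clone)

    # Link every pair of clones that share at least one STS.
    neighbors = {clone: set() for clone in clone_sts}
    for clones in sts_clones.values():
        for clone in clones:
            neighbors[clone] |= clones - {clone}

    # Connected components of this overlap graph are the groups.
    seen, contigs = set(), []
    for start in sorted(clone_sts):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            clone = stack.pop()
            if clone in component:
                continue
            component.add(clone)
            stack.extend(neighbors[clone] - component)
        seen |= component
        contigs.append(sorted(component))
    return contigs


contigs = contigs_from_sts(clone_sts)
print(contigs)  # [['cloneA', 'cloneB', 'cloneC'], ['cloneD', 'cloneE']]
```

In this toy example no STS links cloneC to cloneD, so the data yield two separate groups with a gap between them.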
Regardless of the experimental details of the fingerprinting scheme, there are two key mathematical issues pertinent to the construction of a physical map:
1. Algorithms for map assembly. Given the fingerprinting data, what algorithm should be used for constructing a physical map? This question is closely related to graph theory: given information about adjacency among clones inferred from their fingerprints, one must reconstruct the underlying geometry of the physical map.
2. Statistics of coverage. How many clones must be studied to yield a map covering virtually the entire genome? This question belongs to probability theory: assuming that subclones are distributed randomly across the genome, one needs to know the distribution of gaps (uncovered regions or undetected overlaps) in the map.
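A rough feel for the coverage question can be had from a standard approximation, offered here as a sketch rather than as the analysis the text goes on to develop. If N clones of length L are placed uniformly and independently on a genome of length G, the chance that a given point is missed by every clone is (1 - L/G)^N, which is approximately e^(-NL/G). The function names below are hypothetical, end effects are ignored, and so is the requirement that two clones share some minimum amount of sequence before their overlap can be detected. The example uses the genome and clone sizes quoted above for the Escherichia coli map.

```python
import math


def expected_uncovered_fraction(n_clones, clone_length, genome_length):
    """Approximate fraction of the genome missed by all clones: exp(-N*L/G)."""
    redundancy = n_clones * clone_length / genome_length
    return math.exp(-redundancy)


def clones_needed(clone_length, genome_length, target_covered):
    """Smallest N whose expected covered fraction reaches the target."""
    redundancy = -math.log(1.0 - target_covered)
    return math.ceil(redundancy * genome_length / clone_length)


# Sizes quoted in the text: 4.6 million base pairs, 15,000-base-pair clones.
n = clones_needed(clone_length=15_000, genome_length=4_600_000,
                  target_covered=0.99)
missed = expected_uncovered_fraction(n, 15_000, 4_600_000)
print(f"clones for 99% expected coverage: {n}")
print(f"expected uncovered fraction: {missed:.4f}")
```

Under these assumptions, each further reduction of the uncovered fraction by a constant factor costs the same number of additional clones, which is why covering "virtually the entire genome" requires the genome to be sampled several times over.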