· False negatives: one may fail to identify some proportion of the YACs containing an STS;
· False positives: some proportion of the YACs detected as containing an STS may not actually do so; and
· Chimeric YACs: some proportion of the YACs may not represent a single contiguous region, but two unrelated regions that have been joined together in a single clone.
Moreover, the occurrence of false negatives and positives may not be random but systematic (owing to deletions of clones or contamination of samples). In short, algorithms must be robust to errors in the data. Producing such algorithms is an interesting challenge that draws on methods from graph theory, operations research, and statistics. As of this writing, the best approach has not yet been determined.
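To make the three error types concrete, the following is a minimal simulation sketch, not a method taken from the text. The function name `simulate_screen`, its parameters, and the default error rates are all hypothetical, and errors are drawn independently at random, which ignores the systematic errors noted above. A chimeric YAC is modeled, purely as a simplifying assumption, as two unrelated half-length segments.

```python
import random


def simulate_screen(genome_len, yac_len, n_yacs, n_sts,
                    fn_rate=0.1, fp_rate=0.01, chimera_rate=0.2, seed=0):
    """Simulate an error-prone YAC-by-STS screening matrix.

    All parameter names and default rates are illustrative assumptions.
    Returns (sts, yacs, truth, observed), where truth and observed are
    lists of 0/1 rows, one row per YAC and one column per STS.
    """
    rng = random.Random(seed)
    # STS positions placed uniformly at random along the genome.
    sts = sorted(rng.uniform(0, genome_len) for _ in range(n_sts))

    def random_interval(length):
        start = rng.uniform(0, genome_len - length)
        return (start, start + length)

    # Each YAC is a list of segments: one for a normal clone, two
    # unrelated half-length segments for a chimeric clone.
    yacs = []
    for _ in range(n_yacs):
        if rng.random() < chimera_rate:
            half = yac_len / 2
            yacs.append([random_interval(half), random_interval(half)])
        else:
            yacs.append([random_interval(yac_len)])

    # True incidence: does any segment of the YAC contain the STS?
    truth = [[int(any(a <= p < b for a, b in segs)) for p in sts]
             for segs in yacs]

    # Observed incidence: true hits are lost with probability fn_rate,
    # and true misses are reported as hits with probability fp_rate.
    observed = []
    for row in truth:
        obs_row = []
        for hit in row:
            if hit:
                obs_row.append(0 if rng.random() < fn_rate else 1)
            else:
                obs_row.append(1 if rng.random() < fp_rate else 0)
        observed.append(obs_row)
    return sts, yacs, truth, observed
```

A map-assembly algorithm can then be run on `observed` and scored against `truth` to gauge how well it tolerates each kind of error.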
Genetic and physical mapping are key tools for describing the function and structure of chromosomes. Only in the simplest cases is such mapping completely devoid of mathematical issues. In the case of human genetics, mathematics plays a crucial role.
In essence, mapping problems, like many problems in computational biology, involve indirect inference of the structure of a biological entity, such as a chromosome, based on whatever data can be effectively gathered in the laboratory. It is not surprising that mapping problems draw on statistics, probability, and combinatorics. Although the field of mapping dates nearly to the beginning of the 20th century, the area remains rich with new challenges, because new laboratory methods constantly push back the frontiers of the maps and features that can be mapped in DNA.

Figure 2.10
Expected coverage properties for STS content mapping, as a function of the coverage a in YACs and b in STSs.
Calculations assume YACs of constant length L and a genome of length G. The graphs show (A) the expected
proportion of the genome covered by anchored "contigs"; (C) the expected number of anchored contigs; and
(D) the expected length of an anchored contig. Graphs show the situation for a = 1, 2, ..., 10 (only the cases
a = 1, 5, 10 are explicitly marked). Results are expressed in units of G/L. Table B lists the value of G/L for
certain representative genomes and cloning vectors, including two different sizes of YACs. Reprinted, by
permission, from Arratia et al. (1991). Copyright © 1991 by Genomics.
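
The quantities plotted in Figure 2.10 can also be estimated by simulation. The sketch below is a rough Monte Carlo illustration under stated assumptions, not the analytical calculation of Arratia et al. (1991). It assumes that a = NL/G for N YACs and b = ML/G for M STSs, that YACs and STSs fall uniformly at random, and that two YACs belong to the same anchored contig when they share at least one STS. It ignores edge effects at the ends of the genome. The function name `simulate_anchored_contigs` and the parameter `units` (the value of G/L) are hypothetical.

```python
import bisect
import random


def simulate_anchored_contigs(a, b, units=200, seed=0):
    """Estimate anchored-contig statistics for one simulated genome.

    Lengths are measured in units of the YAC length (L = 1), so the
    genome length G equals `units`. All names here are illustrative.
    """
    rng = random.Random(seed)
    L = 1.0
    G = units * L
    n_yacs = round(a * G / L)   # assumes a = N * L / G
    n_sts = round(b * G / L)    # assumes b = M * L / G
    sts = sorted(rng.uniform(0, G) for _ in range(n_sts))

    # Keep only anchored YACs, i.e. those containing at least one STS.
    # Each is stored as (first STS index, last STS index, start, end).
    anchored = []
    for _ in range(n_yacs):
        start = rng.uniform(0, G - L)
        end = start + L
        i = bisect.bisect_left(sts, start)
        j = bisect.bisect_right(sts, end) - 1
        if i <= j:
            anchored.append((i, j, start, end))

    # Merge YACs whose STS index ranges overlap (they share an STS).
    anchored.sort()
    contigs = []  # each entry: [last STS index, genomic start, genomic end]
    for i, j, start, end in anchored:
        if contigs and i <= contigs[-1][0]:
            current = contigs[-1]
            current[0] = max(current[0], j)
            current[1] = min(current[1], start)
            current[2] = max(current[2], end)
        else:
            contigs.append([j, start, end])

    # Genome covered: union of contig spans, which may overlap each
    # other where no STS happens to fall in the overlap.
    covered = 0.0
    reach = 0.0
    for _, start, end in sorted(contigs, key=lambda c: c[1]):
        if end > reach:
            covered += end - max(start, reach)
            reach = end

    n = len(contigs)
    mean_len = sum(e - s for _, s, e in contigs) / n if n else 0.0
    return {
        "covered_fraction": covered / G,
        "contigs_per_unit": n / units,   # contig count in units of G/L
        "mean_contig_length": mean_len,  # in units of L
    }
```

Averaging the returned values over many seeds for each pair (a, b) gives empirical curves that can be compared with the expected values shown in the figure.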