Looking back, there are a few occasions that you think of as pivotal events that changed your life. For me, agreeing to sequence the worm genome was one of those, but in many ways the critical time was earlier. I was at a conference on developmental biology in the United States, listening to Matt Scott, a researcher from Indiana University who worked on the fruit fly Drosophila. Matt’s group had been working for some time to map the antennapedia region of Drosophila—a region of the genome where mutations cause the fly to grow a leg on its head instead of an antenna. Understanding mutations such as this is fundamental to understanding the normal process of development that puts legs and antennae in all the right places. They’d done it by chromosome walking: working sequentially along part of the chromosome. And just listening to him describe his beautiful work, I could see with stark clarity that if you could cover the genome in parallel instead of serially, then for only a few times more effort you could map the whole thing.
Mapping is a step towards understanding the flow of information from the coded instructions in our genes to the molecular
interactions that go on within and between our cells. It is these interactions that underlie all the functions of the living body: burning food, fighting infections, healing wounds, even running the 100 meters or composing a symphony. At the LMB we were trying to put together a picture of how a very simple animal, the nematode worm, is programmed to grow from egg to adult and to carry out its essential tasks of moving, feeding and reproducing. Those who worked on Drosophila were trying to do the same with a much more complex animal, one that has eyes and legs like us—and wings, too.
Achieving a fuller understanding of life in all its richness and complexity is immensely interesting for its own sake, just as it is worth struggling to understand the origins of the universe. But molecular biology—even of worms and flies—also offers huge potential spin-offs for human health. Right up to the end of the twentieth century much of modern medicine was based on a rather hit-and-miss approach to finding what works. Antibiotics are a classic example: Alexander Fleming’s accidental observation that a mould could kill bacteria led Howard Florey and Ernst Chain to extract and purify the active agent in the mould, penicillin, and use it to cure patients with infections. But thousands of lives had been saved before anyone understood how the drug killed bacteria. For the past couple of decades the aim of the pharmaceutical industry has been to replace this serendipity with a more rational approach in which treatments are based on what we know about how living things are put together. The techniques of molecular biology potentially make it possible to understand the difference between sickness and health right down at the level of one molecule interacting with another. And many of these molecular interactions are common to animals as simple as a worm and as complex as a human.
A crucial first step is to find the genes that control these interactions. Each gene is typically thousands of bases long, and its sequence of As, Ts, Cs and Gs usually encodes a protein. The code is a set of three-letter words—TTT, CAG and so on—each of which
corresponds to one of the twenty amino acids that are the building blocks of proteins. It’s the proteins—there are hundreds of thousands of them, but some better-known examples are insulin, pepsin, hemoglobin and keratin—that actually carry out the business of building and maintaining a living body. Each human gene has its place on one of the twenty-four chromosomes (numbered 1–22, plus the X and Y sex chromosomes), which together constitute the whole human genome. Some genes are on one strand of DNA and some on the other, read in opposite directions. Finding them is not easy. We now know that only about 1.5 percent of the total DNA actually codes for proteins. The rest is often pejoratively termed ‘junk’ DNA, though it is more accurate to call it non-coding DNA: much of it may well be junk, but scattered in it are all manner of control sequences to which signalling proteins can bind, causing genes to be turned on or off as required and defining the stop and start positions. These controls are vital, because nearly all of our 100 million million cells contain exactly the same DNA, yet each has a specialized job to do. When a gene is turned on, the appropriate stretch of sequence is transcribed into a single strand of another sort of nucleic acid called RNA. The RNA code is very similar to that of DNA, with minor chemical differences. The RNA transcript then moves out of the nucleus, and its nucleic acid code is translated into the amino acid chain of a protein.
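For readers who like to see the logic spelled out, the decoding step can be sketched in a few lines of Python. The codon table below holds only a handful of illustrative entries from the real table of sixty-four, and the sequence is invented.

```python
# Illustrative sketch of transcription and translation. The codon table
# here is a small invented subset of the standard 64-entry table.
CODON_TABLE = {
    "UUU": "Phe", "CAG": "Gln", "AUG": "Met", "GGC": "Gly",
    "AAA": "Lys", "UAA": "STOP",
}

def transcribe(dna: str) -> str:
    """Copy a coding-strand DNA sequence into RNA (T becomes U)."""
    return dna.replace("T", "U")

def translate(rna: str) -> list[str]:
    """Read the RNA three letters at a time until a stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = CODON_TABLE.get(rna[i:i + 3], "???")
        if aa == "STOP":
            break
        protein.append(aa)
    return protein

rna = transcribe("ATGTTTCAGTAA")   # Met, Phe, Gln, then a stop codon
print(translate(rna))              # ['Met', 'Phe', 'Gln']
```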
The junk itself is a collection of fossils from our evolutionary history, which makes it interesting in the same way that a midden is interesting to archeologists. To make life even more complicated, the coding part of most genes is split into numerous segments called exons, separated by much longer non-coding sections known as introns. Some small genes on one strand can actually sit inside the introns of other, larger genes on the other. You can begin to see why finding genes is a far from simple matter.
Now, in order to find things it is useful to have a map. A map allows you to home in on the place you want and avoid going round
in circles. In the search for genes, researchers have used two kinds of map: genetic maps and physical maps. Genetic maps have been made for almost a century; the cartographers were the geneticists who bred mutant fruit flies and other species, tracking mutations down the generations to work out how closely they were linked (just as Sydney did with the worm mutants). None of this work involved handling the DNA itself; it was almost 1950 before people generally accepted that DNA was the stuff of inheritance, and until the early 1970s the tools were simply not available to begin to analyze it in detail. So a genetic map is an abstract entity that tells you the relative positions of genes on chromosomes. When geneticists say they have ‘found the gene’ for a particular trait or disease, they usually mean they have placed it on a genetic map.
Finding the actual piece of DNA that constitutes a gene requires a physical map. Physical maps are much newer than genetic maps. The essential tools to make them were restriction enzymes, used to cut the DNA into fragments a few tens or hundreds of thousands of bases long, and cloning techniques, enabling each fragment to be inserted into a bacterium and multiplied as the bacterium grew into a colony. A physical map is a collection of cloned DNA fragments that have been arranged in the right order along the chromosome by looking for overlaps between them.
Mapping becomes really powerful when you line physical maps up with genetic maps. Then, once some genes are located on particular fragments, you can make a good guess about where intervening genes are located, and so the whole system becomes more powerful as one proceeds. The combination of the two is a genomic map. With a genomic map, you don’t just know the location of a gene as an abstract point on a diagram; you know that a particular colony of bacteria in your freezer contains that gene within the clone it harbors, and so the sequence of the gene itself is within your reach.
By the 1980s genetic maps were good enough for human
geneticists to go fishing for genes among the families of people with inherited diseases such as Huntington’s disease or cystic fibrosis. The maps were staked out with markers, short sequences of DNA that come in two or three different varieties in the population. If it turned out that everyone in a family who had a particular variety of marker also had the disease, then there was a high probability that the gene responsible was on the same chromosome as that marker and close to it. Having narrowed down the location of the gene in this way, geneticists then switched to ‘walking’ to try to clone the gene with a view to reading its sequence and understanding how it works.
In order to walk part of the genome you lay out clones covering the whole genome (or sometimes a chromosome) on a membrane, and ‘probe’ them with a radioactively labelled sequence from one of the marker clones. You pick those clones that stick to the probe, and analyze them to see which extends furthest in the direction you want to go. At first, even knowing which way to go is unclear, so you have to go both ways. You then choose a piece of sequence from the far end of the new clone to use as a probe, put a radioactive label on it, and repeat. As you proceed, you find more markers, get the direction sorted out, and start to form a physical map. With walking, mapping is a serial process—you can’t map the next clone until you’ve done the one before it. It was slow, but it was this method that in the late 1980s and early 1990s tracked down a number of genes altered in disease, for example those for cystic fibrosis, muscular dystrophy and Huntington’s disease.
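The walking loop can be caricatured in a few lines of Python. This is only a toy: real walking detected overlaps by hybridization to a radioactive probe, not by comparing sequences (which were not yet known), and the clone library here is invented.

```python
# A toy rendering of chromosome walking, assuming a library of clones
# whose sequences we can compare directly. In reality overlap was
# detected by hybridization to a labelled probe, not string matching.
def walk(library: list[str], probe: str, steps: int) -> list[str]:
    path = []
    for _ in range(steps):
        # Pick the clone that contains the probe and extends
        # furthest beyond it in the direction of travel.
        hits = [c for c in library if probe in c]
        if not hits:
            break
        best = max(hits, key=lambda c: len(c) - (c.index(probe) + len(probe)))
        path.append(best)
        probe = best[-4:]          # re-probe from the far end of the new clone
    return path

library = ["AAGGTTCC", "TTCCAGAT", "AGATGGCA"]
print(walk(library, "AAGG", 3))    # visits all three clones in order
```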
Matt Scott was doing the same kind of thing to try to track down the faulty gene that makes fruit flies grow legs on their heads. What I thought, listening to Matt’s talk at that conference, was that you could do the whole process in parallel. Instead of picking the clones one by one, you could take all of them and characterize the whole lot in some way that would allow you to detect overlaps in a computer. When I first had this thought I was unsure how to begin; I’d been looking at cells down my microscope for the previous eight years
and didn’t know anything about molecular biology. So I went to talk to Sydney and to one of his colleagues, Jon Karn, for advice about exactly what I would need to do to characterize the clones. Jon, who was working on the molecular genetics of worm muscle, suggested a method which we came to call fingerprinting. (Our method was not the same as the DNA fingerprinting developed two years later by Alec Jeffreys, and now used in forensic testing and paternity testing, but in both cases we were after a way of uniquely identifying a sample of DNA.) I was vaguely aware that I could fingerprint a clone by treating it with an enzyme that cut it at specific sequences and then sorting the resulting pieces by size. A standard lab technique for sorting mixtures of biological molecules is gel electrophoresis. Various jelly-like substances can act as molecular sieves. If you place the gel in an electric field the molecules move through it, sorting themselves out from smallest to largest (smaller pieces move faster than larger), giving a unique pattern of bands on the gel.
Jon’s method was slightly more sophisticated and gave better resolution, because it generated smaller fragments that could be run through a different sort of gel. It involved two different enzymes and a radioactive label that marked the ends of the pieces of DNA after the first cut; the second cut broke the DNA into smaller fragments ready to run through the gel. Exposing the gel to photographic film would give me a pattern of dark bands like a bar code, showing the positions of the radioactively labelled fragments. Each clone would have a unique bar code, because the enzymes would cut it at specific sequences that they recognized. The bands were not in order, but that didn’t matter. The idea was simply to look for partial matches between the bar codes of different clones, which would indicate an overlap. Once we had overlapping clones for the whole genome, we would have a complete map.
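In outline, the comparison we wanted amounts to something like the following sketch, in which each clone’s bar code is reduced to a set of fragment sizes. The overlap threshold and the band sizes are purely illustrative, not the values used at the LMB.

```python
# A minimal sketch of fingerprint comparison: each clone is reduced to
# the set of fragment lengths in its "bar code", and two clones are
# called overlapping when they share a large enough fraction of bands.
# The threshold is an arbitrary illustration.
def fingerprint_overlap(bands_a: set[int], bands_b: set[int],
                        threshold: float = 0.4) -> bool:
    shared = len(bands_a & bands_b)
    return shared / min(len(bands_a), len(bands_b)) >= threshold

clone1 = {110, 230, 450, 620, 810}
clone2 = {230, 450, 620, 900, 1040}   # shares three bands with clone1
clone3 = {95, 340, 510, 770}          # shares none

print(fingerprint_overlap(clone1, clone2))  # True
print(fingerprint_overlap(clone1, clone3))  # False
```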
I saw that by digitizing the fingerprint information it would be possible to hunt for overlaps automatically in a computer. Instead of crawling along the genome clone by clone, you could map the whole
thing in one go. That was the point at which I discovered for myself the power of genomics. I had just finished the embryonic lineage, and was looking for something else to do. And at that point I was driven by an obsession that to map the worm genome was the right thing to choose. Just as the lineage had been a resource for developmental biologists, a genome map would be a godsend to worm biologists who were looking for genes. I wasn’t interested in looking for genes myself. Nor was I particularly keen to specialize further in the developmental biology of the worm by looking for lineage mutants. I was just captivated by the idea that here was another opportunity to map a big chunk of the biological landscape.
Simply gathering data without having any specific question in mind is an approach to science that many people are doubtful about. Modern science is supposed to be mostly ‘hypothesis driven’—you have a hunch about how the world works, and do experiments that ask if your hunch is right. If it is, you can make predictions about how the world might work in other, similar situations. My first studies of the worm lineage didn’t require me to ask a question (other than ‘What happens next?’). They were pure observation, gathering data for the sake of seeing the whole picture. Making a worm map would be the same. This is sometimes called ‘ignorance-driven’ or, more grandly, ‘Baconian science.’ The seventeenth-century philosopher Francis Bacon suggested a system for understanding the world that began with the accumulation of sets of facts, based on observation. Naturalists who collect and classify living species or astronomers who map the stars in the sky are examples of Baconian scientists. This kind of project suits me—it’s never bothered me that it doesn’t involve bold theories or sudden leaps of understanding, or indeed that it doesn’t usually attract the same level of recognition as they do.
First I had to find somewhere to work. I’d already relinquished my cell lineage space, but Sydney, who by this time had succeeded Max Perutz as director of the LMB, had a vacant room in the
ominous-sounding extension of the lab called Block 7. It was Room 6024, and that number is etched on my mind as the place where in 1983 worm genomics began.
Jon Karn told me that he’d heard via Bob Waterston that Maynard Olson was thinking along similar lines, in order to map the yeast genome. Maynard was working in the genetics department at Washington University in St. Louis. It was a year or so before I met him. An austere figure with heavy-framed spectacles, Maynard came from a very different scientific tradition from me, in which new methods were thoroughly worked through theoretically in advance before you ventured to start an experiment. I was always much more keen to get into the lab and try things out. But despite our differences it was straight away good to talk to a fellow believer, and Maynard also appreciated the benefits of a ‘mutual support group’ of genome enthusiasts. We kept in regular touch over the next few years; later, as the Human Genome Project gathered pace, Maynard’s capacity to analyze the possible outcomes of different courses of action made him a respected voice in the occasionally fractious debates about strategy that blew up from time to time.
Sydney Brenner and Jon Karn were supportive, of course, but most people were skeptical of this enterprise that had nothing directly to do with biological problems. Maynard told me that most people he talked to went round on a circular path of criticism. ‘First they say it can’t be done. Then when I’ve dealt with their objections, they say “Well, but there’s no point to it.” So I explain exactly why it’s valuable, and they say, “But it won’t work, because…”’ I was having the same experience. And indeed it wasn’t very successful at first. I remember in particular a Drosophila post-doc whom I knew coming in and saying, ‘What on earth are you doing this for?’ She was really quite angry. She looked at my messy filters; it obviously wasn’t working very well. She said, ‘You’ve got the embryonic lineage of the worm in your head. You ought to be sitting at the microscope picking worm mutants.’ I felt
like a little boy in short trousers, but I said, ‘I’m sure you’re right, but I want to do this—I think this is important.’ Another Drosophila researcher said dismissively, ‘It’ll all be over in five years—we’ll have solved developmental biology in the fly.’ It was true that in a spectacular development in fly genetics, Christiane Nüsslein-Volhard and her colleagues in Tübingen, Germany, had described a whole lot of Drosophila mutants that disrupted the normal development of the animal’s body plan. But I said, ‘I don’t think so; it’s more complicated than that. We’re going to need all the genes.’ And that was the reason for doing the map. I felt that we had to go for more than could be done by looking for mutants. With mutants alone, one would not be able to see everything; if a mutation was lethal, for example, or if more than one gene carried out the same function, the link from gene to function would be hard to analyze or undetectable. But even just speeding up the isolation of known genes was going to be valuable in itself.
The phrase that was in my mind was: ‘We’re going to clone all the uncs’—the uncoordinated mutants of the worm, of which Sydney and his colleagues had identified more than a hundred. The interesting thing about the uncs is that they can have all sorts of things wrong with them—it could be muscle, or nerve, or something else—but among them would be complex features of the worm of just the kind we might hope to find again in higher animals. But people were cloning very slowly—it took a lab years to clone a gene. John White remembers how the excitement of discovery that pervaded the lab began to fade as everyone set about trying to clone genes.
Lab meetings became nothing but progress in mapping. It was absolutely mind-numbing. The field became less interesting because all people cared about was cloning and they forgot what the biology was behind it.
John and I had occasionally talked about this problem during our Friday evening sessions in the pub. And before long I realized that if we could make the map, we could then give the geneticists all the uncs—they’d all be cloned, just like that. And that’s how it turned out. Not quite as easily as I’d hoped, but a few years later we’d done it. All the uncs were cloned for anybody who wanted them, though it was easier to pull out the ones where there was a higher density of genetic markers that could be used to align the maps.
The mapping project did not really take off until Alan came. Alan Coulson had been a research officer working with Fred Sanger at the LMB on developing first RNA and then DNA sequencing techniques. Having been Fred’s assistant since leaving college in 1967, he had worked indefatigably in the late 1970s on developing and implementing Fred’s dideoxy, or chain-termination, method of reading DNA sequences.
Fred’s method is essentially the same one that all sequencers use today, although improved chemistry, miniaturization and automation have speeded it up immeasurably. He used a piece of single-stranded DNA a few hundred bases long as a template. The template was used to produce many complementary copies, by means of one of the polymerase enzymes that copy DNA in real life: the enzyme pairs Cs with Gs and As with Ts, all the way along the strand. Each copy began from the same point, defined by a short ‘primer’ fragment of DNA that was bound to that spot and nowhere else. The enzymes gradually added bases to the primer fragment from a pool of all four normal bases spiked with an altered, ‘dideoxy’ version of one of them. For each template there were four different reaction tubes, each with just one of the four dideoxy bases. When the enzyme constructing the copy randomly added a dideoxy base instead of a normal one, it stopped the chain. The four mixtures were run side by side through a gel to separate them by size, giving a set of ‘ladders’ with unevenly spaced rungs showing the positions of all the dideoxy bases relative to one another. Taking the four
ladders together, there was just one rung at each successive level, and by noting which tube each rung had appeared from Fred and his colleagues could tell which base it represented and so could ‘read’ the sequence.
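The reading-off of the four ladders is mechanical enough to sketch. Each lane records the lengths at which chains terminated on one of the four bases; merging the lanes in order of length recovers the sequence. The fragment lengths below are invented for illustration.

```python
# A toy reconstruction of reading a dideoxy gel: each lane holds the
# chain lengths that terminated on one base, and sorting all rungs by
# length spells out the sequence.
def read_ladders(lanes: dict[str, list[int]]) -> str:
    rungs = sorted((length, base)
                   for base, lengths in lanes.items()
                   for length in lengths)
    return "".join(base for _, base in rungs)

lanes = {
    "A": [1, 5],      # chains of length 1 and 5 ended in dideoxy-A
    "C": [3],
    "G": [2, 6],
    "T": [4],
}
print(read_ladders(lanes))  # AGCTAG
```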
Using this technique they successfully carried out several of the first ever whole-genome sequencing projects, each more ambitious than the last. They would begin by ‘shotgunning’ the genomic DNA into random fragments, and carry out a set of sequencing reactions for each fragment, so generating from each a sequence read. They had no advance knowledge of where each read would fall, and simply continued randomly until they had ‘coverage’ of tenfold or so—that is to say, there were 10 times as many reads as would be needed to cover the genome if fitted end-to-end precisely. Then, by looking for matches between the reads, they could ‘assemble’ them into a composite, which was the sequence of the genome. Additional work would be needed to fill gaps and resolve poorly read sections.
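The assembly idea can be caricatured as a greedy merge of reads. This sketch assumes perfectly accurate reads and exact overlaps; real assembly has to cope with sequencing errors and repeated sequence, and the reads here are invented.

```python
# A minimal sketch of shotgun assembly by overlap: repeatedly merge the
# pair of reads with the longest exact suffix/prefix overlap. A toy that
# assumes error-free reads with no repeats.
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def assemble(reads: list[str]) -> str:
    reads = reads[:]
    while len(reads) > 1:
        k, a, b = max(((overlap(x, y), x, y)
                       for x in reads for y in reads if x is not y),
                      key=lambda t: t[0])
        reads.remove(a)
        reads.remove(b)
        reads.append(a + b[k:])    # join the two reads across their overlap
    return reads[0]

reads = ["GATTAC", "TTACAG", "ACAGGT"]   # random fragments of one sequence
print(assemble(reads))                    # GATTACAGGT
```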
To begin with, they assembled the reads with pencil and paper in their notebooks. But it soon became apparent that this was a job for computers, and so Rodger Staden, an assistant from the LMB’s structural studies division, was ‘borrowed’ to write some of the first programs designed to look for matches in the sequence that would reveal overlaps between one fragment and another. The first organism to be sequenced was bacteriophage phiX174, a tiny virus that infects bacteria, with a genome of about 5,000 bases. They then moved on to the DNA in the human mitochondrion (the energy-generating part of the cell) with about 17,000, and bacteriophage lambda with nearly 50,000—each time increasing the size by about a factor of three. All these were successfully completed using the time-consuming manual methods of the day. Alan realized that genomes much bigger than these would be too unwieldy to sequence straight off with existing technology. Fred’s previous research officer Bart Barrell, who had joined Fred from school in 1963 and was now working independently at the LMB, was using the method for viral
genomes with up to a quarter of a million base pairs, but that seemed to be the limit for the moment.
When Fred Sanger retired in 1983, Sydney Brenner suggested that Alan Coulson might like to come and work with me. Alan was interested to hear that I was planning to map the worm genome, which at 100 megabases seemed at that time impossibly large for sequencing. Mapping seemed like a more realistic approach to understanding larger genomes, ‘a logical extension of what I had been doing with Fred’, says Alan. Fred had been a hero of mine since I read about his work on protein sequencing as a schoolboy in the 1950s, and it was a strange quirk of history that his assistant was about to become my colleague. Although we had worked in the same building for fourteen years, I’m not sure we had ever spoken before. Alan remembers that each division had its own characteristic style:
I saw the crystallographic people on the ground floor as sports jackets and brogues. The middle floor, cell biology [where John was] was much more sandals and beards. And Fred’s group, protein and nucleic acid chemistry, we were on the top floor—and we were just normal!
So Alan—tall, bearded, soft-spoken and self-effacing—came and met me, and we looked each other up and down. We started to chat about what he might do, and then I suggested we continue the conversation in the pub. For Alan, it was an early indication that working with me was going to be different from working for Fred, who was rather more formal in his approach. I’m not a good manager, but I do like partnerships; I assumed that we’d just share things and do everything together. And so it turned out. We worked so well together that we tended to be regarded as a unit, John’n’Alan. Once a technician in Bob Waterston’s lab in St. Louis was vainly looking for our e-mail address, and it turned out that
he was trying to find the address of someone called John N. Allen.
We settled into a routine. I prepared the clones that we would need as a source for the map. In order to make clones, you need a vector—a piece of DNA that can be opened up to accept an insert of worm DNA, and which will in turn carry the worm DNA into bacteria where it will be replicated as the bacteria multiply. For our map we used vectors called cosmids which could accept 40 kilobases of DNA. Then Alan fingerprinted the clones and ran the gels. Our strategy depended on digitizing the images so that they could be analyzed by computer, and that was my main task. I began by using a manual device developed by Rodger Staden. It involved accurately hitting each band with a stylus, with the computer automatically recording the position. Rodger also wrote us a simple program that would search for matches. Once we had the data, Alan would assemble the clones into ‘contigs’—Rodger’s word for regions of the genome covered by overlapping, or contiguous, clones. The idea is to end up with a small number of large contigs—in a perfect world it would be one per chromosome—although of course at the beginning of a project you have a very large number of small contigs. It’s very satisfying when the number of contigs starts to fall as they link up with one another.
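The bookkeeping behind contigs is essentially the classic union-find structure: every clone starts as its own contig, and each confirmed overlap merges two contigs, so the count falls as the map knits together. A sketch, with invented clone names:

```python
# A sketch of contig bookkeeping, assuming overlaps have already been
# detected. Plain union-find; the clone names are invented.
parent: dict[str, str] = {}

def find(x: str) -> str:
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]   # path halving
        x = parent[x]
    return x

def join(a: str, b: str) -> None:
    parent[find(a)] = find(b)

for clone in ["c1", "c2", "c3", "c4", "c5"]:
    find(clone)                          # register each clone as a singleton contig

for a, b in [("c1", "c2"), ("c2", "c3"), ("c4", "c5")]:
    join(a, b)                           # each overlap merges two contigs

contigs = {find(c) for c in parent}
print(len(contigs))                      # 2 contigs: {c1,c2,c3} and {c4,c5}
```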
We scanned hundreds of clones this way, but the method was clearly going to be too limited for the whole job. Scanning with the stylus was tedious and depended on the accuracy of the operator, and the output was just a list of matches with no means of manipulating them further. Alan recorded the contigs as pencil lines in his notebooks, rubbing them out and redrawing them as we found more matches. We needed an electronic alternative, and as by this time Rodger Staden was busy with his thesis, I decided to make it myself. I learned some Fortran (at that time the standard scientific programming language), talked to Rodger, and, just as Sydney had, found myself completely hooked on programming. What I built was a graphics-based program called contig9 that represented the clones
by lines on the computer screen. All Alan had to do was position two contigs, press a key and they would join. Later I made the matching part more automatic. The program gradually grew into a large piece of code, developing from what Alan and I found we needed as we went along.
We also needed a way of automatically reading in the data from the films. This was more difficult. There were no commercial scanners available then, so the LMB workshop built one. I took on the job of putting together the associated software. It was important to have an automatic program that could read the bands in the sample lanes and decide how they lined up with the marker lanes—which had many bands, rather like a ruler—that we ran alongside as a point of reference. It wasn’t easy, because each gel distorts slightly as it runs, so the band positions are never exact.
At the time I was wrestling with this problem, John White was supervising a very bright graduate student, Richard Durbin. Richard is now the deputy director of the Sanger Centre and has played a key role in developing the software without which large-scale genome projects would have been impossible. He is immensely thoughtful and weighs his words carefully before committing himself. (One day—this was years later, when we were at the Sanger Centre—I was chairing a management meeting, and, having had my say on one of the agenda items, popped out to the copier to make some duplicates for the next one. When I came back the room was silent, so I assumed they’d finished, dished out the copies, and moved on. After a moment, the others gently interrupted me to say that Richard was in the middle of a sentence. He gets away with it, because everyone knows it’s worth waiting to hear what he has to say.)
When I started work on the worm map Richard was doing a Ph.D. on the nervous system of the worm, but he had a mathematics degree and had worked for a year before that in the computing industry. While doing his Ph.D. he had written the software for the
confocal microscope which John White had designed, a beautiful piece of work. I took him my problem with the lining up of the bands, and he solved it in no time by contributing a dynamic programming algorithm that considers all possible matches and picks the most likely one. I built in nice, easy-to-use editing arrangements so that I could look at the scan, check immediately whether the algorithm had picked the bands correctly, and, if not, make a small edit. I learned how powerful it is to have an interface where the computer does all the clerical work for you and presents an easy-to-edit image. There was nothing theoretical or fancy in the software, but it was just what we needed to get the job done.
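The flavour of the algorithm can be sketched as follows: align a lane’s band positions against the marker lane’s, allowing any band to be skipped at a fixed penalty, and take the lowest-cost matching. The scoring and the skip penalty here are illustrative, not Richard’s actual values.

```python
# A hedged sketch of dynamic-programming band alignment: bands may be
# matched (cost = position difference) or skipped (fixed penalty), and
# the table finds the cheapest global alignment. Values are illustrative.
def align_bands(sample: list[float], marker: list[float],
                skip: float = 5.0) -> float:
    n, m = len(sample), len(marker)
    INF = float("inf")
    # cost[i][j] = best cost of aligning sample[:i] with marker[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == INF:
                continue
            if i < n and j < m:   # match sample[i] to marker[j]
                cost[i+1][j+1] = min(cost[i+1][j+1],
                                     cost[i][j] + abs(sample[i] - marker[j]))
            if i < n:             # leave a sample band unmatched
                cost[i+1][j] = min(cost[i+1][j], cost[i][j] + skip)
            if j < m:             # leave a marker band unmatched
                cost[i][j+1] = min(cost[i][j+1], cost[i][j] + skip)
    return cost[n][m]

# A slightly stretched gel: every band sits a little off its marker position.
print(round(align_bands([10.5, 20.8, 31.2], [10.0, 21.0, 31.0]), 3))  # 0.9
```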
It was very important to us from the start that the whole community of worm researchers should be involved in what we were doing. Without this small but enthusiastic group (at that time there were probably not more than a couple of hundred worm people in the world—maybe a tenth of the number working on the fly), the information in the map would be meaningless. They would provide the genetic markers from known locations, so that the physical map could be aligned with the genetic map. Once enough of these markers appeared on the map, people would know where to look for particular genes that they were interested in—and we could supply them with the clones that they needed to make the search.
One of the leading members of the worm community was Bob Waterston. After training in medicine at the University of Chicago he had come to the LMB as one of Sydney’s post-docs in 1972. He had worked on uncs that had mutations affecting their muscles, and had gone back to set up his own lab at Washington University in St. Louis, using the worm to study the genetics of muscle. ‘It was clear that the way we were going to learn about muscle was by cloning genes,’ he says. ‘But it was really slow going—we needed a new means to do it.’ In the laboratory next door to Bob, Maynard Olson was making progress with his yeast map. Bob and
Maynard were good friends, so Bob was aware early on of the idea of taking a systematic approach to understanding an organism’s biology, and he was excited when he heard what we were up to. When he had the opportunity to take a sabbatical, he wrote to ask Sydney if he could come back for a year. He arrived to find that there was nowhere to sit in Sydney’s main worm group, where he had planned to work on embryonic muscle mutants. And so he, too, came to Room 6024.
Bob is lean and muscular, with a moustache and a fringe of wavy red-gold hair around a bald crown. His most outstanding characteristics are his unfailing amiability and his total honesty; he’s also clever and wise. This was the second time he had come back on sabbatical, but during his previous LMB visit I was so absorbed in the lineage that I didn’t socialize much, hardly ever going to communal coffee and tea sessions, so I hadn’t seen much of him. And then in 1985 he came and just sat there with Alan and me, and we talked about the genome.
We had a problem at the time Bob came. For reasons we didn’t understand, some parts of the worm sequence just refused to clone as cosmids, so that there were gaps in our map—about 700 of them. Worm biologists were already finding the longer contigs useful, but with so many gaps the smaller pieces were unattached. We needed to join the map up fully. Bob experimented with various methods of linking the clones while he was with us, but none of them worked. Towards the end of his year in Cambridge he went back to St. Louis to see how his lab was doing. While he was there, he dropped into Maynard Olson’s lab and discovered that one of Maynard’s post-docs, David Burke, had found a way of cloning much longer pieces of DNA in yeast cells. He called these clones yeast artificial chromosomes, or YACs. Bob thought that YAC clones might be able to bridge the gaps in our cosmid map. Not only were they longer than cosmids, but, more importantly, they grew in a different host: yeast keeps its genetic material coated with protein and contained in a
distinct nucleus, like the worm and all higher organisms but unlike bacteria. We thought that the problems with cosmids might be that certain worm sequences were rejected by bacteria, and that these sequences might survive in yeast. Almost at once Bob offered to make a YAC library of the worm genome and to become a partner in our mapping project. ‘By that time I had become taken with the idea of the power of the map,’ he says, ‘so I was happy to participate.’ So began another rewarding and fruitful research partnership that endures to this day. Not that we had any idea at the time where it would lead: Bob was just offering to join in on the mapping problem. He made the YACs and sent them to us, and we sent cosmids to him, and we each did experiments looking for matches between the clones to find out which YACs could bridge the gaps in our cosmid map.
Just then Yuji Kohara came from Nagoya University for a sabbatical year to learn about the worm. He had just completed a wonderful map of E. coli, using a different method of fingerprinting that he had devised. He had really come to start working on gene expression, but for a few months he willingly joined in, and in that time we aligned all Bob’s YACs with the cosmid contigs. We continued steadily making joins, and soon most of the map was in large contigs. Our suspicion that bacteria didn’t like parts of worm DNA turned out to be correct.
At this point the genome map became truly useful—and the community of worm biologists came into their own. They used the map to find the genes not just as abstract locations but as physical pieces of DNA. With these in hand they could carry out recombinant DNA experiments to find out how the genes worked, study the expression of the genes in different tissues, make antibodies to the gene products—all the techniques of modern molecular biology. The genes also helped us by providing new landmarks on the map: it was a virtuous circle. And we in the genome labs had determined quite formally among ourselves that we would
not use the mapping information for our competitive advantage, in terms of searching for genes. We realized that if we waited until the map was complete before we published it, we would be sitting on a lot of information that was of value to the community. So almost from the start we began to make the mapping data available electronically over the predecessor of the internet. Every so often I would put all the latest map data onto a computer tape and post it to Bob. He would load it on to the computer at Washington University, and then set it up so that people could access it from their local computers. We made the data available to researchers in Europe in the same way. The reason for storing the information in more than one place was that communications were so slow that it made a real difference how far away the database was. This was long before the World Wide Web; we were using Bitnet, which could take only small packages, and sometimes, in extreme cases, they took a month to arrive. So I developed a system of incremental updating, to avoid having to send the whole thing on tape every time.
The map was constantly on display. We had regular updates in the Worm Breeders’ Gazette, the informal newsletter of the worm community; we showed it at conferences; and anyone could request clones at any time, free, immediately, whatever they wanted, so that they could look for genes. There was too much work for Alan and me, and we recruited Ratna Shownkeen—slender and gracious, she is now a project leader in the Sanger Centre. The traffic in worm clones continues unabated to this day. There is no doubt that the science progressed faster because of this two-way exchange of information than it would have done if we had tried to keep the map to ourselves. And the way we handled the worm map set a precedent for our handling of the worm sequence data when it began to flow, and, ultimately, of the human sequence data.
For the whole family, this was a time of moving on. Our children were growing up and about to leave home. Ingrid went to read
biochemistry at Leeds, while Adrian in his turn went to study mathematics at Warwick. And for me, the map meant a different way of working. Previously I had always avoided responsibility for anything other than my own work. But the map was not something I could do on my own; it involved commitment to others, such as Alan Coulson and Bob Waterston, to keep the show on the road. At the same time some changes occurred in 1986, the year we published our first paper on the worm map, that propelled me to greater independence. First, Sydney successfully proposed me for election to the Royal Society, the U.K.’s national scientific academy. Although I always hoped that the cell lineage work would be useful, I didn’t expect that it would be valued to such an extent by the wider scientific community. Second, Sydney himself retired as director of the LMB and started a new MRC Molecular Genetics Unit where he would work on human DNA (and later with the DNA of the puffer fish Fugu, which has a remarkably compact genome with very little ‘junk’). This meant a moving apart, because my place was in the LMB and with the worm—and a recognition that from now on I would have to carry my own can.
We embarked on the worm map because it would be a useful tool in itself—there was no need to justify it as a stepping stone to the next stage, which would be to read the complete worm sequence. But it certainly wouldn’t be true to say that we never thought about sequencing. Bob Horvitz remembers a particularly drunken evening at a Cold Spring Harbor worm meeting in the mid-1980s, when we had embarked on mapping the worm.
There were four of us who decided to sit in Blackford Hall [the building that houses the Cold Spring Harbor dining and reading rooms] and think about whether it was realistic to determine the sequence of C. elegans. And it was John Sulston, and me, and Gary Ruvkun [a colleague of Bob’s], and Winship Herr, who was at Cold
Spring Harbor [he is now the deputy director] and supplied the beer. The four of us sat there basically talking through whether, given the technology of the day, it was possible for a small number of people, i.e. John, to determine the sequence of the animal. And we made assumptions, we did calculations, and at the end of the discussion we decided yes, it was feasible. That was unthinkable in the general community. The numbers were just too big, I mean 10⁸ base pairs! It might have taken a few years, but we agreed it was not impossible and that John should do it.
Bob claims that the following morning I showed no recollection of this beer-fuelled discussion, but I suppose something must have lodged in the back of my mind. Maynard Olson also remembers similar discussion over dinner when he was on a visit to Cambridge during Bob Waterston’s sabbatical in 1985–6.
We talked about genomes and where all this was headed. What stuck with me was that this was the first time John expressed a strong desire to move on to sequencing, not just in the worm but generally. He had a better feeling than I did that it might be feasible to sequence genomes on some kind of timescale that was relevant to our discussion.
From the beginning of the 1980s the idea of moving from mapping to sequencing genomes was being aired in relation not only to small organisms but to the human. Sequencing would provide the ultimate in biological information. Already the sequencing of individual genes was revealing an intriguing picture: there was a very high level of similarity between worm, fly and human genes that did the same jobs. Throughout evolution, mechanisms that work seem to have been conserved almost unchanged. Genome sequencing would provide the means to extend these comparisons, using the simpler organisms as a window into what goes on in human cells.
But if the worm seemed beyond reach at 100 megabases, how could anyone do the human at 3,000 megabases? One of the first to dare to think on this scale was Robert Sinsheimer, who was chancellor of the University of California at Santa Cruz. Sinsheimer was a molecular biologist who had isolated, purified and mapped the DNA of the phage phiX174, the very first organism to have its whole genome sequenced (by Fred Sanger and his colleagues). By this time more of an administrator than a bench scientist, he had been involved in the efforts of the University of California’s astronomers to raise the funds for a huge new telescope. It was ultimately the California Institute of Technology that secured the crucial donation, from the Keck Foundation, to build the Keck Telescope on Hawaii at a cost of more than $70 million. As a result, the University of California lost a $36 million donation from the Hoffmann Foundation, which had earlier hoped to name the telescope. In 1984 Sinsheimer began to wonder why such large sums should not be raised for biology. ‘I wondered if there were scientific opportunities in biology that were being overlooked, simply because we were not thinking on an adequate scale,’ he said.
He conceived the idea that a sufficiently ambitious biological project might be able to win back the Hoffmann donation for the University of California. He consulted colleagues in the biology faculty about the feasibility of an Institute to Sequence the Human Genome and so put Santa Cruz ‘on the map.’ Among those he consulted was Bob Edgar, a worm biologist who had worked in Sydney’s lab and knew at first hand what Fred Sanger had achieved. A letter went off to Cambridge seeking Fred’s opinion. He replied, ‘[It] will probably need to be done eventually, so why not start now? …I think the time is ripe.’
As a first step, Sinsheimer convened a workshop at Santa Cruz in May 1985 to which he invited a mixed group of around a dozen scientists who had some expertise in DNA mapping, automated sequencing or data management. Sydney was on the list but couldn’t
go, so he sent me and Bart Barrell, head of large-scale sequencing at the LMB. Others included Walter Gilbert of Harvard University, and Leroy Hood from the California Institute of Technology, whose laboratory was making steady progress towards automated DNA sequencing machines that used fluorescent dyes instead of radioactivity to label the fragments. I felt amazed that we were all sitting there discussing making an attack on the human genome. But at the same time I felt confident that we knew what we were doing in mapping the worm, and that if necessary we could scale up the approach to map the human. In retrospect I was probably wrong to think that we could have done it in cosmids—there are extra problems in cloning human DNA that we didn’t have to worry about in the worm—but in principle I would have been happy to sign up to mapping the human at that point, because I knew we would have found a way.
Sinsheimer recalled that ‘as we analyzed the problems to be solved and the likelihood of progress towards their resolution, the mood of the participants swung from extreme skepticism to confidence in the feasibility of such a program.’ Whether it should be done was another question: there was a lot of suspicion of the ‘big science’ approach, and some doubted the value of sequencing the 98 percent or more of the genome that does not code for protein. But overall the conclusions of the workshop, which Sinsheimer wrote up in a short report, were positive. He thought it made sense systematically to develop genetic and physical maps of the genome, and singled out the worm map as evidence that ‘the technique clearly would permit development of a physical map for the human genome within 3–5 years by a reasonably sized group (20 people).’ At an estimated rate of 100,000 bases per person per year (and this was itself a very optimistic figure; Bart put the productivity of his own group at about half that, and he was one of the most experienced sequencers in the world), a complete human sequence was not deemed feasible. Instead, the report suggested that the emphasis should be on
sequencing ‘regions of expected interest’, such as genes and genetic markers, until expected improvements in technology for high-throughput sequencing became available.
Robert Sinsheimer, thwarted by the internal politics of the University of California system, never got his genome institute. But he circulated the workshop report to United States funding bodies such as the Howard Hughes Medical Institute, the Department of Energy and the National Institutes of Health, where it added to a groundswell that was beginning to emerge in favor of making a concerted approach to the human genome. One of the leading advocates was Walter Gilbert, who launched what was to become a tradition of hyperbole in the field by calling the total human sequence ‘the grail of human genetics…an incomparable tool for the investigation of every aspect of human function.’ Over the next few years a series of meetings ensued. Charles de Lisi, head of the Department of Energy’s Office of Health and Environmental Research, convened a workshop at Santa Fe in February 1986, and went on to draw up plans for a genome sequencing initiative that would give new purpose to his department’s national laboratories (their interest in genes came out of studies of the effects of radiation). In June the same year a Cold Spring Harbor symposium on ‘The Molecular Biology of Homo sapiens’ found the idea being discussed for the first time in front of a large audience, more than 300 of the world’s top human geneticists and molecular biologists. At an informal discussion convened there at the last minute, Walter Gilbert’s estimate that the project could cost $3 billion ($1 per base) caused uproar: many of his listeners assumed that funding for biological research would essentially be diverted to this one goal, leaving nothing for the traditional, bottom-up approach to science funding that favored individual innovation.
But although there were reservations within the scientific community, the United States Congress had by now seized on the idea with enthusiasm, so that the momentum for a structured approach
to the genome was unstoppable. The National Academy of Sciences (the United States equivalent of the Royal Society), through its National Research Council, set up a panel to examine the whole question of an international genome effort. It was chaired by Bruce Alberts, a molecular biologist from the University of California at San Francisco (now president of the National Academy of Sciences), and its members included Jim Watson, Sydney Brenner, Walter Gilbert and many other luminaries in the field. Meanwhile the National Institutes of Health (NIH) was unhappy about the Department of Energy taking the lead and became increasingly involved in discussions about the funding of a large-scale genome initiative.
As part of this process I was invited to a workshop convened by the United States government’s Office of Technology Assessment in the summer of 1987 to talk about the potential costs of the project. I found myself suddenly drawn into a sharp exchange between Jim Watson and Ruth Kirschstein, who at that time was head of one of the NIH’s component institutes, the National Institute of General Medical Sciences. This institute had been given responsibility for dispensing grants for genome research, but Jim felt that there needed to be a much more proactive approach if the project was to have any coherence. He specifically argued that one person, and a scientist rather than an administrator, should be put in charge of the program, and unexpectedly turned to me for support. ‘Doesn’t one person really have to finish up that last 10 percent and live or die for the thing?’ he asked. I demurred, worried about giving so much power to one person, but Jim countered, ‘Someone has got to do it.’ I accused him of wanting to do it himself, which, as the true politician he is, Jim neither confirmed nor denied.
At a meeting on strategy for the genome the following February in Reston, Virginia, Jim again urged that the genome project should be headed by an active scientist. In his absence, several of his fellow participants told the director of the NIH, James Wyngaarden, that
Jim himself was the only credible choice. In May Wyngaarden called Jim to propose that he head an Office of Genome Research; Jim accepted, and the appointment was confirmed that October.
Many people have since wondered why Jim should have wanted to exchange the relative tranquillity of his life as director of the Cold Spring Harbor Laboratory for the bearpit of Washington politics; but, as he himself explained to his Cold Spring Harbor colleagues, ‘I would only once have the opportunity to let my scientific life encompass a path from the double helix to the three billion steps of the human genome.’ After receiving a favorable report from the Alberts committee, the United States Congress decided to fund genome programs at both the Department of Energy and the National Institutes of Health, but with Jim in charge at the latter there was no question which would be the senior partner. For his first year he had only a planning and advisory role, but in 1989 the Office of Genome Research became the National Center for Human Genome Research, with its own annual budget of almost $60 million. The Human Genome Project officially began in 1990, with a target of a complete human sequence by 2005. Its initial goals were to develop methods and technology through smaller-scale projects, such as the sequencing of simple organisms, before beginning a full-scale assault on the human genome itself. At Jim’s urging it also included a program of research into the ethical, legal and social issues raised by genome sequencing.
Although agencies in the United States were furthest ahead in committing serious money to genome research, genome projects were also starting up on a variety of scales in many European countries, as well as in the Soviet Union and Japan. Indeed, a Japanese program had been set up in the early 1980s, recruiting support from a number of private technology companies, to build an automated sequencing facility, and it had been partly a perceived need to keep up with the Japanese that had motivated support for the genome project in the United States. Scientists from the U.K., which had a
research record in molecular biology out of all proportion to the country’s size or financial resources, were drawn into the earliest discussions about a coordinated human genome project. Bart Barrell and I represented the LMB at the 1985 meeting at Santa Cruz. Walter Bodmer, director of the Imperial Cancer Research Fund laboratories in London and internationally recognized for his work on the genetics of the immune system in relation to cancer, gave the keynote speech at the 1986 Cold Spring Harbor meeting and soon afterwards chaired another critical discussion at the Howard Hughes Medical Institute. Sydney Brenner sat on the National Academy of Sciences panel that built the framework of the Human Genome Project. As well as pushing the project in the United States, Bodmer and Brenner were both active in generating support for genome research at home. Sydney persuaded the MRC to launch a U.K. Human Genome Mapping Project, and was instrumental in the bid made to the Prime Minister Margaret Thatcher, prepared by Professor Keith Peters of her Advisory Committee on Science, for extra government funds.
I wrote a little note that went in [to the committee], and that became the project. I said we should catalyze and expand the work already being done, especially developing work on computers. We finally got the money; it was not very much, but I thought it was a great accomplishment to get extra money out of Mrs T.
While waiting for the government money to come through, Sydney got started in 1986, using his own funds, working first at the LMB and later at his Molecular Genetics Unit. In February 1989 the U.K.’s Department for Education and Science announced an £11 million grant to the MRC over three years to support the project, with the promise of more to follow. The U.K. project’s priorities were very much focused on mapping protein coding regions, especially those that were relevant to disease; its ambitions, like its
funding, did not extend to mapping the whole genome and certainly not at this stage to sequencing it.
Walter Bodmer and Sydney Brenner were also instrumental in setting up an international organization with the aim of coordinating genome research. The Human Genome Organization, known as HUGO (Sydney’s idea) emerged from a discussion at the 1988 Cold Spring Harbor meeting on genome mapping and sequencing, and was formally founded at a meeting in Switzerland later that year. Bodmer was elected first vice-president and later president. HUGO adopted from the outset a more elitist than inclusive philosophy, with membership by election only. I was ‘elected’ quite early on—I think Sydney put me in—and I was quite pleased to be associated with the organization; but as I became more involved in it I felt more and more that these people were interested primarily in medical genetics rather than the wider biological importance of genomes. They did not see sequencing the whole genome as the central thing, whereas as far as I was concerned it was going to change everything.
What HUGO did do was to organize regular single-chromosome mapping workshops, at which everyone looking for genes on the same chromosome got together and argued with one another about the positions of markers. It also performed the valuable service of coordinating the way new genes were named, and it collected all this information in a Genome Database, originally set up at Johns Hopkins University in Baltimore. Managing the information from genome projects was going to be the key to the success of the enterprise, and the database was in itself a valuable resource. But the genetic mapping data it held could not easily be integrated with the data on gene sequences that was already being collected by another public database, GenBank, funded by the NIH, and its sister databases the European Molecular Biology Laboratory Data Library in Heidelberg and the DNA Data Bank of Japan. The other problem that HUGO never fully resolved was that as an international organization of individual scientists it found it very hard to attract
funding. It got started on a grant from the Howard Hughes Medical Institute, and the Wellcome Trust also came in with substantial funds in 1990; but the sums were small in relation to HUGO’s ambitions, and other than its coordination of the chromosome workshops and larger biennial meetings, it was never able to establish a position of leadership in the direction of genome research. Inevitably the Human Genome Organization and the Human Genome Project have become intertwined in the public mind; but in practice HUGO played little role in the effort to produce a complete human genome sequence. The push for the genome ultimately came from molecular biology rather than genetics.
The framework of the Human Genome Project emerged under Jim Watson’s leadership essentially along the lines recommended by the panel chaired by Bruce Alberts. It would begin with the necessary groundwork and only later move on to large-scale sequencing. The groundwork consisted of making genetic linkage maps, then physical maps that were aligned to the genetic maps; developing new sequencing technologies to increase speed and reduce costs; and testing methods on smaller organisms before moving to the human. Although I had been invited to attend two of the critical meetings that shaped the development of the Human Genome Project, I did not initially think it would have any direct impact on what I was doing. But with hindsight it was obvious that Bob Waterston, Alan Coulson, and I were already doing most of the things that the project was setting out to fund in its early days. We were making a physical map, working with the worm community to align it with genetic maps, and developing technology to allow us to generate data faster and in a form that was easy to use. And in 1989, the year the United States government put its financial weight behind a coherent genome sequencing program, we reached the point in our own project when sequencing suddenly seemed not just possible but the only thing to do. Our map was essentially complete—with the
help of the YACs we had got most of the clones into big contigs and were steadily closing the remaining gaps. We presented it at the biennial Cold Spring Harbor C. elegans meeting in May. Alan printed it all out on A4 sheets and taped them together, side by side, to form a series of banners, one for each of the six chromosomes. These he stuck one above the other, right across the back of the Bush lecture theatre; and there they stayed throughout the meeting. ‘It was impressive,’ says Bob Waterston. ‘We had a lot of continuity, and we knew how all the pieces lay on the chromosomes. All the other participants at the conference were talking about how they were using data from the map. And that was enormously rewarding.’
Earlier in the year I’d become aware that there were conspiratorial rumblings going on. I was getting messages that Jim Watson was interested in sequencing model organisms before embarking on the human project, and that we had better get our act together if we wanted to participate. Jim had learned of our worm mapping project from Sydney at a Howard Hughes Medical Institute meeting in June 1986, where momentum in support of a human genome sequencing project had begun to build. With the genome project now officially under starter’s orders, he was wondering how to promote sequencing. He saw very clearly that the way to convince people of the value of the project, as well as to drive the technology, was to start small. That’s the way biology is always done: you don’t study humans right off, you begin with something small. By now a series of viruses of increasing size had been sequenced, mainly by Fred Sanger and Bart Barrell, and the time had come to take a hundredfold leap to animals. Jim also knew that it was important to recruit successful labs to the project, both to make rapid progress and to gain the respect of the community. A successful model organism sequencing project would not only act as a trial run for the human, it would perhaps persuade the wider biological community that the genome project was a good idea and not of benefit only to human geneticists.
Our main source on Jim’s thinking was Bob Horvitz, my old lineaging partner, who now ran a large worm laboratory at MIT. Bob had known Jim from his time as a Harvard graduate student, and was still in close touch:
In my conversations with Jim he made it perfectly clear to me that he was thinking of simple organisms, and certainly worms were amongst them. But he wasn’t convinced that the worm community had pulled it together enough to launch the operation in the way that would really get the sequence done.
Bob came back to me with the message that there was a list of model organisms that were going to be sequenced, which would include the fruit fly, because it was ahead of the worm in terms of individual genes sequenced. But the worm hadn’t quite got on the list—the worm hadn’t made it! And that, of course, made us enormously determined to go and get the worm sequenced. Bob Horvitz didn’t want to get involved in sequencing himself, but he clearly thought I should. ‘We can’t muck around—it’s a real opportunity and we could lose it,’ he told me urgently when we met at another meeting earlier in 1989.
Bob Horvitz now thinks (and Bob Waterston and I agree) that Jim’s doubts about the worm were a calculated ploy to sting us into action.
My impression was that Jim believed the best prospect for serious sequencing was John, and the best way to get John moving was to make him think that if he didn’t move the money would go someplace else—like the fly.
If that was Jim’s plan, then it worked. We detailed Bob Waterston, who was going to be at a meeting on muscle at Cold Spring Harbor in April, to make sure Jim would be at the worm meeting and would
make time to see us. We made sure that he saw our map at the worm meeting. And we booked a private meeting with him in his office for the Saturday evening, just before the close of the conference. We’d had a somewhat hasty discussion beforehand about how to play it, but hadn’t begun to think about the details of who would do what. Still, there we were: Bob Waterston, Bob Horvitz, Alan Coulson and me, suddenly doing a deal with Jim Watson about how to start on worm sequencing.
As the meeting was clearly going well, at one point I deployed a tactic that the others had agreed to in advance. ‘Look,’ I said, ‘if you just give us $100 million we’ll have it done by 2000.’
Jim barely blinked. He just said, ‘That’s not the way we do things in this country.’
‘Why not?’ I wondered privately. But in the end we came to an agreement: between us we would sequence the first 3 million bases (out of 100 million) for $4.5 million during the next three years to show that we were capable of doing it. The work would be divided equally between my lab in Cambridge and Bob Waterston’s lab in St. Louis; I would have to seek funding from the Medical Research Council in the U.K., but the National Institutes of Health would fund a third of my costs during this pilot phase to help the British side get started. Jim told us he would be able to justify giving NIH money to a British group for a pilot study on the grounds that it would buy the United States access to the LMB heritage. Always an internationalist, he also believed that the project would be stronger for having an equal partner outside the United States.
At that time 3 megabases was a ridiculous amount of DNA to contemplate sequencing. For comparison, Bart Barrell was just nearing the end of his sequence of the human cytomegalovirus, which at around 240,000 base pairs was the largest genome sequenced to date. It took him five years. But we said, ‘All right, we’re game; we’ve got the clones, we’ll see what we can do.’ None of this was in writing at this time. I didn’t go off with a check from
Jim, but I did go off with his verbal promise, and it worked out exactly the way he said. Impressive.
I returned from the United States in tremendous excitement, and came straight from the plane to the LMB to find Aaron Klug, who had replaced Sydney as director. Aaron is a structural biologist, a mild-mannered but decisive man who ran the lab in the same way Max Perutz had—as a chairman rather than a director. He has always been very supportive of me, but when I said, ‘We’re going to sequence the worm,’ his initial response was not encouraging. He said, ‘Oh, no!’ He knew much better than I did what the task implied—the huge amounts of money, the unremitting work. Then he said, ‘All right, if you really want to—but why don’t you do the fly? The fly’s much more useful.’ And the answer was simply that I was not in the fly, I was in the worm. The worm was a much less competitive area; the fly map was a mess, the people who did the YACs were in competition with the people who did the cosmids, everybody was competing with everybody else, it was hopeless. To go into this Colosseum of gladiators was out of the question. The worm had got us where we were—the worm map was the most advanced animal genome map in the world. All I could say to Aaron was, ‘Yes, the first animal to be sequenced ought to be the fly, but it’s going to be the worm.’ There were fly people at the LMB, but they were cell biologists, not mappers and sequencers. Of course, Aaron knew this perfectly well. He just wanted to be sure I knew what I was up against; there was no doubt that, the decision made, he would support me all the way. As he says himself, ‘John was the standard bearer [for genome studies], and that’s why it had to be the worm.’
We had to go through the formal process of writing grant applications to the National Institutes of Health and the Medical Research Council. After twenty years at the LMB it was the first grant application I had ever written, and I was having to justify £1 million over three years to do our 1.5 megabases. Nor could I ignore
the slightly chilling feeling that this was only 3 percent of the genome, and we were going to have to scale up. All this contributed to the ‘prison door’ effect I had first experienced at Syosset: I was going to have to stop playing and be a little more professional.
Initially we wrote our applications on the assumption that we would do the work the conventional way, using radioactive labels to tag the DNA fragments and film to record the sequence from the gels. We knew that there were some automated machines on the market but were initially skeptical about them; still, in order to cover ourselves, we applied for funds to buy one for each lab to experiment with. These machines attached different colored fluorescent tags to the DNA fragments instead of radioactive labels, and read them off automatically.
In September that year Bob Waterston came over to Cambridge so that we could write the grant application to the NIH together. Then he and I went off on a world tour that he had organized, to look at what was available in the labs where the new machines were on trial. There were really only two machines on offer, one made by Applied Biosystems Inc. (ABI), the company headed by Mike Hunkapiller that developed the inventions of Lee Hood’s CalTech group, and one made by the Swedish company Pharmacia, which had the license from Wilhelm Ansorge at the European Molecular Biology Laboratory in Heidelberg. There was a third that had been developed by DuPont, which Sydney was using in his unit, but the company had never been able to get it to work as effectively as the others.
One of the labs we visited was at an outpost of the National Institute for Neurological Disorders and Stroke in Rockville, Maryland, where a researcher called Craig Venter was working on receptors for brain chemicals. He had been one of the first to take delivery of a prototype ABI sequencer, in February 1987. Since then he has focused all his energies on sequencing more and faster, moving into the private sector when he couldn’t get what he wanted
from publicly funded science. But all that came later. When we visited his lab Craig was keen to exploit the advantages of automated sequencing, having become frustrated at the years it had taken him to isolate and sequence the gene for a single brain protein using manual methods. He gave us the hard sell on the virtues of fluorescent sequencing, but it was not his lab that convinced us; we did not really see what was going on. The place that convinced us was a lab at Baylor College of Medicine in Houston, Texas, where Richard Gibbs (an Australian molecular geneticist who now heads the human genome sequencing center at Baylor) was putting all kinds of things through an ABI machine and getting good, clean data. We found it very impressive. Then we went with Alan Coulson to Heidelberg to talk to Wilhelm Ansorge, who was also getting good results with his own machine, which we could buy through Pharmacia. The two machines had different advantages and disadvantages, but we came home convinced that both were superior to the radioactive method because they instantly gave you the data in digital form. So we rewrote our grant proposal to include one ABI machine and one Pharmacia machine for each lab. And the funding agencies gave us everything we asked for. I still have the notification from the MRC, a classic missive. It was a really big grant for them, over £1 million for three years—and it came in the form of a hand-scrawled fax.
While we were in Heidelberg something else was decided. If you cross the River Neckar from the old town and head up the steep north side of the valley, you find yourself on the Philosophenweg. Presumably it was a traditional promenade for the academics of Heidelberg. You can follow it up through the forest until you come to a place where there’s a pub and they keep wild boar in pens. And then you go back down to the Neckar and so home along the riverbank. Bob, Alan and I walked that way, and as we went we talked about what we had to do. I had quite expected that, after seven years away from it, Alan might be ready to go back to sequencing and to take charge
of the project on our side. No way: the map was nowhere near finished and he would have a lot of work to do on an interface between it and the sequence. He also expressed the general principle that one should never go back. I returned from that walk in the woods feeling a mixture of resignation and excitement that I would have to learn a new skill.
I thought I had better teach myself sequencing the old-fashioned way before the machines arrived. I got the protocols from Alan and ran some radioactive gels (Alan had a good laugh at my first results), and started sequencing a cosmid. And that’s what I did for the few months before we got an ABI machine. I personally was not mentally dependent on the machine. If it had been impossible to get the fluorescent readout going and none of this had worked, we could even have done the worm the old way, on film. At the time I was working with the LMB workshop and Amersham International to adapt the film reader we had built for mapping so that it could read DNA sequences. If we hadn’t succeeded, perhaps some other form of direct readout would have taken over. Ever since Fred’s invention of dideoxy sequencing, the improvement of technology in this field has been helpful but not crucial—rather like the development of the car since the internal combustion engine was invented. It’s not the difference between being able to do nothing, and suddenly having a new technique—as it was with the lineage, which was absolutely dependent on Nomarski microscopy. Sequencing has been improved incrementally through machines, chemistry, enzymes and software, gradually allowing us to reduce costs and automate. Like the car, though, cumulatively it’s come a long way. Although we couldn’t have known this in advance, the way to make progress in sequencing has been to do the best you can with the technology you have available, not to spend years trying to invent a better technology (or, worse still, waiting for someone else to invent it).
A much bigger change in my life than starting sequencing was the business of taking on new people and running a group. Bob and I
reckoned we needed about ten people each; I had never supervised more than one technician before, and hadn’t made a very good job of that. With a technician who is entirely dependent on you, you have to come in in the morning and give them something to do. This used to make me sweat, thinking, ‘I’ve got to give this person something to do.’ I don’t want to be thinking, ‘What’s so-and-so going to be doing in the morning?’ I want them to know what they’re going to be doing in the morning and looking forward to it. It wasn’t easy for me to make the move to running a group. I had not learned to be a manager, or to delegate at all. I had never had to: until Alan came I just did everything, and then I worked happily as half of an equal partnership. As Rodger Staden once commented, ‘The trouble with John is, he always wants to do everything himself.’ The few post-docs I had, such as Judith Kimble and Jim Priess, who worked on the development of the worm embryo, were very strong and did things for themselves—Judith organized me rather than the other way round. It never occurred to me to think about how the numbers of people would grow. In so far as I thought about it at all, I thought there would just be more “John’n’Alans” around—a completely horizontal organization. It took me a while to realize that going into sequencing was going to lead to a big management structure, a very different way of organizing things from the research labs I was used to.
Some new space had been earmarked for us in the building adjacent to us in Hills Road. It had previously been occupied by the Neurochemical Pharmacology Unit, so we called it the Old Pharm, which I thought had a nice domestic ring to it. Michael Fuller, the LMB’s indispensable laboratory superintendent, who always knew where to find that elusive piece of kit, was responsible for getting the space fitted out and equipped—and I appreciated more than ever before how vital his role was. But at first the whole group just piled into 6024. We stacked the sequencing machines vertically, and we each had a meter or so of bench. My entire office space consisted of
another meter of table sticking out sideways with the phone on it. More like a bedsit than open plan, but a great way to work.
We started advertising for staff in the press and by word of mouth, looking initially for graduates and Ph.D.s, and quickly acquired another half-dozen colleagues. I had no idea how everyone would fit in, and at first everyone did everything. This was the way the manual sequencing groups had worked, when techniques were developing rapidly and it was satisfying for graduates to learn all the skills. But the disadvantages of this method soon became clear: it took too long for each person to learn everything, and it was all rather erratic, with different things in the complex process going unexpectedly wrong, and difficulties in finding out what actually had gone wrong. So we started to sort out the jobs. The worst bottleneck had been making the libraries of worm DNA, so for the time being I took that on. This quickly started things moving, and had the added advantage that I could experiment with different ways of breaking up the clones and cloning the fragments in bacteria. The others all carried out the routine tasks of first growing the subclones, then reacting together the DNA templates, enzymes and nucleotides, and finally loading the fluorescently labelled products of these reactions on to the sequencing machines. They also performed the much less routine task of what we came to call finishing. This involves sitting at a computer screen analyzing the shotgun results: comparing the initial assemblies with the evidence of the raw data from the machines, represented on the screen by a set of four different colored traces (one for each base), then either editing on the basis of this visual evidence or performing additional reactions to get more data.
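The core of the finishing step described above is a comparison: does the assembled consensus agree with what each shotgun read actually says at each position? A minimal sketch of that check, in Python — an invented toy example, not the actual LMB software, with ‘-’ standing in for positions a read does not cover:

```python
# A toy sketch of the 'finishing' check: compare an assembled consensus
# against the individual shotgun reads covering it, and flag positions
# where the evidence disagrees (or is absent) so a finisher can look at
# the raw traces or order more reactions. Invented example only.

def flag_disagreements(consensus: str, aligned_reads: list[str]) -> list[int]:
    """Return positions where any aligned read ('-' = no coverage)
    calls a different base from the consensus, or where no read
    covers the position at all."""
    flagged = []
    for pos, base in enumerate(consensus):
        calls = [r[pos] for r in aligned_reads if r[pos] != "-"]
        if not calls or any(c != base for c in calls):
            flagged.append(pos)        # conflict, or a gap in coverage
    return flagged

reads = ["ACGT--", "-CGATA", "ACGT-A"]
print(flag_disagreements("ACGTTA", reads))   # position 3 needs a human eye
```

In the real workflow the ‘human eye’ then goes to the four colored traces for the disputed position and either edits the call or queues more sequencing reactions.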
In the next wave of recruitment, as we expanded into the Old Pharm and bought more machines, we realized that it would be worth dividing up the work to make the best use of everyone’s abilities. The existing staff, now skilled in sequencing, would become full-time ‘finishers’, as well as participating in technical
development. We would recruit unskilled people, who would carry out the template preparation and reactions, but would also have the opportunity to train for finishing. This group would have no need of academic qualifications. We judged them on school achievements, interview and something by which I set great store: the pipetting test. I showed the candidates how to use a pipette—a hand-held tool for manipulating small volumes of liquid—and invited them to have a go. It’s really simple, but the way in which a person goes about it gives an indication of their manual dexterity.
Our management principle was simple. You bring in people, find out what they do better than you do yourself, and hand that over. When there’s a gap, you fill that yourself until that job can be handed on in its turn. And in that way things informally move forward.
Computer analysis was a vital part of the work, and we really needed someone on the team with the right skills. Richard Durbin, who had helped us out with the image analysis program during the mapping phase, had gone off to California on a postdoctoral fellowship, and John White thought it would be a very good thing if we tried to lure him back to work on the software side of sequencing. Richard is an extremely bright mathematician, but unlike many mathematicians he also likes making things work. So I asked him to join us, and he agreed. While he was in Cambridge Richard had become friendly with Jean Thierry-Mieg, the husband of a French biologist called Danielle Thierry-Mieg who had worked as a visitor in my lab for a couple of years. Jean, a great bear of a man who reminds me of the actor Gérard Depardieu, was a theoretical physicist, but was very keen to move into bioinformatics and work on the worm project (indeed, I once looked out of the window early one morning to find him sitting on the lawn waiting to start a discussion). Although the Thierry-Miegs had moved to Montpellier by the time Richard came back, Richard and Jean between them got to work and developed a new database program called ACeDB (A C. elegans Data Base) that provided a way of displaying the sequence,
and the genetic map, and a bibliography of relevant articles on the genes. They distributed copies to all the other worm labs, so that everyone could have it sitting on the desk and update it as new data became available. It was a superb piece of work that set standards throughout the whole field, and went on to be used for many other genomes.
Bob also made some critical new appointments. He’d talked to some colleagues at St. Louis about making an appointment in computation jointly with them. While he was in England with me writing the grant application, one of them called saying they had found the ideal person in LaDeana Hillier, and could Bob agree long-distance to his part of the hire? So she joined without ever seeing the person she would be working for. She proved to be brilliant at developing the resources Bob needed to analyze the data that began to pour out. Then he hired Rick Wilson, who came from Lee Hood’s lab at CalTech. Rick, whose background was in cellular immunology, had been testing the new sequencing machines on some clones of regions of the T-cell receptor gene, and held the record at the time for the length of a sequence from an organism more complex than a bacterium: 91,000 bases. It was very valuable for our collaboration to have someone like him who already had some familiarity with the technology, but who was a biologist at the same time. And for Bob it was a major boost to his request for supplementary funds to buy the sequencing machines to be able to say that Rick was joining.
Rick knew the machine’s limitations and what you had to do to get it to work. It took twenty-four samples, and at first we were running it once a day for fourteen or sixteen hours. We were so proud when we figured out a way to run it twice a day.
(For comparison, modern ABI sequencers run ninety-six samples at a time and can do eight runs a day—and some of the bigger genome
labs have 100 or more of them in operation at once.)
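The scale of that change is easy to put in numbers. A back-of-the-envelope sketch using just the figures quoted above:

```python
# Back-of-the-envelope comparison of per-machine throughput,
# using the run sizes and run counts quoted in the text.

def reads_per_day(samples_per_run: int, runs_per_day: int) -> int:
    """Sequencing samples a single machine can process in one day."""
    return samples_per_run * runs_per_day

early_abi = reads_per_day(samples_per_run=24, runs_per_day=2)   # twice-a-day heroics
modern_abi = reads_per_day(samples_per_run=96, runs_per_day=8)  # a modern machine

print(early_abi, modern_abi, modern_abi // early_abi)  # 48 768 16
```

Sixteen-fold per machine — before multiplying by the hundred-plus machines a big genome lab might run at once.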
Once the first fluorescence sequencing machines arrived, it became clear that we had to take control of the software. The machines worked well, but ABI wanted to keep control of the data analysis end of things by forcing their customers to use their own proprietary software. In order to finish a sequence properly in the way I described above, you must have easy access to the raw data in order to evaluate their quality from point to point. A good way of displaying the readout from the gel is as a set of colored traces on the screen. ABI’s software produced a display, but not in a form that we could combine flexibly with Rodger Staden’s assembly programs. It was inconvenient to use and slowed us down. I could not accept that we should be dependent on a commercial company for the handling and assembly of the data we were producing. The company even had ambitions to take control of the analysis of the sequence, which was ridiculous. I had a complete obsession with getting data out—I saw that as the bottleneck. There were an awful lot of people out there theorizing about genomes, so for the moment I didn’t see that as our job. The best way to drive the science was to get the sequencing machines going, cheaper and faster, and get the data out so that all the theorists in the world could work on the interpretation.
So, one hot summer Sunday afternoon, I sat on the lawn at home with printouts spread out all around me and decrypted the ABI file that stored the trace data. I don’t think it was deliberately encrypted; it was just constructed in a rather Christmas-tree-like fashion, which I needed to track from one point to another. I came in on Monday morning and said, ‘Look, this is how we get the file data.’ Within a very few days, Rodger and his group had written display software that showed the traces—and there we were. The St. Louis team joined in, and they all went on to decrypt more of the ABI files, so that we had complete freedom to design our own display and analysis systems. It transformed our productivity. Previously we’d only been able to get the traces as printouts, which we bound
together in fat notebooks, infuriating to fast workers such as Rick Wilson.
You’d sit down at the computer, and you’d have to flip through this stupid notebook until you found the trace that you wanted. And hopefully it would be in the right direction. The great idea was to figure out a way to get this on line, but ABI would not help us out. So John sat down and cracked it. That was a huge advance, really an important development. If we hadn’t done that we’d have been way, way behind on the worm project.
ABI was not at all happy that we had done this. We had been negotiating towards the idea that they would sell us a key that would unlock the files, but it was quite clear that even then they would always have control and they could take it away again. There remained a real risk that they would re-encrypt the file in a way we couldn’t get at; so we made sure that their other customers were aware of what was going on, and they did agree quite quickly to keep their formats public. We went on to become one of their biggest customers. I think I was the first to decrypt the files, but I’m not certain—there were others doing it at about the same time. I certainly feel that between us we did push ABI back a bit and denied to them complete control of this downstream software. It was my first experience of the kind of battle for control of information that I seem to have been fighting with commercial companies ever since: a foretaste of the much larger battles that would later surround the human genome.
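The file-cracking described above amounts to working out where each block of trace data lives inside the file and how to follow the internal offsets to reach it. Here is a minimal sketch of the idea in Python, using a toy chunked format invented purely for illustration — this is not ABI’s actual layout, just the general ‘Christmas-tree’ pattern of a directory of named entries each pointing at its data:

```python
# Toy illustration of decoding a chunked binary trace file by following
# offsets from a directory to the data blocks. NOT the real ABI format;
# a sketch of the kind of structure being described: a count, then a
# directory of (4-byte name, offset, length) entries, then raw data.

import struct

def write_toy_trace(traces: dict[str, list[int]]) -> bytes:
    """Pack named traces (lists of 16-bit intensities) into a toy file."""
    names = list(traces)
    header = struct.pack(">I", len(names))
    dir_size = len(names) * 12            # 4-byte name + offset + length
    entries, payload = b"", b""
    for name in names:
        data = struct.pack(f">{len(traces[name])}H", *traces[name])
        offset = len(header) + dir_size + len(payload)
        entries += struct.pack(">4sII", name.encode(), offset, len(data))
        payload += data
    return header + entries + payload

def read_toy_trace(blob: bytes) -> dict[str, list[int]]:
    """Walk the directory, following each entry's offset to its data."""
    (count,) = struct.unpack_from(">I", blob, 0)
    traces = {}
    for i in range(count):
        name, offset, length = struct.unpack_from(">4sII", blob, 4 + i * 12)
        values = struct.unpack_from(f">{length // 2}H", blob, offset)
        traces[name.decode()] = list(values)
    return traces

blob = write_toy_trace({"dyeA": [0, 5, 9], "dyeC": [1, 2]})
print(read_toy_trace(blob))   # round-trips to the original traces
```

Decoding a real file from printouts is the same exercise run in reverse: stare at the bytes until the directory structure reveals itself, then write the reader.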
Working with Bob and the rest of the St. Louis lab was a great experience. We compared notes constantly. We divided up the job by each starting at the same place on a chromosome and sequencing away from one another in opposite directions. That way we had only one overlap between the labs to worry about per chromosome. If it seemed like one lab had a particular problem covered, then the other
left it to them. Problems that were easy to solve and didn’t take too much effort could be tackled by both labs, just as it was worth each lab taking a different approach to solving the really hard ones. Yes, there was competition, particularly over how much sequence each side was producing; but it was a competition in which nothing was kept hidden. For example, early on Bob’s lab had difficulty assembling cosmids.
There were huge gaps—it just wasn’t coming together. We eventually worked out it was due to how we were making the libraries—one of the extraction steps was partially melting the DNA. But Cambridge didn’t have the same problem, so we had to work through our protocol with John. It turned out that we weren’t doing it cold enough. We were trying to help each other as much as we could, but still trying to beat each other!
Being thousands of miles apart wasn’t really a problem. We used e-mail a lot, and at some point Bob and I got into the habit of talking together on the phone once a week. Individual members of the two labs visited each other regularly. The highlight of the year was the annual lab meeting, when we took it in turns to host a visit from all the members of the other lab. These gatherings had a serious purpose—to see at first hand how the other group was working—but they’re chiefly remembered for the social side. The Cambridge lab meetings typically involved punting, picnics and a healthy quantity of beer, a continuation of the tradition established by the worm group.
It was an extremely productive formula. After a couple of years we were far and away the biggest producers of genome sequence in the world. To our irritation we found ourselves having to respond to critics who simply refused to believe our production figures. Bob had a particularly unpleasant argument with one of the E. coli sequencers who had massively overestimated the capital costs of the
project, simply because he refused to believe that Bob’s lab ran the machines twice a day, seven days a week. A more serious question was whether people would believe what we were doing was worthwhile. We were getting flak from some members of the usually harmonious worm community who felt that we were taking resources away from them. Some people thought it was wrong to spend millions of dollars of research funds on work that was not directly going to solve problems in biology. What we had to do was to demonstrate that this was seedcorn, and that the money invested in the genome was money well spent.
A year after we began, at the 1991 worm meeting, some of the best labs were complaining. But two years further on, they found that people were pouring into the field, because they saw how easy it was to find genes. So the leaders of the labs suddenly saw that, far from reducing their grants, the genome project was actually increasing them, because it made everything more attractive to funding agencies. I began to notice that the worm grant proposals I was asked to referee all began to mention the genome.
What was perhaps more significant was that our project was proving the point that genome sequencing was worthwhile, that production went up and costs went down with experience and technology development, and that the clone-based approach was efficient and accurate. There was no doubt that, given the funding, the worm genome would be completed, and within a few years. And if two labs could do the worm for a few tens of millions of dollars, then the human was within reach too.