Scott P. Layne, Tony J. Beugelsdijk, and C. Kumar N. Patel
Headline news stories about illnesses and deaths in Hong Kong caused by avian influenza viruses, multidrug-resistant Mycobacterium tuberculosis, and foodborne Salmonella, plus new threats from bioterrorists, have become commonplace in recent years. Such headlines are reminders that we are not immune to the dangers of infectious diseases just because we have been immunized, consume regulated food and water, and are protected by modern systems of public health and national security. Through concerted efforts we have eradicated or virtually eliminated certain infectious diseases that once devastated the world's population (e.g., smallpox and polio), but our past achievements in these areas—plus the widespread availability of lifesaving antibiotics—have led to complacency. In fact, we are engaged in ongoing contests with no clear end in sight against infectious diseases, foodborne pathogens, and international terrorism. These battles have many common elements that require a coordinated armamentarium from scientific research, modern technology, and public health preparedness.
In comparison to other weapons of mass destruction (i.e., chemical and nuclear), biological weapons are relatively simple and inexpensive to develop. Today, many experts believe it is no longer a question of if but rather when bioweapons will be unleashed by rogue states or terrorist groups. Small attacks may strike as few as 10 to 100 people yet may lead to lasting changes in the fabric of civilized societies. Thus, in order to deal with the looming threat of bioterrorist attacks, public health, law enforcement, and military-based laboratories are now preparing to identify
pathogens that are seldom seen in patients and would likely go unrecognized if the diagnosis depended on initial clinical signs and symptoms.
Within the next few years, scientists will sequence the entire human genome for a dozen or so people. Such information will move the practice of medicine in the direction of prevention and treatment based on molecular makeups. Initially, discovering how health (epidemiologic outcomes) is dependent on single mutations (genotypic features) and single proteins (phenotypic features) will be relatively straightforward. For example, it may involve understanding how single nucleotide polymorphisms (SNPs) are related to developing cancers, diagnosing diseases, and tailoring medications. With time, however, more intricate relationships likely will be discovered with population-based studies, entailing the generation, storage, and analysis of enormous quantities of epidemiologic, genotypic, and phenotypic data.
On the “digital” side, researchers have growing access to computers that perform 10^12 floating-point operations per second (teraflops), Internet backbones that transmit 10^9 bits of information per second (gigabits per second), and databases that contain 10^15 bits of information (petabits). On the “physical” side, researchers must still depend on graduate students, postdoctoral fellows, and technicians to perform many repetitious tests. In yesterday's model of research and discovery, such manual approaches were sufficient. In today's model, however, such “digital” versus “physical” mismatches are constraining the exploration of enormous experimental and informatics relationships (or phase spaces) in medicine and biology.
Thus, more than ever, new tools and technologies from engineering, computer science, mathematics, chemistry, and physics hold great promise for tackling “grand” problems in medicine and biology. The challenge is to identify important scientific needs and then focus on the current technologies and approaches that are available for speeding up the pace of laboratory-based research. Given such perspectives, the colloquium considered specific needs in medicine and biology, assessed current research practices and their limitations, and then considered strategic ways for developing new high-throughput laboratories and informatic resources. This kind of multidisciplinary approach attracted broad-based participation from academic, governmental, and industrial sectors.
The triple threats of communicable infectious diseases, pathogenic foodborne infections, and bioterrorism/biowarfare pose enormous challenges to established research and public health infrastructures. Such challenges share a common set of themes: (1) they affect large numbers of people; (2) they require large quantities of data to achieve solutions;
(3) they require access to appropriate and often large numbers of samples for testing and analysis; (4) they demand reproducible laboratory procedures; and (5) they require sufficient quantities of supporting reagents. In today's scientific research and public health laboratories, most activities are carried out by technicians using various labor-saving devices that are “semiautomated.” Although such approaches have led to insights and breakthroughs, they also demand the completion of highly repetitive and tedious tasks by armies of laboratory technicians.
The model of semiautomated/semimanual laboratories has three significant shortcomings. First, because such facilities require significant human involvement, they scale linearly in cost with the overall size of the problem. Second, because such facilities require humans to perform highly repetitive and tedious tasks, they are prone to random errors. Third, because such facilities inevitably contend with rate-limiting tasks, they can be overwhelmed by large surges in demand. Such acute surges in demand may occur, for example, during a rapidly growing epidemic or a bioterrorist attack, where swift diagnosis and follow-up are necessary for an effective response.
Fortunately, a critical number of scientific disciplines and powerful technologies can be combined to level the playing field against the triple threats outlined above. Molecular biologists and biochemists have developed a variety of laboratory-based assays that are powerful and readily adaptable to large-scale efforts. Engineers have developed innovative robotics and automation technologies that are capable of skyrocketing the number and variety of laboratory experiments. Computer scientists have developed programming languages and database management systems that have provided the basic building blocks for improved environments for scientific collaboration. And physicists—driven by the need to share large amounts of data generated in high-energy particle physics—have catalyzed the development of the Internet, which is literally transforming the ways in which scientific and technical collaborations take place.
Colloquium participants recognized that integrating such disciplines and technologies holds great promise for accelerating infectious disease research—including basic science, clinical trials, and public health and epidemiologic investigations worldwide. In the coming years, naturally occurring and maliciously initiated outbreaks could be met with fast and accurate automated systems that facilitate sample collection, sample testing, data storage, and informatics analysis. Such automated systems could be flexible, modular, and remotely accessible via the Internet, thereby enabling a convenient means of mass customized testing.
In the twenty-first century, infectious diseases will pose major challenges from a variety of sources. For example, new and lethal strains of influenza A H5N1 have surfaced in Hong Kong, which reinforces the need for an expanded program of global surveillance by the World Health Organization (WHO). New epidemics of multidrug-resistant Mycobacterium tuberculosis (MDR-TB) are emerging in countries belonging to the former Soviet Union, and the ability to administer the right combination of directly observed therapies (DOT) may depend on large-scale drug sensitivity testing. Worldwide, human immunodeficiency virus (HIV) infections and acquired immune deficiency syndrome (AIDS) cases are increasing exponentially, and discovering new drug therapies and vaccines may ultimately depend on brute force research and development efforts. As further described below, colloquium participants identified critical needs and limiting factors for each of these infectious diseases. They also considered ways to build high-throughput laboratory and informatics resources for accelerating basic science, clinical trials, and public health/epidemiologic investigations throughout the world.
Influenza pandemics have swept the world for centuries, three times in the twentieth century alone (1918, 1957, and 1968). The 1918 influenza A H1N1 pandemic is the biggest infectious disease catastrophe on record, topping even the medieval Black Death in the number of deaths. Within months following the initial outbreak, an estimated 500 million people were infected and 40 million were dead from a worldwide population of 2 billion people.
In the United States the Centers for Disease Control and Prevention (CDC) has estimated that the next influenza pandemic may cause 89,000 to 207,000 deaths, 314,000 to 734,000 hospitalizations, 18 million to 42 million outpatient visits, and 20 million to 47 million additional illnesses. The overall economic impact may range from $71 billion to $167 billion, excluding disruptions to commerce and society. Yet an often overlooked statistic is that the cumulative impact from numerous smaller epidemics is even greater than the toll from the recent pandemics combined. In “milder” epidemics, 10 percent of the world's population may be afflicted, which results in an estimated 500,000 annual deaths directly from influenza and its related complications. In “harsher” epidemics these grim worldwide statistics may readily double to 20 percent infected and 1 million annual influenza-related deaths.
At present, it is unclear whether the extreme virulence of the 1918 strain is a one-time event, perhaps due to some combination of unlikely factors, or whether similar catastrophes are forthcoming. Whatever the mix of factors, it is becoming increasingly clear that deciphering the 1918 strain will demand far more organized information on influenza than currently exists. New data are needed that correlate epidemiologic, phenotypic, and genotypic features of influenza viruses from around the world. High-throughput automated laboratories and informatics resources could transform the approach to deciphering the 1918 strain and others, like the avian influenza virus that struck residents in Hong Kong (see below).
One-third of the world's population has been exposed to Mycobacterium tuberculosis, resulting in latent infections that may reactivate at any point in a person's lifetime. As a consequence, there are now 16 million people with active cases of tuberculosis and 2 million deaths per year globally. Tuberculosis can be effectively treated with two to four first-line drugs (e.g., isoniazid, rifampin, ethambutol, and pyrazinamide) over several months, but serious problems may arise when strains develop resistance to one or more of these agents.
Recent worldwide surveys indicate that a variety of MDR-TB strains are dispersed throughout the world and are apparently increasing in countries with strained public health infrastructures (Pablos-Mendez et al., 1998). In the United States, MDR-TB strains are especially prevalent among HIV-positive people in New York City, and infections with such highly resistant strains carry high mortality rates (even in otherwise healthy persons). In the former Soviet Union the emerging MDR-TB epidemic may involve as many as 11,400 cases, with the majority occurring in prisoners. In the coming years a public health disaster may arise from newly released prisoners transferring numerous drug-resistant strains to the general population.
In the United States, treating one MDR-TB patient with DOT may cost as much as $250,000 annually. For the world's public health system, which expends an average of $5 per person each year, such costs are staggering. Preventing the emergence and spread of MDR-TB is a global priority. Cost-effective means are needed to test for MDR-TB strains in large numbers of afflicted and/or exposed people throughout the world, and high-throughput automated laboratories could transform the very approach to diagnosing resistant strains. Such testing resources could rapidly identify optimal therapies for individuals and, because many second-line drugs are in scarce supply, enable pharmaceutical companies to plan for future worldwide demands.
In less than three decades HIV has grown from an unknown pathogen to a pandemic disease, chronically infecting 50 million people globally and killing 3 million persons each year. If such trends continue, AIDS will likely become the number one infectious disease killer in the world. In response to this catastrophe, pharmaceutical companies are manufacturing new antiviral drugs (e.g., reverse transcriptase and protease inhibitors) that extend lives, and investigators are examining promising leads for newer drugs and much-needed vaccines. Despite such progress, strains that are resistant to one or more antiviral drugs are becoming increasingly common, and reports of multidrug-resistant strains of HIV (MDR-HIV) being transmitted from person to person are now appearing. Given the cost of combination antiviral therapies and mounting problems with resistance, the only effective solution for the world will be the delivery of affordable vaccines that prevent new infections.
A simple estimate leads to the realization that there could be an enormous number of unique HIV strains (i.e., millions to billions) infecting people throughout the world. At present there is only sketchy information on these genetic mutations and their immunological and physiological correlates, making it difficult to relate the significance of such variations (if any) to therapies and vaccine development. High-throughput automated laboratories would enable investigators to conceptualize and attack the challenges pertaining to HIV from entirely new directions. For many such problems with enormous phase spaces, high-throughput laboratories that serve as investigational resources are perhaps the only feasible means to move ahead.
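A back-of-envelope version of the estimate mentioned above can be written out explicitly. The biological figures below are round, illustrative assumptions (not data from the colloquium), but they show why the pool of distinct circulating variants plausibly reaches millions to billions:

```python
# Back-of-envelope estimate of circulating HIV sequence diversity.
# All figures below are illustrative assumptions, not measurements.

genome_length = 9_700          # HIV-1 genome, bases (approximate)
mutation_rate = 3e-5           # substitutions per base per replication cycle
virions_per_day = 1e10         # virions produced daily in one untreated patient
infected_people = 50_000_000   # chronically infected worldwide (from the text)

# Expected new point-mutant genomes generated per patient per day:
mutants_per_patient_day = virions_per_day * mutation_rate * genome_length
print(f"mutant genomes per patient per day: {mutants_per_patient_day:.1e}")

# Counting only single-point variants, the space of distinct
# one-mutation strains is genome_length * 3 possible substitutions:
single_point_variants = genome_length * 3
print(f"possible single-point variants: {single_point_variants:,}")
```

Because each untreated patient generates mutant genomes far faster than the count of possible single-point variants, essentially every one-step variant arises daily in every patient, and across tens of millions of infected people the sampled strain space is enormous.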
Fifty years ago the WHO established a global influenza surveillance network. Today, the network has grown to include 110 national collaborating laboratories in 80 countries worldwide. In the United States the domestic network includes over 70 collaborating laboratories, with many located in state health departments. Collaborating laboratories focus on collecting influenza samples from stricken people and verifying that samples contain influenza A or B with standardized reagents and test kits supplied by the WHO in cooperation with the CDC. Collaborating laboratories then forward influenza samples with accompanying epidemiologic information to one of four international collaborating centers for reference and research (located in Australia, the United Kingdom, Japan, and the United States).
A major goal of the WHO network is to monitor the emergence and
spread of variant viruses using a defined set of phenotyping and genotyping assays. Such assays enable the four collaborating centers to (1) screen influenza samples in a short period of time, (2) determine if significant influenza activity is associated with the isolation and spread of variant viruses, and (3) judge whether there are reduced postvaccine immune responses to variant viruses in individuals who have received the current vaccine. Cumulative data from the relevant geographic locations are then used by the WHO to make vaccine strain recommendations for the Northern and Southern Hemispheres on a semiannual basis.
Although gaps exist in the global surveillance of influenza, due to sparse sampling in underdeveloped countries and lack of sampling in animal reservoirs, currently there is no shortage of samples collected by the WHO program. On a worldwide scale, more samples are collected each year (~170,000) than are characterized (6,500) by the four collaborating centers. What is limiting is the laboratory work force for performing repetitive tasks on the available samples and then inputting the results into databases. It is important to realize that the testing procedures used by the WHO are reproducible among independent laboratories, which makes them amenable to standardization, scale-up, and automation.
Thus, what influenza surveillance and research efforts really need are high-throughput automated laboratories and database resources that offer seamless integration of “digital” and “physical” tasks from start to finish. This includes tasks pertaining to (1) collecting samples and recording epidemiologic observations in the field, (2) screening samples to see whether they contain influenza A or B, (3) growing and titering viral samples, (4) phenotyping viral samples using a number of key assays, (5) genotyping RNA segments in viral samples, (6) archiving viral samples, (7) storing observations and laboratory results in relational databases, (8) analyzing data with informatics tools, and (9) sharing key findings with scientific collaborators, public health officials, and vaccine manufacturers on a timely basis. An automated influenza laboratory and database effort may lead to more accurate influenza vaccines in the short term (with more yearly information) and broader flu vaccines in the long run (with more cumulative information).
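As an illustration only, the start-to-finish integration of these tasks might be organized as a staged pipeline in which each sample carries its field observations and accumulates results. The stage and function names below are hypothetical, not part of any WHO system:

```python
# Minimal sketch of the staged influenza surveillance pipeline
# described above. Stage names follow the text; everything else
# (function names, the in-memory "database") is hypothetical.

from dataclasses import dataclass, field

@dataclass
class Sample:
    sample_id: str
    epidemiology: dict            # field observations (stage 1)
    results: dict = field(default_factory=dict)

PIPELINE = [
    "screen_influenza_a_or_b",    # stage 2
    "grow_and_titer",             # stage 3
    "phenotype_assays",           # stage 4
    "genotype_rna_segments",      # stage 5
    "archive_sample",             # stage 6
]

def run_stage(sample: Sample, stage: str) -> None:
    # A real system would dispatch to an instrument; here each stage
    # just records a placeholder result.
    sample.results[stage] = "done"

def process(sample: Sample, database: dict) -> None:
    for stage in PIPELINE:
        run_stage(sample, stage)
    database[sample.sample_id] = sample   # stages 7-9: store, analyze, share

db: dict = {}
s = Sample("HK-1997-0001", {"site": "Hong Kong", "collected": "1997"})
process(s, db)
print(sorted(db["HK-1997-0001"].results))
```

The point of the sketch is the seamlessness: a sample's epidemiologic observations and its laboratory results travel together from field collection through database storage, with no manual transcription step in between.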
Recent surveys conducted by the CDC show that foodborne infectious diseases account for 76 million illnesses, 325,000 hospitalizations, and 5,000 deaths in the United States each year. Three known pathogens—Salmonella, Listeria, and Toxoplasma—cause 1,500 deaths each year, while unknown infectious agents account for 3,200 deaths (Mead et al., 1999). Most foodborne outbreaks in the United States have been confined to relatively small numbers of people, mainly because of early detection and aggressive intervention in the course of events. Despite such luck, however, certain trends pose new and mounting threats to the food supply, such as (1) the consolidation of food processors, (2) the increasing import of all types of foods, (3) the consumption of more meals away from home, (4) the emergence of new strains of plant and animal pathogens, (5) the shift from multicrop to single-crop farming, (6) the threat of bioterrorism, and (7) the inevitable rise of global commerce.
The U.S. system for ensuring safe food is composed of two complementary arms. The enforcement arm deals with rules and regulations for manufacturing, transporting, storing, importing, inspecting, and testing foods. It is administered by a patchwork of over a dozen federal agencies, often with limited resources and antiquated procedures. The investigatory arm deals with managing acute outbreaks and formulating recommendations for minimizing future ones. Federal agencies, such as the CDC and the Food and Drug Administration, have the authority to initiate investigations of outbreaks but can do so only if state and local public health agencies have adequate systems in place for detecting and reporting them. Therefore, the CDC offers informatics resources (such as Internet-based software and computerized databases) to state and local public health agencies in order to facilitate the reporting of outbreaks.
At the federal level, new methods from molecular epidemiology are now offering rapid and accurate tools for investigations of outbreaks and follow-up surveillance. The CDC and participating public health laboratories can now subtype enteric pathogens and then deposit their molecular fingerprints into a computerized database, enabling queries from other laboratories on related outbreaks and pathogenic strains. With such tools the public health infrastructure is able to compile up-to-date records of Escherichia coli O157:H7 outbreaks, for example. In addition, the centralized system helps to organize data on outbreaks that are dispersed over large geographic areas, involve many small clusters of cases, and/or occur in places with limited investigatory resources.
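The deposit-and-query workflow described above can be sketched in a few lines. The pattern encoding and record fields here are invented for illustration, not the actual CDC database schema:

```python
# Minimal sketch of a centralized molecular-fingerprint database of
# the kind described above; all identifiers and fields are invented.

from collections import defaultdict

fingerprints: dict = defaultdict(list)

def deposit(pattern: str, record: dict) -> None:
    """A participating lab deposits a subtyped isolate's fingerprint."""
    fingerprints[pattern].append(record)

def related_isolates(pattern: str) -> list:
    """Other labs query for isolates sharing the same pattern."""
    return fingerprints.get(pattern, [])

# Two state labs deposit matching E. coli O157:H7 fingerprints:
deposit("EC-O157-PATTERN-07", {"lab": "State A", "date": "1999-03-02"})
deposit("EC-O157-PATTERN-07", {"lab": "State B", "date": "1999-03-10"})

# A query links the two geographically separate clusters:
print(len(related_isolates("EC-O157-PATTERN-07")))
```

Even this toy version shows the payoff: isolates deposited independently by distant laboratories become linkable the moment their fingerprints match, which is exactly what makes dispersed, small-cluster outbreaks visible.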
Most foodborne outbreak investigations use combinations of epidemiologic, laboratory, and informatics tools at every step. The customary steps include (1) identifying the circumstances that triggered the investigation, (2) ascertaining the outbreak's size, (3) examining the individual cases, (4) identifying the common links and risk-related causes, (5) implementing measures to control the outbreak, and (6) formulating recommendations for future prevention. Outbreak investigations can be difficult because of the increasing number of new pathogens and risk factors and also because major outbreaks crossing state and national boundaries involve overlapping enforcement and investigatory jurisdictions.
Ensuring a safer food supply will necessarily entail increased testing of various foods plus rapid investigation and control measures once outbreaks have occurred. The colloquium therefore considered scientific and risk assessment approaches for protecting the food supply and the applications of high-throughput automated laboratory and informatics resources in reaching such goals.
The problems of bioterrorism and biowarfare are particularly demanding because their solution involves many scientific disciplines plus institutional coordination on federal, state, and local levels. From medical, public health, and emergency response perspectives, bioweapons may unleash viruses, bacteria, fungi, and/or toxins that cause unusual symptoms and highly virulent forms of disease. To save lives, the first responders to an attack will require accurate and rapid information for diagnosis, treatment, and quarantine measures if indicated. From law enforcement and national security perspectives, threats may come from rogue states and/or terrorist groups that crisscross the jurisdictions of several government agencies. And because each agency has its own operating procedures, the application of technologies across federal, state, and local jurisdictions will require careful planning. From political and economic perspectives, biological arsenals are classified as weapons of mass destruction and they can be stockpiled at a fraction of the cost of chemical and nuclear weapons. In response to this, international laws that ban the proliferation of bioweapons will require the means for inspection and verification. It will thus be incumbent on policymakers and scientists to define laboratory technologies that do not infringe on commercial enterprises (e.g., pharmaceutical manufacturers) yet remain fully capable of spotting significant violations.
The dismantling of the former Soviet Union and the system of two opposing superpowers has led to an uncertain world order. It now includes one global superpower and numerous responsible governments—yet it also includes several rogue states, multiple religious fringe groups, and a few shadowy international syndicates that are forming new networks and posing new challenges to national and global security. Today, at least 17 countries are known to be developing or producing bioweapons, and the list may be expanding (Alibek, 1999). The former Soviet Union was engaged in a secret and extensive offensive biological weapons program directed against personnel, materiel, animals, and plants and involving many organizations and facilities in Russia and the republics. Tens of thousands of personnel were involved—in the scientific academies; government departments responsible for defense, agriculture, and public
health; and in an entity known as Biopreparat, which alone had over 32,000 employees. Presently, it is not clear whether all elements of the program have been halted, especially in the Russian Ministry of Defense. Given the severe economic conditions since the collapse of the former Soviet Union, there are also significant proliferation risks for trained personnel and weapons materials through recruitment of former weapons scientists by rogue states. In contrast, the United States ended its bioweapons program in 1969 and now retains few living experts with first-hand knowledge of such programs. Such loss of expertise makes it even harder to deal with various evolving threats.
To evaluate the threats of bioterrorism and biowarfare, one must understand basic steps in the weaponization process. Exact details of each step will depend on the scale of the effort, but in general they include (1) choosing infectious agents, (2) isolating useful seed stocks and pathogenic biotypes, (3) formulating stabilizers and scatter enhancers, (4) growing and packaging enhanced bioagents, (5) storing and monitoring bioweapons, (6) maintaining one or more reliable delivery systems, and (7) identifying targets and contingencies. The detailed process of building fully strategic bioweapons may take years, yet simpler ones may be produced in less time. Since weather conditions influence the spread and viability of infectious agents, sophisticated bioweapons may contain pathogens with different physical stabilities and methods of dispersal.
In the United States, as few as 10 to 100 afflicted people may stretch hospital, public health, emergency response, law enforcement, and National Guard services beyond their current limits. Even larger attacks involving major metropolitan areas may literally require the delivery of tons of antibiotics to exposed persons within days, totally overwhelming current response capabilities. In the twenty-first century, mitigating the threats of bioterrorism will thus demand sizeable laboratory and informatics resources, which can be organized in terms of four overall phases.
First, in preventing an attack, responsible governments may rely on the ability to fingerprint infectious agents efficiently with high-throughput automated laboratories. An extensive database of molecular fingerprints would give responsible governments a new means of rapid attribution and therefore deterrence. It would also put rogue states, religious fringe groups, and international syndicates on notice that there is little chance to evade blame for any bioattack.
Second, in the unfortunate event of an attack, U.S. public health laboratories may be overwhelmed within the first few moments—quite simply because there would be too many samples to process and test within hours. With manual laboratories it would be impossible to answer even the simplest of questions. How many different infectious agents were released? How do they differ? What are the best initial ways to treat those
afflicted? Information from high-throughput automated laboratories may reduce confusion and save lives by offering timely testing in acute situations.
Third, in the aftermath of an attack, U.S. public health, agricultural, and law enforcement officials would need accurate answers to yet another set of questions. What is the stability of each infectious agent? What are their geographic boundaries? What are the effects on animals and plants? What are the molecular fingerprints and origins of the agent(s)? Information from high-throughput automated laboratories may speed the recovery process by offering testing for cleanup and investigatory operations.
Fourth, in response to an attack, U.S. law enforcement officials may need to collect evidence in accordance with chain-of-custody procedures. Intelligence agencies and military services may need to make accurate attributions and then take swift actions to protect national security. Information from high-throughput automated laboratories and their associated databases may prevent further attacks by rapidly pinpointing sources and locations.
Quite clearly there are scientific needs for high-throughput automated laboratories and informatics resources in other areas, such as (1) human genetics, (2) molecular medicine, and (3) pharmaceutical screening. The colloquium therefore considered the scientific needs in these areas by inviting participation from a widespread scientific and technical constituency. For instance, researchers are now considering how to approach problems in medicine and biology after the human genome is fully sequenced and 90 percent of its SNPs are found. Such data are expected to become available within the next few years from corporate and governmental sequencing programs, opening unprecedented opportunities for molecular medicine and pharmaceutical screening. It will therefore be essential to have high-throughput laboratory and informatics resources to take advantage of such achievements.
Data for most problems in medicine and biology can be classified in terms of three spaces. Epidemiologic space pertains to exposures, expressions, and outcomes in human populations over time—for example, the incidence of certain cancers that are due to environmental exposures, inherited traits, and/or acquired mutations. Phenotypic space pertains to physical properties or biomarkers at the cellular level—for example, the expression of abnormal proteins and molecular histologies in cancer cells. Genotypic space pertains to the actual DNA sequences, whether they represent normal variations within a population or abnormal ones—for example, the presence of oncogenes and translocations in chromosomes that increase cancer risks. Certain diseases that lack environmental factors, that are due to one abnormal protein, or that are caused by a single SNP may have the simplest spaces to explore and understand. Yet such simple spaces are believed to represent only the tip of the iceberg in medicine and biology. To explore a wider range of diseases and develop tailored medications, it will be necessary to discover correlations and mappings in more intricate spaces.
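For illustration, the three spaces could be modeled as linked records per subject, so that correlations are queries across the link. All type and field names below are hypothetical:

```python
# Hypothetical per-subject linkage of the three data "spaces"
# described above; every field name is illustrative.

from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Epidemiologic:            # exposures, expressions, outcomes over time
    exposures: List[str]
    outcome: str

@dataclass
class Phenotypic:               # cellular-level biomarkers
    abnormal_proteins: List[str]

@dataclass
class Genotypic:                # DNA-level variation
    snps: Dict[str, str]        # locus -> observed substitution

@dataclass
class SubjectRecord:
    subject_id: str
    epi: Epidemiologic
    pheno: Phenotypic
    geno: Genotypic

# The simplest spaces are one-to-one mappings: a single SNP paired
# with a single abnormal protein and a single outcome.
r = SubjectRecord(
    "S001",
    Epidemiologic(exposures=[], outcome="disease"),
    Phenotypic(abnormal_proteins=["protein-X"]),
    Genotypic(snps={"locus-0": "A>T"}),
)
print(r.subject_id, r.geno.snps)
```

The harder, more intricate spaces the text anticipates correspond to many-to-many mappings across these records over large populations, which is precisely what drives the data volumes toward petabit scale.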
One approach is to study simple inherited diseases in relatively homogeneous human populations and then extend the scientific approach to more heterogeneous populations. Another approach is to use living organisms that reproduce rapidly (e.g., bacteria, yeast, mice) to explore the relationship between small groups of mutations and fitness, using rapid-design strategies known as DNA shuffling. The colloquium examined ways to build high-throughput laboratory systems for supporting such approaches. It also examined the available informatics tools for discovering relationships from enormous phase spaces that contain various epidemiologic, phenotypic, and genotypic data.
All the necessary building blocks are available for establishing petabit-generating user facilities in medicine and biology. In fact, for laboratory automation and robotics, key technologies have matured to the point where manufacturers and consumers alike are asserting that products must conform to a single interconnection standard recently adopted by the American Society for Testing and Materials (ASTM). By conforming, products from one manufacturer would be “plug and play” with products from another, simplifying high-throughput automated systems. Thus, the real challenge is to identify compelling problems and then build integrated systems that are flexible and capable of utilizing advances in technology as they become available.
The available technologies and approaches considered by the colloquium included (1) laboratory automation and robotics, (2) interconnection standards, (3) genomic sequencing, (4) flow cytometry, (5) microtechnologies, (6) advanced diagnostics, (7) mass customized testing, (8) databases, (9) informatics, (10) mathematical modeling, (11) risk assessment, and (12) molecular breeding. As summarized below, each has applications in fighting infectious diseases, ensuring safe food, mitigating bioterrorism and biowarfare, and facilitating work on human genetics and molecular medicine.
Since its founding in 1995, the Association for Laboratory Automation has held annual conferences on the rapidly advancing fields of clinical
and laboratory automation. The recent LabAutomation'99 conference was attended by over 2,200 participants and 85 corporate exhibitors. In addition to scientific and technical presentations, the exhibition halls displayed a large variety of commercial automation hardware, such as robotic arms/conveyers, bar code readers, material supply modules, liquid handlers, plate washers, plate readers, incubators, filtration modules, centrifuge modules, genomic sequencers, flow cytometers, image analyzers, and disposal stations. Companies also displayed a number of ready-made systems that integrate hardware, software, and reagent streams for complete high-throughput procedures. A few examples of such offerings included (1) systems that grid filter papers and glass slides with arrays of molecular probes, (2) systems that identify, pick, and culture bacterial colonies, and (3) systems that screen compounds for drug activity and biological toxicity. The key point is that flexible hardware from such systems can serve as components in petabit-generating user facilities.
Many companies sell powerful hardware that can be controlled only through their own proprietary software, yet no single company offers a full range of high-throughput components. This fragmented environment has often led to frustration because it takes an inordinate amount of time and effort to interconnect hardware from different manufacturers. To overcome this obstacle, the ASTM has adopted the Laboratory Equipment Control Interface Specification (LECIS). The new interconnection standard was formalized in 1999 (as ASTM E1989-98), and several leading laboratory automation and pharmaceutical manufacturers have already announced their intentions to support it. In the coming months, laboratory automation and robotics vendors will offer an increasing number of LECIS-compliant products, with many plug-and-play features like those expected in personal computers (Committee E01, 1999).
The LECIS instrument control concept is based on interactions between a single controller and any number or type of standard laboratory modules under its command. Essentially, LECIS organizes equipment behavior into a small number of “states” and also defines a small number of “messages” for managing transitions between these states. These two elements are fully independent of programming languages and physical scales (e.g., length, volume, time). It is therefore feasible to build flexible instruments with older macrotechnologies (e.g., pipettes, test tubes, 96-well plates); newer microtechnologies (e.g., microchannels, microchambers, microdetectors, microarrays); or modular combinations of both. Various hardware devices that comply with the ASTM standard would be plug and play with others using it, and, like today's personal computers, large
instruments could house duplicate modules for rate-limiting procedures and expansion slots for adding new capabilities over time.
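The controller-and-modules concept can be illustrated with a minimal sketch in Python. The state and message names below are illustrative placeholders, not the actual vocabulary defined in ASTM E1989-98; the point is only that behavior reduces to a small table of allowed transitions that any compliant module shares.

```python
# Minimal sketch of a LECIS-style device state machine. State and message
# names here are illustrative, not those of the actual standard.

class StandardModule:
    """A laboratory module that moves between a few well-defined states
    in response to messages from a single controller."""

    # Allowed transitions: (current_state, message) -> next_state
    TRANSITIONS = {
        ("idle", "reserve"): "reserved",
        ("reserved", "start"): "running",
        ("running", "finish"): "idle",
        ("running", "abort"): "error",
        ("error", "reset"): "idle",
    }

    def __init__(self, name):
        self.name = name
        self.state = "idle"

    def handle(self, message):
        key = (self.state, message)
        if key not in self.TRANSITIONS:
            raise ValueError(
                f"{self.name}: '{message}' not valid in state '{self.state}'")
        self.state = self.TRANSITIONS[key]
        return self.state


class Controller:
    """A single controller commanding any number of standard modules."""

    def __init__(self):
        self.modules = {}

    def plug_in(self, module):   # "plug and play": no custom glue code needed
        self.modules[module.name] = module

    def send(self, name, message):
        return self.modules[name].handle(message)


controller = Controller()
controller.plug_in(StandardModule("liquid_handler"))
controller.plug_in(StandardModule("plate_reader"))

controller.send("liquid_handler", "reserve")
controller.send("liquid_handler", "start")
state = controller.send("liquid_handler", "finish")
```

Because the transitions are independent of any programming language or physical scale, the same table could drive a pipetting robot or a microchannel device.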
LECIS also enables an extension known as the Device Capability Dataset (DCD), which describes unique characteristics for a particular piece of equipment. To facilitate a plug-and-work environment, each DCD would contain information on the equipment's function, identification, physical characteristics, location, communication ports, commands, events, exceptions, errors, resources, and maintenance. Such information is also independent of programming languages and physical scales, permitting flexible integration of macro- and microtechnologies.
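A DCD can be pictured as a plain, language-neutral record. The sketch below follows the information categories listed above; the field values and the vendor, model, and command names are hypothetical examples, not taken from any real instrument.

```python
# Sketch of a Device Capability Dataset (DCD) as a plain record. Field
# categories follow the text; all values are hypothetical.

liquid_handler_dcd = {
    "function": "liquid handling",
    "identification": {"vendor": "ExampleCorp", "model": "LH-2000", "serial": "A123"},
    "physical": {"footprint_cm": (60, 45), "deck_positions": 9},
    "location": "bay 3, expansion slot 2",
    "communication": {"port": "TCP/5000", "protocol": "LECIS"},
    "commands": ["aspirate", "dispense", "wash"],
    "events": ["run_complete", "tip_loaded"],
    "exceptions": ["tip_missing", "volume_out_of_range"],
    "errors": ["motor_stall"],
    "resources": {"tips": 960, "wash_buffer_ml": 500},
    "maintenance": {"last_service": "1999-11-01", "cycles_since_service": 4210},
}

def is_compatible(dcd, required_commands):
    """A controller could consult the DCD to check, before scheduling work,
    whether a freshly plugged-in module supports the needed commands."""
    return set(required_commands) <= set(dcd["commands"])

ok = is_compatible(liquid_handler_dcd, ["aspirate", "dispense"])
```

In a plug-and-work environment, a module would publish its DCD at connection time and the controller would integrate it without manual configuration.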
The colloquium considered two different technologies for high-throughput genomic sequencing. One is based on chain-terminating amplification and gel electrophoresis, which has been revolutionized by microcapillary structures. The other is based on selective hybridization to complementary DNA oligomers, which are formed and immobilized on microchip arrays. Both technologies are compatible with high-throughput automation, yet in many semiautomated laboratories the tasks of sample preparation and data analysis have limited their full potential.
Apt uses of chain-terminating amplification and gel electrophoresis are the sequencing of long segments of stable DNA (e.g., mammalian and bacterial genomes) and the sequencing of short segments of variable DNA/RNA (e.g., viral genomes). Commercial microcapillary gel-based instruments are capable of processing approximately 300,000 bases per day, and such outputs are increasing as the number of cycles per day increases. Current efforts to sequence human, animal, and plant genomes use several hundred instruments in parallel.
Apt uses of complementary DNA oligomers are the sequencing of short segments of variable DNA/RNA (e.g., viral genomes) and, quite importantly, the real-time monitoring of many different complementary DNA (cDNA) molecules in living cells. Commercial microchip-based instruments are capable of processing approximately 100,000 bases per day, and such outputs are increasing as the density of oligomers per microchip increases. Current efforts to monitor cDNA in Mycobacterium tuberculosis before and after exposure to antibiotics, to determine the frequency of SNPs in human populations, and to find genetic differences in normal versus cancer cells use a variety of microchip instruments.
The ability to perform various types of high-throughput genomic sequencing completely alters the scale of possibilities for infectious disease control and threat reduction. For example, an automated influenza laboratory could integrate and use commercial instruments to sequence
RNA segments for every sample tested each year. An automated food safety and/or bioterrorism mitigation laboratory could also integrate and use commercial instruments to fingerprint literally thousands of bacterial pathogens within 24 hours after an event.
In essence, flow cytometry uses hydrodynamic focusing to guide a thin column of fluid through laser beams. When samples pass through the lasers, photons are scattered and emitted at various angles and intensities that reveal physical-chemical features about each sample. Because thousands to millions of samples can pass through the laser beams within seconds, flow cytometry utilizes signal averaging and thereby produces large signal-to-noise ratios. Several companies now offer flow cytometers that are designed to work in conjunction with laboratory automation and robotics. The colloquium therefore considered various roles for this flexible technology.
Detectable samples may be as small as individual molecules and viral particles or as large as intact bacteria and animal cells. In practice, biological samples are often labeled with nonspecific DNA intercalators and/or specific antibodies that fluoresce under laser light, thereby providing quantitative information on DNA sizes and protein expression. Alternatively, biological samples may be mixed with special microspheres (e.g., 100 to 1,000 different kinds at once) that are painted with different fluorescent dyes (color multiplexing) and covered with different capture molecules on their surfaces (probe multiplexing). The mixtures are then run through the cytometer to see which microspheres captured the samples. Such methodologies are analogous to performing multiple enzyme-linked immunosorbent assays (ELISA) on the same sample at once, and they are now commercially available. Once samples flow through the laser beams, the fluidic column may be divided into microscopic droplets by piezoelectric or inkjet devices. This permits droplets with samples to be dispensed in accordance with detection events. Flow cytometry thus provides a fast means for sample detection, separation, and purification.
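The multiplexing idea reduces to a simple decoding step: each microsphere class is identified by its dye color and carries one known capture probe, so a detection event (color code plus a captured/not-captured flag) maps directly to an analyte. The sketch below illustrates this with a hypothetical three-bead panel; the color codes and probe names are invented.

```python
# Sketch of decoding a color- and probe-multiplexed bead assay. The bead
# panel below is hypothetical.

# Map of dye color code -> capture probe on that bead class
bead_panel = {
    1: "influenza_A_antibody",
    2: "influenza_B_antibody",
    3: "salmonella_antibody",
}

def decode_events(events):
    """events: iterable of (color_code, captured) tuples from the cytometer.
    Returns the set of analytes detected in the sample."""
    detected = set()
    for color, captured in events:
        if captured and color in bead_panel:
            detected.add(bead_panel[color])
    return detected

# One sample producing positives on bead classes 1 and 3:
hits = decode_events([(1, True), (2, False), (3, True), (1, True)])
```

With 100 to 1,000 bead classes, the same loop turns a single pass through the lasers into hundreds of simultaneous ELISA-like results.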
The ability to perform various types of high-throughput sample characterization completely alters the scale of possibilities for infectious disease control and threat reduction. For example, an automated influenza laboratory could integrate and use commercial instruments to phenotype viral samples (i.e., determine subtypes and/or immunologic relatedness) or genotype them (i.e., select optimal polymerase chain reaction primers) within seconds. An automated food safety and/or bioterrorism mitigation laboratory could also integrate and use commercial instruments to phenotype
and genotype literally thousands of bacterial pathogens within 24 hours after an event.
Forward-looking universities, national laboratories, and companies are fostering programs to model, design, test, and manufacture miniaturized devices that perform chemical and measurement-based operations. For medical and biological researchers, this push to miniaturization holds great promise. It is comparable to revolutions seen in the computer and electronics industries, where vacuum tubes gave way to transistors, which then gave way to integrated circuits. Test tubes have already given way to microtiter plates with 96, 384, and 1,536 wells. Thus, the next and inevitable step is for microtiter plates to give way to miniaturized and integrated laboratories-on-a-chip (LOC), where performances may be rated by the number of chemical instructions per second.
With miniaturization comes the ability to integrate operations such as separation, reaction, detection, and signal processing and build them into micron- and submicron-sized areas. In general, heat and mass transport constraints on chemical reactions are insignificant from microliter (10⁻⁶ liter) to femtoliter (10⁻¹⁵ liter) scales, so LOC devices can perform subnanomolar-scale reactions and syntheses as well. The development of successful and flexible LOC devices will require efforts from several disciplines, including physics for modeling microhydrodynamics and fluid mechanics; engineering for developing chemical circuits, sensitive detectors, control, and power systems; computer science for dealing with enormous quantities of data from microtechnologies; and biology for scaling down assays and finding useful applications. The colloquium therefore considered various applications for microtechnologies, particularly with regard to the advanced diagnostics summarized below.
In national security, public health, and medical practice areas, there are growing demands for rapid, one-step, throwaway devices that can diagnose infectious diseases on the spot. Ideally, these point-of-use devices would be capable of detecting infectious disease agents without having to grow them or amplify their genomes. They would also have detection limits equal to the agent's infectious dose 50 percent (ID50) for humans and/or animals, enabling them to signal a bioterrorist attack as it occurs. Such diagnostic devices are now under development by several corporate and government-based programs, but many desirable capabilities are still futuristic. For instance, many biowarfare agents have ID50s in
the range of 10¹ to 10³ organisms, and over the next decade it will remain difficult to build point-of-use devices with such low detection limits. Thus, for the foreseeable future, point-of-use devices may offer help only in diagnosing exposed and/or acutely ill people.
During acute influenza illnesses, people shed viruses in their respiratory secretions at titers ranging from 10³ to 10⁹ tissue culture infectious doses per milliliter. Such in vivo titers correspond to viral-associated protein concentrations of 10⁻¹⁵ to 10⁻⁹ moles per liter, which fall within the detection limits of cutting-edge technologies being developed by corporate and government-based programs. Such benchmarks are typical for many other bioterrorist agents, suggesting that advanced diagnostics may offer help against them as well. The colloquium therefore considered how advanced diagnostic devices may function in conjunction with high-throughput automated laboratories and their associated databases.
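The correspondence between titers and molar concentrations can be checked with back-of-the-envelope arithmetic. The sketch below assumes, as hypothetical round numbers, one viral particle per tissue culture infectious dose and roughly 1,000 copies of a given viral protein per particle.

```python
# Back-of-the-envelope check of the titer-to-concentration figures above.
# Both conversion factors are assumed round numbers, not measured values.

AVOGADRO = 6.022e23          # particles per mole
PROTEINS_PER_PARTICLE = 1e3  # assumed copies of one protein per virion

def protein_molarity(titer_per_ml):
    """Convert a titer (infectious doses per mL) to an approximate molar
    concentration of one viral-associated protein."""
    particles_per_liter = titer_per_ml * 1e3   # mL -> L
    return particles_per_liter * PROTEINS_PER_PARTICLE / AVOGADRO

low = protein_molarity(1e3)    # near the 10^-15 mol/L end quoted above
high = protein_molarity(1e9)   # near the 10^-9 mol/L end quoted above
```

Under these assumptions the computed range lands within a factor of two of the figures quoted in the text, which is as close as such an estimate can be expected to get.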
One concept is to mass produce rapid, one-step, throwaway devices as combined screening dipsticks and sample containers. For the global influenza surveillance network, for example, such devices would be used to test for influenza A and B in humans or animals on the spot. They would be the size of small matchboxes, display bar codes for identification, provide simple (+/−) answers, and contain testing and sample holding chambers. For positive samples one would record key epidemiologic information and forward the device to the automated influenza laboratory for subsequent characterization. For negative samples the devices would be discarded. Such procedures would save time and expense by reducing work on negative samples.
Another concept is to use advanced diagnostic devices for rapidly screening and diagnosing people during the acute phases of bioterrorist attacks. For positive individuals, emergency medical personnel could initiate antibiotic therapies on the spot, which may help save their lives. The diagnostic/sampling devices would then be sent to an automated forensics laboratory, where molecular fingerprints and antibiotic sensitivities would be used for characterizing the overall attack and saving more lives.
Because of Internet connectivity and the availability of worldwide commercial shipping services, the first high-throughput automated laboratory may be situated in the United States yet be accessible from practically any geographic location. Use of the automated laboratory would begin by downloading a set of process control tools (PCTs) over the Internet and installing them on personal computers. These software tools would be written in a platform-independent language (e.g., Java), would
be run in conjunction with web browsers, and could create a flexible environment where the automated laboratory functions like an army of programmable technicians (i.e., mass customized testing). Since collection sites and laboratory facilities may be located on entirely different continents, the system would necessarily operate on a nonreal-time rather than a real-time basis. The key point is that nonreal-time systems demand small communication bandwidths that are supported by today's Internet communications protocols, whereas real-time systems demand large bandwidths that are redundant, fail-safe, and not widely available.
In the nonreal-time environment, users would collect samples and then use convenient computerized tools for recording various epidemiologic data and programming laboratory procedures. Testing instructions and background information would arrive over the Internet, and bar-coded samples would arrive via airfreight or other convenient means. Within just a few days the tests would be set up and performed by high-throughput automation in accordance with the “assay scripts” that arrived over the Internet. Results would then be deposited in the database resource and made available to users and others with access privileges. All testing and informatics services would arise from three levels of control (high, intermediate, and low) and would take advantage of the interconnection standards mentioned above.
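An "assay script" in this scheme is just a structured, platform-independent record that travels over the Internet ahead of the bar-coded sample. The sketch below, in Python rather than the Java mentioned above, shows a hypothetical script and the minimal validation a receiving laboratory might run before queueing work; every field name here is invented for illustration.

```python
# Sketch of a hypothetical assay script as it might travel over the
# Internet. A real PCT would define its own schema.

import json

assay_script = {
    "sample_barcode": "FLU-1999-004217",
    "submitted_by": "collection_site_42",
    "epidemiology": {"host": "human", "location": "Hong Kong", "date": "1999-12-03"},
    "procedures": [
        {"step": 1, "task": "culture", "medium": "MDCK", "hours": 48},
        {"step": 2, "task": "phenotype", "method": "flow_cytometry"},
        {"step": 3, "task": "genotype", "method": "capillary_sequencing",
         "segments": ["HA", "NA"]},
    ],
}

def validate(script):
    """Minimal acceptance check: required fields present and procedure
    steps numbered consecutively from 1."""
    for field in ("sample_barcode", "procedures"):
        if field not in script:
            return False
    steps = [p["step"] for p in script["procedures"]]
    return steps == list(range(1, len(steps) + 1))

wire_format = json.dumps(assay_script)   # what actually crosses the Internet
received = json.loads(wire_format)
accepted = validate(received)
```

Because the script arrives days before the airfreighted sample, the laboratory can schedule reagents and instrument time in advance, which is what makes the nonreal-time model workable.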
High-level controls for the automated facility would act like those found in interactive websites, with point-and-click objects and pull-down menus. Outside the facility, users would click on PCT icons to implement and customize various stepwise tasks. Inside the facility, supervisors would click on other PCT icons to manage workloads and maintain automated machinery in optimal working order. A complete set of PCTs, for example, pertaining to access, operation, documentation, submission, storage, analysis, privileges, and accounting, would offer seamless integration of digital and physical tasks from start to finish (Layne and Beugelsdijk, 1998).
Intermediate-level controls would reside in laboratory-based computers, whose functions remain virtually transparent to outside users. To schedule daily “assay runs,” laboratory supervisors would use tools from operations research to simulate automated instrument activities and various constraints (i.e., timing, capacities) imposed by testing procedures. Basically, assay scripts would be modeled by linear equations, fit into a large matrix with others, and solved by numerical algorithms. Typical assay runs may include as many as 10,000 precisely timed and ordered tasks, which far exceeds human capacities for effecting work. Optimized schedules would then go to an instrument's controller that synchronizes and governs the intricate flow of samples, reagents, and supplies.
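The text describes casting assay scripts as linear equations solved by numerical algorithms; as a much simpler stand-in for that formulation, the sketch below greedily list-schedules tasks onto instruments that each have a fixed number of identical duplicate modules. The task names, instruments, durations, and capacities are all hypothetical.

```python
# Simplified stand-in for assay-run scheduling: greedy list scheduling
# onto instruments with duplicate modules. All names and numbers are
# hypothetical; a production system would use a full optimization model.

import heapq

def schedule(tasks, capacity):
    """tasks: list of (name, instrument, duration) in priority order.
    capacity: dict mapping instrument -> number of identical modules.
    Returns dict mapping task name -> start time."""
    # For each instrument, a min-heap of times at which a module frees up.
    free_at = {inst: [0.0] * n for inst, n in capacity.items()}
    for heap in free_at.values():
        heapq.heapify(heap)
    starts = {}
    for name, inst, duration in tasks:
        start = heapq.heappop(free_at[inst])   # earliest available module
        starts[name] = start
        heapq.heappush(free_at[inst], start + duration)
    return starts

tasks = [
    ("pcr_setup_1", "liquid_handler", 10),
    ("pcr_setup_2", "liquid_handler", 10),
    ("pcr_setup_3", "liquid_handler", 10),
    ("read_1", "plate_reader", 5),
]
starts = schedule(tasks, {"liquid_handler": 2, "plate_reader": 1})
```

Even this toy version shows why duplicate modules matter for rate-limiting procedures: the third setup task must wait only because both liquid-handler modules are busy.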
Low-level controls would also reside in the facility's robotics and automation modules and serve two purposes. The first is to drive internal components like actuators, detectors, and servomotors and coordinate their internal electromechanical activities. The second is to communicate with an instrument's controller and follow its commands, as set forth in the interconnection standard mentioned above.
From the perspective of ease of maintenance, it is often better to build several different high-throughput automated systems that work together instead of one larger one. For this reason the automated influenza, food safety, and threat reduction laboratories would likely use a series of integrated systems that work as a seamless unit while enabling mass customized testing via the Internet. Depending on applications, the various systems would focus on, for example, growing, phenotyping, genotyping, biotyping, drug sensitivity testing, and molecular fingerprinting. Such systems may utilize various combinations of immobile laboratory automation and robotics technologies, as well as mobile microtechnologies and advanced diagnostic devices. Since the associated databases would also utilize the Internet to receive and send information, they may be located at any geographic site.
New databases for high-throughput automated laboratories must be set up to store and organize a wide variety of structured data. For example, epidemiologic data may include clinical observations on humans or animals, references to geographic locations and times, and notations on various samples being collected. Such data may come in the form of ASCII text, standardized questionnaires, graphical representations, and audio/video records. Phenotypic and genotypic data may include results from various automated assays and quality controls, programmed instructions for performing such assays and controls, and notations on special reagents or procedures. Such data may come in the form of tabular, numerical, and image-based records. Fortunately, programming languages have reached the point where it is feasible to set up powerful databases that work in conjunction with the Internet. The colloquium therefore considered various technological and proprietary issues pertaining to such undertakings.
High-throughput automated laboratories will require databases that are flexible, scalable, and secure. These requirements are fulfilled to various degrees by commercial database systems that are written in object-oriented or relational languages. Object-oriented systems (e.g., Object Database Management Group) are good at expressing elaborate relationships among objects and manipulating data but are less suited for storing enormous quantities of data. On the other hand, relational systems (e.g.,
Structured Query Language) are good at storing and retrieving enormous quantities of data but are less suited for handling elaborate relationships and manipulations. Security issues deal with averting the loss and corruption of data and preventing the unauthorized use of data, whether inadvertent or malicious. Both object-oriented and relational database systems have extensions that enable digital certificates, secure sockets, and secure partitions.
Commercial database systems further support the Extensible Markup Language (XML). This documentation standard is ideal for object-oriented and relational databases that must handle a wide variety of structured data. With XML all structured data become self-describing, platform independent, and transformable into any format. The XML standard also permits multiple pointers, links, and references to multiple sources of data, equipping it to handle many of the problems considered by the colloquium (World Wide Web Consortium, 2000).
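A self-describing XML record of the sort described above might combine epidemiologic and genotypic data for one isolate. The sketch below uses Python's standard XML library; the element names, barcode, and truncated sequence are hypothetical, and a real database would publish a shared schema.

```python
# Sketch of a self-describing XML record for one isolate. Element names
# and values are hypothetical.

import xml.etree.ElementTree as ET

isolate = ET.Element("isolate", id="FLU-1999-004217")
epi = ET.SubElement(isolate, "epidemiology")
ET.SubElement(epi, "host").text = "human"
ET.SubElement(epi, "location").text = "Hong Kong"
geno = ET.SubElement(isolate, "genotype")
ET.SubElement(geno, "segment", name="HA").text = "ATGAAGGCA"  # truncated

xml_text = ET.tostring(isolate, encoding="unicode")

# Because the record is self-describing, a consumer can query it without
# prior knowledge of column positions or file layouts:
parsed = ET.fromstring(xml_text)
host = parsed.findtext("epidemiology/host")
```

The same record can be transformed into tabular form for a relational store or traversed as an object graph, which is why XML suits databases that must serve both kinds of systems.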
With automated laboratories changing the means by which research is conducted, it will be important to maintain traditional rewards for users. At the heart of this system is the freedom to decide how to share data and new information, which can lead to scientific publications and credit for discoveries involving intellectual property. For each category outlined below, data ownership and privileges may be assigned according to the source of financial support.
For the closed category, data would belong solely to the commercial organization (e.g., a pharmaceutical company) that submitted samples and assay scripts and paid for research or testing services. Upon completing such work, the automated laboratory would encrypt and forward all of the raw data to the purchasing organization. Afterwards, it would be the organization's responsibility to manage the security of its private property. For a period of time the automated laboratory would also maintain a secure copy of the digital records to assure redundancy and integrity in accordance with contractual agreements.
For the principal investigator category, data would belong to the person receiving government grant support for a reasonable period of time, say for as long as 2 to 3 years after the grant ends. Good digital practices would be tied to ongoing grant support, requiring each investigator to maintain his or her database records in an orderly manner. After the time embargo had expired, relational links would be attached to the investigator's digital records and the information would become available to others.
For the consortia category, data would belong to all of the collaborating investigators for a reasonable period of time, as suggested above. The collaborators also would have responsibility for maintaining their digital records in an orderly manner, most likely under the supervision of the
group's database manager. After the time embargo had expired, the organized information would become available to others as well.
For the open category, data and their associated links would belong to the public once quality had been assured. The digital records would come from voluntary submissions and time-embargoed data that would be released automatically. The main issues would be maintaining backup copies to assure integrity and deciding how to inventory the data and build relational links.
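The time-embargo rule common to the categories above amounts to a simple access policy. The sketch below encodes it with an illustrative 3-year window; the field names and dates are hypothetical.

```python
# Sketch of the time-embargo rule: records become public a fixed period
# after grant support ends, or immediately if voluntarily opened. The
# 3-year window and field names are illustrative.

from datetime import date, timedelta

EMBARGO = timedelta(days=3 * 365)

def is_public(record, today):
    """A record is public if voluntarily opened, or once its embargo
    (counted from the end of grant support) has expired."""
    if record.get("category") == "open":
        return True
    return today - record["grant_end"] >= EMBARGO

rec = {"category": "principal_investigator", "grant_end": date(1996, 1, 1)}
public_now = is_public(rec, date(2000, 1, 1))   # more than 3 years later
```

An automated database could run such a check nightly, attaching relational links to records the moment their embargo expires.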
Mining enormous quantities of data from high-throughput automated laboratories (i.e., petabits) poses an open-ended challenge. It requires that users have fast computers and Internet links for undertaking their work. It also requires that users have just the right software algorithms for analyzing their data. Today, researchers have affordable personal computers that perform 10⁹ floating point operations per second (gigaflops), and for large jobs they can rent time on expensive supercomputers that perform 10¹² floating point operations per second (teraflops). They have connections to Internet backbones that transmit 10⁹ bits of information per second (gigabits per second) and local fiber optic networks that approach such bandwidths. In addition, a growing number of informatics companies are offering new types of software for mining medical and biological data. Just a few years ago these programs were affordable only by global pharmaceutical companies, but prices have fallen to the point now where university-based researchers can buy them. Nevertheless, many informatics tools still need improving, some types are missing, and certain types await definition. The colloquium took a problem-oriented approach to the development of such resources.
In utilizing high-throughput automated laboratories it will be valuable to have two types of informatics tools. One type will be designed to work with smaller datasets, which are generated on a day-to-day basis, whereas the other will work with larger datasets that amass over longer periods of time. This division is practical because short-term informatics tasks are different than long-term ones. Short-term tasks pertain to checking experimental assays against quality controls, spotting obvious patterns or biases, and programming new assays as necessary. Long-term tasks pertain to discovering correlations and order in multidimensional spaces that include epidemiologic, phenotypic, and genotypic data.
Understanding how molecular sequences at the genetic level determine such things as protein structures, enzymatic activities, and antigenic identities is one of the grand challenge problems in medicine and biology. It is also a problem at the crossroads of infectious diseases, threat reduc-
tion, and molecular medicine. For example, researchers have attempted to look for long-term trends in successful mutations for rapidly drifting and shifting viruses such as influenza. Their ultimate goal is to forecast “hot strains” that will cause major epidemics or explosive pandemics and thereby expedite the production of new vaccines against such strains. At the present time, however, researchers have an insufficient database for making solid forecasts on influenza's next step. With sparse data linking the genetic, structural, enzymatic, and antigenic properties of influenza, they have no means for predicting successful mutations. Still needed are many more records containing complete epidemiologic, phenotypic, and genotypic information on influenza isolates over several years. Also needed are new informatics tools and algorithms that can correlate and display enormous quantities of genetic, structural, enzymatic, and antigenic data in clear ways.
Leveraging important scientific problems for the development of informatics tools makes a great deal of sense because fundamental questions in medicine and biology are interconnected at certain levels. For example, influenza has a small RNA-based genome (i.e., kilobases) with many variations, whereas humans have a large DNA-based genome (i.e., gigabases) with comparatively few variations. Nevertheless, the informatics problems associated with finding order by correlating enormous quantities of epidemiologic, phenotypic, and genotypic information are basically the same for the two entities. If properly designed and implemented, informatics tools for influenza will thus carry over to other important problems in infectious disease and molecular medicine.
Important problems in medicine and biology often exhibit non-linearity, complexity, variability, and noise. As already discussed, one aspect of attacking such problems is to build high-throughput automated laboratories that generate enormous quantities of data. Another aspect, however, is to formulate mathematical models that extend intuition about the problem and further organize the collection, generation, and interpretation of data. The colloquium considered various roles for mathematical models in infectious diseases and epidemiology, with the understanding that underlying elements apply to food safety, bioterrorism mitigation, and molecular medicine.
The first role is to understand the complex nonlinear behavior of the problem and determine how certain variables and/or parameters influence the spread and control of infectious diseases. In this regard, epidemiologic models are built on simple assumptions that nevertheless simu-
late complex problems. They can therefore be used to build intuition and insight before starting high-throughput research efforts.
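The kind of simple model in view here can be made concrete with a discrete-time SIR (susceptible-infected-recovered) model, the classic starting point in infectious disease epidemiology. The parameter values below are illustrative, not fit to any real outbreak.

```python
# Sketch of a simple epidemiologic model: discrete-time SIR dynamics.
# Parameter values are illustrative.

def sir(population, infected0, beta, gamma, days):
    """Simulate SIR dynamics with daily time steps.
    beta: transmission rate per day; gamma: recovery rate per day.
    Returns lists of S, I, R over time."""
    S, I, R = [population - infected0], [infected0], [0.0]
    for _ in range(days):
        new_infections = beta * S[-1] * I[-1] / population
        new_recoveries = gamma * I[-1]
        S.append(S[-1] - new_infections)
        I.append(I[-1] + new_infections - new_recoveries)
        R.append(R[-1] + new_recoveries)
    return S, I, R

# An epidemic takes off when beta/gamma (the basic reproduction number)
# exceeds 1; here beta/gamma = 2.
S, I, R = sir(population=1e6, infected0=10, beta=0.5, gamma=0.25, days=120)
peak_infected = max(I)
```

Three assumptions and two parameters already reproduce the characteristic rise, peak, and decline of an epidemic curve, which is the sense in which simple models build intuition before data collection begins.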
The second role is to guide data collection efforts and thereby create optimal inventories of data. Different infectious disease and epidemiologic models can be used to determine the relative importance of each parameter and its sensitivity to change. They can then be used to prioritize data collection efforts during high-throughput research efforts.
The third is to organize and map enormous phase spaces that are composed of epidemiologic, phenotypic, and genotypic data. One method is to discover relationships by statistical inference. Another is to develop models that may account for the data and then view the data from such perspectives. Infectious disease and epidemiologic models can therefore be used to look for unifying patterns or principles from high-throughput research efforts.
The fourth is to make forecasts or develop interventions with models that are validated against epidemiologic, phenotypic, and genotypic data. With sufficient information, for example, it may become feasible to forecast the next influenza pandemic or to develop broader vaccines with efficacy against many different influenza strains. Infectious disease and epidemiologic models can therefore be used to mitigate risks and save lives.
The total volume and variety of the food supply in affluent countries like the United States are huge. It is therefore impossible and impractical to test every lot or type of food product—even with the help of many high-throughput automated laboratories. With this perspective the colloquium considered the key roles of risk assessment in utilizing expanded laboratory testing capabilities.
Traditionally, risk assessment has been used as the basis for decision-making in chemical safety and only recently has been considered useful in assessing exposures to microbiological pathogens and/or toxins in foods. The underlying assumptions are that (1) all agents are hazardous at some dose, (2) certain agents are hazardous at all doses, and (3) hazard magnitude depends on dose size. “Risk” is therefore defined as the probability that an agent's hazardous properties will be expressed at specified doses, yet such assessments are often plagued by uncertainties in the available data.
Risk assessment aims to determine the likelihood that a hazard will be expressed under specified conditions. For example, given that certain quantities of bacteria are found in lots of ground beef, there is a likelihood that a foodborne outbreak will occur. Risk assessment can be used to model the food supply and identify the products (or product sources) that
pose the greatest risks to public health. Products posing the greatest risks would then be routinely sampled and tested by high-throughput automated laboratories and informatics resources. It is hoped that ongoing validation and adjustment of risk assessments over time would result in safer food supplies.
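One common dose-response form in microbial risk assessment is the exponential model, in which each ingested organism is assumed to have an independent small probability of initiating illness. The sketch below uses that model to rank hypothetical product lots for testing priority; the lot names, doses, and the per-organism probability are all invented.

```python
# Sketch of an exponential dose-response model used to rank hypothetical
# product lots by risk. All numbers are illustrative.

import math

def infection_risk(dose, r):
    """Probability of illness from a mean ingested dose of organisms,
    where each organism independently causes illness with probability r."""
    return 1.0 - math.exp(-r * dose)

# Mean organisms per serving for three hypothetical lots:
lots = {"lot_A": 5.0, "lot_B": 500.0, "lot_C": 0.5}
r = 0.01
ranked = sorted(lots, key=lambda name: infection_risk(lots[name], r),
                reverse=True)
```

Sampling effort from the automated laboratories would then be concentrated on the lots (or product sources) at the top of such a ranking, with the model re-validated as test results accumulate.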
Life processes consist of reproduction, mutation, and selection. One way of accelerating their exploration is by a technology known as DNA shuffling. Basically, one or more genes are cleaved into fragments and randomly recombined to create many novel genotypes. These sequences are then selected for one or more desired expressions or phenotypes. Subsequently, the selection process is used to identify genes that will become starting points for the next cycle of recombination. The process is repeated until genes expressing the desired properties and/or stabilities are identified.
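The recombine-select-repeat cycle can be sketched as a toy program. The fitness function below (similarity to a fixed target string) and the parent sequences are purely illustrative stand-ins for laboratory selection of a desired phenotype; real DNA shuffling operates on genes in living organisms, not strings.

```python
# Toy sketch of the DNA-shuffling cycle: fragment parents, recombine at
# random, score recombinants, carry the best forward. The target-string
# fitness function is purely illustrative.

import random

random.seed(0)

TARGET = "ACGTACGTACGT"

def fitness(seq):
    return sum(a == b for a, b in zip(seq, TARGET))

def shuffle_once(parents, fragment_len=3):
    """Build one recombinant by drawing each fragment from a random parent."""
    child = ""
    for i in range(0, len(TARGET), fragment_len):
        donor = random.choice(parents)
        child += donor[i:i + fragment_len]
    return child

def evolve(parents, cycles=10, offspring=50, keep=4):
    for _ in range(cycles):
        pool = parents + [shuffle_once(parents) for _ in range(offspring)]
        parents = sorted(pool, key=fitness, reverse=True)[:keep]
    return parents

# Each parent carries a different useful fragment; shuffling combines them.
start = ["ACGTAAAAAAAA", "AAAAACGTAAAA", "AAAAAAAAACGT"]
best = evolve(start)[0]
```

The point of the toy is that recombination assembles useful fragments from different parents far faster than point mutation alone could, which is the advantage claimed for DNA shuffling.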
In practice, DNA shuffling uses living organisms that reproduce rapidly (e.g., bacteria, yeast, mice) to explore a small set of mutations and their fitness effects. The colloquium examined ways to build high-throughput laboratory systems for supporting such approaches. It also considered how informatics tools could be used for rapidly discovering relationships from phase spaces that contain various phenotypic and genotypic data. In many areas of biology and medicine, knowledge has been gained by studying a large number of natural mutations and their effects. In DNA shuffling, however, knowledge is now being gained by studying a relatively small number of clever mutations and their effects.
Alibek, K. 1999. Biohazard. New York: Random House.
Committee E01. 1999. E1989-98 Standard Specification for Laboratory Equipment Control Interface (LECIS). West Conshohocken, Pa.: American Society for Testing and Materials (http://www.astm.org).
Layne, S. P., and T. J. Beugelsdijk. 1998. Laboratory firepower for infectious disease research. Nature Biotechnology, 16:825-829.
Mead, P. S., L. Slutsker, V. Dietz, L. F. McCaig, J. S. Bresee, C. Shapiro, P. M. Griffin, and R. V. Tauxe. 1999. Food-related illness and death in the United States. Emerging Infectious Diseases, 5:607-625.
Pablos-Mendez, A., M. C. Raviglione, A. Laszlo, N. Binkin, H. L. Rieder, F. Bustreo, D. L. Cohn, C. S. B. Lambregts-van Weezenbeek, S. J. Kim, P. Chaulet, and P. Nunn. 1998. Global surveillance for antituberculosis drug resistance, 1994-1997. New England Journal of Medicine, 338:1641-1649.
World Wide Web Consortium. 2000. XML 1.0 Second Edition Working Draft Released (http://www.w3.org).