This appendix describes the scenarios that the planning committee used to facilitate the workshops and that are discussed in Chapter 2: Biotechnology Based Scenarios. The committee developed three scenarios: two focused on multi-omics and one on DNA synthesis. The scenario questions were developed as discussion prompts to support moderators during the meetings.
Overview: Scenario Details and Policy Landscape
The field of multi-omics, which includes the analysis of genomic, transcriptomic, proteomic, and metabolomic data, has revolutionized our ability to understand and influence human health. The integration of these datasets through in silico modeling and other bioinformatic approaches is enabling a deeper understanding of the molecular mechanisms underlying diseases and informing the development of personalized medicine. However, this emerging biotechnology presents significant ethical, legal, and social implications due to its dual-use potential and the risks associated with data security, privacy, and misuse.
Following the sequencing of the first human genome in 2003, the National Research Council published a report describing its vision for creating a knowledge network for biomedical research that would integrate human genomic information with information on other molecules, such as transcripts (mRNA), proteins, metabolites, histone acetylation and DNA methylation patterns, and microbiomes, to understand disease at the molecular level, gene-environment interactions resulting in or otherwise affecting various health conditions, and global changes within a human resulting from various environmental exposures (a field referred to as epigenomics).1 This multi-omic approach to human health contributed to nascent efforts toward providing data-driven precision health care to patients, wherein biological, environmental, and lifestyle data could be integrated and analyzed using statistical and computational methods to assess an individual’s risk for a particular disease and tailor clinical approaches for preventing and treating disease.2 During the past fifteen years, numerous countries have established national-level initiatives to sequence the genomes of their citizens, and scientists have continued to examine genomic markers associated with human traits and disease (i.e., in genome-wide association studies) and to determine which markers are closely associated with their populations.3,4,5,6 In addition, private sector investment has increased in advancing sequencing equipment to increase the length and accuracy of reads, developing equipment to sequence other molecules such as RNA and proteins, analyzing single nucleotide polymorphisms, and applying advanced computational tools (e.g., machine learning and artificial intelligence models). Philanthropic organizations and academic researchers are working to use these technologies, the data obtained, and analytic tools to understand the molecular basis for health, disease, and aging, and to gain a comprehensive understanding of interactions between genes, transcripts, and proteins (e.g., efforts for creating cell atlases7,8,9 to aid with precision medicine). Recent efforts by the National Institutes of Health, including the United States’ All of Us program, have focused on ensuring that a diversity of people are included in genomic research, to allow for better understanding of the human genome and natural differences among individuals from across all populations.10 Further, in 2022, the full human genome sequence was completed.11
___________________
1 https://nap.nationalacademies.org/read/13284/chapter/5#43
2 https://www.cancer.gov/publications/dictionaries/cancer-terms/def/precision-medicine
3 https://www.ncbi.nlm.nih.gov/books/NBK554738/
4 https://www.who.int/health-topics/genomics#tab=tab_1
5 https://academic.oup.com/database/article/doi/10.1093/database/baaa009/5812711
Recent efforts toward multi-omics analyses for human health have focused on deriving insights on human physiology and complex physiological processes. Scientists are collecting molecular data from various bodily fluids and are beginning to identify and analyze all of the molecules associated with those fluids and with a particular trait to gain a deeper characterization of the molecular basis of traits.12 Within the last few years, advances in computing power and machine learning have led to the initiation of studies to integrate and analyze data over an individual’s lifespan to predict disease risk, molecularly classify diseases and the molecular variability leading to disease, discover new biomarkers, gain new insights about an individual’s health, and determine treatments that are tailored to individuals’ particular circumstances.13 Further, as more is learned about the human microbiome (both skin and gut), studies looking at the interaction between these microbes and human physiology are being initiated. Similarly, efforts such as the All of Us program and the work of the environmental health community are continuing toward greater integration of environmental factors and their effects on human health, in part via the exposome.14 Although chemicals, microplastics, and pollution tend to be a significant focus at the intersection of the environment and human health, other factors such as access to sunlight (affecting circadian rhythms), stress, and diet also are being analyzed using multi-omics.15,16 Several challenges currently exist in the field of AI/ML-enabled multi-omics for health, including bias, lack of comprehensive representation of populations, and imbalance in datasets; lack of data standards, which limits integration of datasets; lack of uniformly high-quality, annotated data and metadata; small sample sizes and data heterogeneity, which make the results of AI/ML models difficult to interpret; the computing resources needed to use AI/ML with multi-omic data; lack of verifiability of results; and limited understanding of the foundational biology for many traits and diseases.17,18,19
___________________
7 https://www.humancellatlas.org/
8 https://www.proteinatlas.org/
9 https://www.nature.com/articles/550451a
10 https://www.genome.gov/about-genomics/fact-sheets/Diversity-in-Genomic-Research
11 https://pubmed.ncbi.nlm.nih.gov/35357919/
12 https://www.nature.com/articles/s41467-024-51134-x
13 https://pmc.ncbi.nlm.nih.gov/articles/PMC10220275/
14 https://pmc.ncbi.nlm.nih.gov/articles/PMC10220275/#bib64
15 https://www.nature.com/articles/s41551-022-00999-8
16 https://www.sciencedirect.com/science/article/pii/S2405471221004518
17 https://onlinelibrary.wiley.com/doi/10.1002/mco2.315
The integration of multi-omic data, which includes genomic, transcriptomic, proteomic, and metabolomic information, has the potential to revolutionize agriculture by enhancing crop improvement strategies. The use of in silico modeling and bioinformatic approaches allows for the precise prediction and manipulation of crop traits, improving yield and resistance to diseases. However, this emerging biotechnology also presents challenges in terms of data security, regulatory frameworks, and ethical considerations, especially with regard to biodiversity and food security.
Crop science was one of the earliest fields of life sciences research, focusing initially on selective breeding to generate new varieties of agricultural crops that resist pests or disease, withstand environmental conditions such as drought, and lack traits that may be considered undesirable, such as low yield.20 Identification of genetic elements tied to these traits, beginning with the work of Gregor Mendel in the mid-1800s, together with advances in methodologies for mutating plants and the discovery of the structure of DNA in 1953, enabled scientists working in agricultural biotechnology to understand the genetic basis of traits and more directly alter plants to obtain desirable traits through genetic modification. Other advances in the latter half of the 20th century further enabled research and development of modified crops and transformed the field of agricultural biotechnology. These advances included the discovery of movable elements called transposons, establishment of techniques for studying and propagating plant tissue in laboratories, new methods for breaking down plant cell walls for easier modification, and development of new tools (e.g., plasmids, restriction enzymes, DNA amplification equipment) for genetically engineering plant cells. Development of genetically modified crops also raised numerous concerns about their safety, leading many countries to adopt regulations governing this type of research, including requirements for assessing the risks of engineered crops.
___________________
20 https://www.nature.com/scitable/knowledge/library/history-of-agricultural-biotechnology-how-crop-development-25885295/
Today, new tools are being used to further understand the molecular basis of crop traits; manage agricultural production using soil, ecological, and environmental data; precisely edit plants; and develop technological strategies for altering plant ecosystems. These new tools build on advances in plant genomics, which accelerated during and following the 2003 sequencing of the first human genome, an effort that drove the development of technologies such as high-throughput sequencing, robotics, and microfluidics, among others. These developments enabled the sequencing of genomes of 1000 plants from 788 species, including several crops (e.g., rice, maize, grape, cotton, peach, potato, tomato, soybean, wheat).21,22 Advances in sequencing technologies also have enabled the development of methodologies and technologies for identifying and/or sequencing other critical molecules, including transcripts (messenger RNA, mRNA), proteins, DNA methylation patterns, and metabolites.23,24 Data are generated from various experimental approaches, including empirical studies in functional genomics, which aims to discern the genes (both core and dispensable genes, which is referred to as pangenomics25,26), gene variants, and gene interactions that are responsible for specific traits; sequencing of various molecules using different next-generation sequencing technologies; microarray studies, which aim to examine global changes in biological molecules (e.g., increases or decreases in mRNA levels for different genes, providing information about gene expression27); and studies of protein sequences and their interactions.28 These data are integrated and analyzed at scale using advanced data analysis tools (e.g., data mining, machine learning, statistical methods) to gain insights into the molecular markers of various traits in crops, such as abiotic stress tolerance, yield, flavor or nutritional properties, and flowering time.29 In addition to identifying molecular markers of traits under different conditions in different plants, multi-omics analyses are starting to be used to predict complex traits in model plants, specifically Arabidopsis thaliana, maize, and rice.30
___________________
21 https://www.sciencedirect.com/science/article/abs/pii/S1360138521002818
22 https://www.sciencedirect.com/science/article/pii/S017616172030242X
23 https://link.springer.com/chapter/10.1007/978-981-99-4673-0_6
24 https://www.sciencedirect.com/science/article/pii/S017616172030242X
25 https://www.frontiersin.org/journals/plant-science/articles/10.3389/fpls.2022.1062952/full
26 https://www.sciencedirect.com/science/article/pii/S017616172030242X
27 https://www.mdpi.com/2073-4425/14/6/1281
28 https://www.sciencedirect.com/science/article/pii/S017616172030242X
29 https://www.sciencedirect.com/science/article/pii/S017616172030242X
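As an illustration of the kind of data integration described in this scenario, the following minimal Python sketch standardizes features from two omics layers, merges them into a single table, and ranks the features by their correlation with a trait. It is a simplified example under stated assumptions, not a method drawn from the cited studies: the layer names, feature names, values, and the functions `zscore`, `integrate`, and `rank_markers` are all hypothetical, and real analyses rely on far larger datasets and on more sophisticated statistical or machine learning models.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize one feature across samples (mean 0, unit variance)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def integrate(layers):
    """Merge per-layer feature tables into one table of standardized features.

    `layers` maps a layer name to {feature_name: [one value per sample]}.
    Standardizing first puts measurements with different units on one scale.
    """
    merged = {}
    for layer_name, features in layers.items():
        for feature_name, values in features.items():
            merged[f"{layer_name}:{feature_name}"] = zscore(values)
    return merged


def rank_markers(merged, trait):
    """Rank features by the absolute Pearson correlation with the trait."""
    t = zscore(trait)
    n = len(trait)
    # For standardized vectors, Pearson r is the mean of elementwise products.
    scores = {
        name: sum(a * b for a, b in zip(values, t)) / n
        for name, values in merged.items()
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical measurements for five plants (illustrative values only).
layers = {
    "transcriptome": {"geneA": [1, 2, 3, 4, 5]},
    "metabolome": {"m1": [5, 3, 4, 1, 2]},
}
trait = [2, 4, 6, 8, 10]  # e.g., a yield measurement per plant

for name, r in rank_markers(integrate(layers), trait):
    print(f"{name}\tr = {r:+.2f}")
```

A ranking of this kind only flags candidate markers. The challenges noted earlier in this appendix, such as small sample sizes and data heterogeneity, apply directly to how much weight such correlations can bear.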
Specific biotechnologies could apply to sectors such as agriculture, marine, and wildlife, or to disciplines such as the study of host immune response to viruses to develop medicine.
In 2002, scientists published the first paper describing chemical synthesis of a full-length DNA copy (cDNA) of the poliovirus genome, from which infectious virus could be produced in the laboratory.31 This experiment, which differed from others involving the cutting and pasting of DNA from organisms into intermediate bacterial, viral, or plasmid vectors, involved chemically stitching together short pieces of DNA (oligonucleotides) that were purchased from a company and based on the published poliovirus sequence. Not long after, scientists synthesized the genome of a bacteriophage and a full-length coronavirus (SARS-CoV-1).32 Nearly a decade later, scientists at the J. Craig Venter Institute developed a new biochemical method for assembling DNA fragments into larger DNA pieces.33 Using this method, along with in vivo recombination in yeast (Saccharomyces cerevisiae), they created the first chemically synthesized bacterial genome.34 Initially, the scientists chemically synthesized the full genome of Mycoplasma genitalium, but experienced problems in producing the intact bacterial chromosomes in yeast and subsequently introducing those chromosomes into bacterial cells. Ultimately, the scientists had to match the genus and species of the genomic and cellular components to create bacterial cells powered by a single, chemically synthesized genome.35,36 This effort alone cost the Venter Institute $40 million and took 15 years to achieve. Following these advances, scientists at the Venter Institute worked to synthesize the first minimal genome for Mycoplasma mycoides.
___________________
31 https://www.science.org/doi/10.1126/science.1072266
32 https://www.sciencedirect.com/science/article/pii/S009286742200798X
33 https://www.nature.com/articles/nmeth.1318
34 https://www.science.org/doi/10.1126/science.1190719?url_ver=Z39.88-2003&rfr_id=ori:rid:crossref.org&rfr_dat=cr_pub%20%200pubmed
35 https://www.theguardian.com/science/2010/may/20/craig-venter-synthetic-life-genome
36 https://www.theneweconomy.com/technology/artificial-life-synthetic-genes-boot-up-cell
Researchers designed the bacterial genome in discrete pieces, each of which was tested for its ability to create viable bacteria; determined which genes are essential or nonessential using the group’s global transposon mutagenesis approach; and developed rules for genome design to enable removal of specific genes without affecting expression of the other genes.37 Using this approach for minimal genome design, the scientists built on their previously published methods to produce the DNA in yeast and assessed the synthesized genome’s viability in a recipient bacterial cell in the same genus (Mycoplasma capricolum). Other research groups have developed different methodologies for designing (i.e., determining which genes can be removed) and synthesizing minimal genomes.38 These minimal cells do not behave like natural cells; in 2021, scientists identified seven genes that, if included in the synthesized minimal bacterial genome, could allow the resulting synthetic cell to behave and divide naturally.39 In 2017, scientists in Canada announced their ability to resurrect an extinct virus, the horsepox virus, from the published horsepox viral genomic sequence.40 For nearly two years, the scientists worked with a gene synthesis company to improve its DNA synthesis methodology to create high-fidelity DNA fragments of 10-30 kilobases each that covered the entire horsepox genome (212 kilobases), attached virus-specific end pieces (from vaccinia virus) that the laboratory had on hand to the synthesized DNA, and used a helper virus system previously developed and published by the laboratory to create the live horsepox virus.41 Horsepox is thought to be either identical to or a close relative of vaccinia virus, which is why the end pieces could function properly with the horsepox virus DNA. Further, several scientific and technological advances were necessary to enable synthesis of the very large DNA fragments. More recently, the genomes of E. coli, Caulobacter crescentus, and yeast have been synthesized.42
Since the early days of chemical synthesis of DNA, the methods have improved toward “writing” longer pieces of DNA.43 Increases in the speed of DNA synthesis contributed to these improvements.44 The foundational technologies enabling DNA synthesis include solid-phase synthesis using phosphoramidite chemistry to add individual bases one at a time, which has been in use since the 1980s when the first automated DNA synthesizer was developed, and the use of enzymes to synthesize short fragments of DNA. Early enzymatic technologies required a template DNA to create new DNA pieces, but scientists developed and built on newer methods for synthesizing DNA without the use of a template. DNA synthesis companies iteratively improved template-independent methods to address known limitations and optimize synthesis performance. Newer technologies developed in the 21st century include Gibson assembly (the Venter Institute method for assembling DNA fragments described above) and polymerase cycling assembly, which takes advantage of DNA pairing and amplification using PCR. Both of these methods can result in inclusion of impurities in the synthesized DNA and are dependent on molecular biology methods for confirming the sequences. In addition, the polymerase cycling assembly method requires use of a high-fidelity enzyme to prevent inadvertent mutation of the DNA during the PCR amplification step. More recently, companies have been developing thermal methods that use temperature to control DNA synthesis, double-stranded oligonucleotide pools to create libraries of larger DNA fragments, silicon microarray chips that can create and elongate tens of thousands of DNA fragments, and DNA amplification in the presence of a precisely designed template and primer. As these and other techniques are developed, companies are seeking to scale up production, as evidenced by the use of microarray chips to synthesize numerous DNA pieces simultaneously.
___________________
38 https://pubs.acs.org/doi/10.1021/acssynbio.7b00296#
39 https://www.nist.gov/news-events/news/2021/03/scientists-develop-cell-synthetic-genome-grows-and-divides-normally
40 https://www.science.org/content/article/how-canadian-researchers-reconstituted-extinct-poxvirus-100000-using-mail-order-dna
41 https://pmc.ncbi.nlm.nih.gov/articles/PMC5774680/
42 https://www.sciencedirect.com/science/article/pii/S009286742200798X
43 https://www.nature.com/articles/s41570-022-00456-9
44 https://www.sciencedirect.com/science/article/pii/S009286742200798X
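The assembly methods described in this scenario rely on adjacent DNA fragments sharing overlapping ends, and on confirming the final sequence afterward. The following Python sketch illustrates that logic in a highly simplified form. It is an illustration under stated assumptions, not a model of any laboratory protocol: fragments are treated as text strings on a single strand, reverse complements and enzymatic steps are ignored, and the sequence, the fragment boundaries, and the functions `overlap_length`, `assemble`, and `verify` are hypothetical.

```python
def overlap_length(left, right, min_overlap):
    """Return the longest suffix of `left` that equals a prefix of `right`."""
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if k > 0 and left[-k:] == right[:k]:
            return k
    return 0


def assemble(fragments, min_overlap=4):
    """Join ordered fragments whose neighboring ends overlap."""
    assembled = fragments[0]
    for fragment in fragments[1:]:
        k = overlap_length(assembled, fragment, min_overlap)
        if k == 0:
            raise ValueError("no sufficient overlap between adjacent fragments")
        # Append only the part of the fragment that extends past the overlap.
        assembled += fragment[k:]
    return assembled


def verify(assembled, designed):
    """List (position, expected, observed) mismatches against the design."""
    if len(assembled) != len(designed):
        raise ValueError("assembled and designed sequences differ in length")
    return [
        (i, d, a)
        for i, (d, a) in enumerate(zip(designed, assembled))
        if d != a
    ]


# Hypothetical 39-base design split into three fragments with 6-base overlaps.
designed = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
fragments = [designed[0:16], designed[10:28], designed[22:]]

product = assemble(fragments)
print(product == designed)
print(verify(product, designed))
```

The `verify` step stands in for the sequence confirmation that the text notes both assembly methods depend on; any mismatch it reports would correspond to an impurity or mutation introduced during synthesis or amplification.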