Previous Chapter: The Admissibility of Expert Testimony
Suggested Citation: "How Science Works." National Academies of Sciences, Engineering, and Medicine and Federal Judicial Center. 2025. Reference Manual on Scientific Evidence: Fourth Edition. Washington, DC: The National Academies Press. doi: 10.17226/26919.

Introduction

Science typically appears in the courtroom as scientific evidence presented by witnesses deemed experts by the courts. Such evidence may be considered by a jury or judge, depending on the type of case. Judges are the gatekeepers for this evidence, determining what information and testimony will be allowed to enter the court and influence the outcome of a trial. This role is challenging and essential for several reasons. The broad public, from which juries are selected, has a limited understanding of many basic, well-established, and uncontroversial scientific concepts.1 Yet in a trial, a jury may have to weigh evidence regarding advanced scientific knowledge that is still emerging and about which lines of evidence may conflict or be inconclusive. Complicating matters, public relations campaigns have misled the public about the true state of scientific consensus on certain issues.2 Further, scientific expertise and the trappings of science can be made to seem persuasive even when there is little substance behind them, making it more difficult for a jury to come to a just decision when presented with shaky evidence about which jurors have limited scientific understanding. Thus, the responsibility to ensure that the testimony the jury hears is scientifically sound and meets minimal standards of reliability is an important one.

The purpose of this reference guide is to provide a nuts-and-bolts look at how science works today to frame subsequent reference guides of this reference manual and to inform judges’ consideration of proffered scientific evidence and their decisions about what testimony to allow in a trial. Other reference guides will help judges assess questions of law (e.g., What opinions have shaped interpretation of the Federal Rules of Evidence?; see The Admissibility of Expert Testimony, in this manual) and questions specific to scientific disciplines (e.g., Does bitemark evidence emerge from “good” science and should it be permitted at trial?; see Reference Guide on Forensic Feature Comparison Evidence, in this manual). In this reference guide, we provide background and guidance to inform questions that turn on the processes and nature of science writ large: Does peer review ensure that a study is scientifically sound? What qualifies someone as an expert within the scientific community? Should the sponsor of scientific research (e.g., a government agency versus a company) factor into interpretation of its findings? Fairly evaluating such questions requires an understanding of the nature and process of science that goes beyond media and textbook portrayals. The view of science presented below is based on current research and thinking in an academic discipline that studies the theory and processes of scientific knowledge building: the philosophy of

1. Pew Research Center, What Americans Know About Science (March 2019).

2. Naomi Oreskes & Erik M. Conway, Merchants of Doubt: How a Handful of Scientists Obscured the Truth on Issues from Tobacco Smoke to Global Warming (2010) [hereinafter Oreskes & Conway 2010].


science.3 We focus in this reference guide on the Western conception of science as it currently operates, recognizing that science is a social and cultural practice that has changed and will continue to change over time.

What Is Science: “Textbook” Science vs. the Practice of Science

Science is both a body of knowledge and the process for building that knowledge based on evidence acquired through observation, experiment, and simulation. The term is accurately applied to knowledge on a wide variety of topics and to diverse lines of inquiry. These can include familiar disciplines in the natural sciences, like medicine and physics, as well as the social and behavioral sciences. Issues involving economics, history, education, politics, social trends,

Clarifying vocabulary: Scientific knowledge

Misconceptions about and alternative definitions of the terms hypothesis, theory, model, and law abound. Herein, we use the following definitions for these elements of scientific knowledge:

Hypothesis—a potential explanation for a natural phenomenon, regardless of the extent to which the explanation has been investigated or the amount of evidence supporting or refuting it

Theory—a concise, coherent, and predictive explanation for a broad range of natural phenomena that integrates and makes sense of many hypotheses

Model—a mathematical representation of hypothesized mechanisms and interactions within a system that can be used to investigate the impact of parameter changes on predicted system outcomes

Law—an often-mathematical statement of the relationship among observable phenomena

Note that within and outside of science, these terms are used inconsistently, and their popularity has risen and fallen in different disciplines and at different points in history. For further discussion of these terms, see the sections titled “Not Necessarily Consensus” and “Myths About Science” below.

3. Readers interested in an accessible introduction to this discipline, its history, and current perspective on the nature and process of science are referred to Peter Godfrey-Smith, Theory and Reality: An Introduction to the Philosophy of Science (2003) [hereinafter Godfrey-Smith 2003].


human opinions, and much more can be investigated with scientific rigor. This broad view of the sorts of topics within the purview of science aligns with the wide variety of topics addressed by later reference guides in this manual. In short, what makes a unit or body of knowledge scientific is the process by which it was established, not the topic it concerns.

What is this special process that makes an inquiry or investigation scientific? How do we generate scientific knowledge? Textbooks and media often present a sterile and simplified, but commonly held, view of the process of science, sometimes called the scientific method,4 which goes something like this: First, a scientist forms a hypothesis, a potential explanation for a natural phenomenon. For instance, a scientist might hypothesize that a particular class of chemicals is causing a hole in the ozone layer. The scientist performs an experiment to test the idea, perhaps by releasing the chemical in an ozone-filled chamber and measuring changes in the gases present in the chamber. The scientist reaches a conclusion about the hypothesis—that it is correct because ozone decreased significantly in the presence of the chemical. The results of the experiment are then written up and published. Other scientists read the article and mainly agree with the conclusions, and, voilà, a new scientific fact is established. This simplified view correctly calls out an essential feature of scientific knowledge building—that it is based on evidence from the natural world that can stand up to outside scrutiny—but is misleading in other ways.

Table 1. Textbook Science vs. Science in Practice

Common representation of the scientific method | Characteristics of science in practice

Scientific investigations follow a codified step-by-step method. | Scientists engage in many different activities in different orders as they pursue evidence and explanations.

Scientists work in isolation. | Most modern science is collaborative and depends on social interactions within the scientific community.

Scientists use experiments to gather evidence. | Scientists use different methods to gather evidence; an experiment is just one of these methods.

Scientific studies reach a firm conclusion. | Scientific conclusions are always revisable if warranted by the evidence.

A single convincing study leads to scientific consensus on a hypothesis. | Scientific consensus is generally reached over the course of multiple studies conducted by different groups pursuing different lines of inquiry.

4. William F. McComas, The Principal Elements of the Nature of Science: Dispelling the Myths, in The Nature of Science in Science Education: Rationales and Strategies 53 (1998) [hereinafter McComas 1998].

An Example of Science in Practice: Connecting CFCs to Ozone Depletion

Although scientific inquiry always involves a reliance on evidence and has certain other traits (see section titled Key Traits of Science below), there is no one scientific method.5 For example, while scientists have reached consensus on the hypothesis that human production of a certain class of chemicals, chlorofluorocarbons (CFCs), depletes the ozone layer, the actual story is not a simple one. We shall tell this story here, and then refer to it, as well as to other scientific studies that have made their way into the courtroom, throughout this reference guide to illustrate key points about the modern practice of science.

In the 1970s, a medical researcher, curious about the source of the haze near his home, started trying to detect chemicals in the air that come exclusively from human activities—in this case, CFCs. He found them everywhere, even in Antarctica, far from where they were being produced.6 The findings were presented at a conference, where they caught the attention of another scientist, Sherwood Rowland, who shared them with his postdoctoral researcher, Mario Molina. Molina then turned to the published literature to figure out what might happen to these molecules once released into the atmosphere.

Molina and Rowland published an article putting forth the hypothesis that CFCs deplete our protective ozone layer, but their evidence did not come from an experiment or even any new observations.7 Instead, the pair brought together measurements collected by other researchers, calculations, and established knowledge about basic and atmospheric chemistry, arguing that if all these other ideas and estimates were true, a logical outcome is that CFCs pose a threat to the ozone layer. Later, experiments were performed that suggested the chemical reactions that Rowland and Molina reasoned should happen actually did happen. Other scientists incorporated these ideas into their models of the atmosphere and made predictions about what should be observed if Rowland and Molina’s hypothesis were correct. Still other groups collected atmospheric evidence to test the predictions made by the models. Meanwhile,

5. Philip Kitcher, The Advancement of Science: Science Without Legend, Objectivity Without Illusions (1995) [hereinafter Kitcher 1995]; William C. Wimsatt, Re-engineering Philosophy for Limited Beings: Piecewise Approximations to Reality (2007) [hereinafter Wimsatt 2007]. For an argument that Daubert v. Merrell Dow Pharmaceuticals, Inc. is partly founded on the myth of the scientific method, see Sheila Jasanoff, Law’s Knowledge: Science for Justice in Legal Settings, 95 Am. J. of Pub. Health s49–s58 (2005), https://doi.org/10.2105/AJPH.2004.045732 [hereinafter Jasanoff 2005].

6. J. E. Lovelock, R. J. Maggs, & R. J. Wade, Halogenated Hydrocarbons in and Over the Atlantic, 241 Nature 194–96 (1973), https://doi.org/10.1038/241194a0 [hereinafter Lovelock et al. 1973].

7. Mario J. Molina & F. Sherwood Rowland, Stratospheric Sink for Chlorofluoromethanes: Chlorine Atom-Catalyzed Destruction of Ozone, 249 Nature 810–12 (1974), https://doi.org/10.1038/249810a0 [hereinafter Molina & Rowland 1974].


the CFC industry backed another scientist to oppose the hypothesis, and Rowland and Molina checked some of the old measurements on which they had based their hypothesis and found them to be inaccurate. After those numbers were corrected, models were updated, compared, and updated again. More data were collected, and eventually, eight years after the hypothesis was first published, researchers discovered a thinning of the ozone layer over Antarctica much more extreme than expected.8 This led to more revisions of the hypothesis, a few additional twists and turns, and eventually a Nobel Prize; the Montreal Protocol, which phased out CFC production; and today, a recovering ozone layer. Along the way, ozone depletion also made its way into the courts as established science. For example, the Natural Resources Defense Council (NRDC) brought suit against the Environmental Protection Agency (EPA) for making decisions that violated the Montreal Protocol. The NRDC was found to have standing because of the increased risk of skin cancer that NRDC members would experience as a result of ozone destruction.9 Ozone depletion was treated as a fact in that case, reflecting the scientific consensus on the issue.

In this case, science worked exactly as it ought. Hypotheses were sifted through, scrutinized by different parties with different interests, compared against multiple lines of evidence, and iteratively modified, ultimately leading to reliable and actionable scientific knowledge. Such examples are typical of science as it is actually practiced: Rarely do scientists apply a simple linear recipe, rely on a single critical experiment, or reach an immediate and firm conclusion. While the complex and iterative processes that went into establishing depletion of the ozone layer by CFCs are commonplace in science, the speed with which societal and political action followed scientific consensus in this case may be unusual.10 We also note that this example of scientific consensus building played out over many years, and that, for good reason, much of the scientific testimony presented at trial will concern hypotheses with a much lower degree of certainty and that the scientific community has had less time to scrutinize. Nevertheless, throughout this reference guide the CFC/ozone example will serve as a valuable reminder of how science generally works, as well as the power of science to spur real-world change—a function that is often mediated by the courts and so further highlights the responsibility of the judge in evaluating the reliability of scientific testimony and in appointing scientific experts as appropriate.

8. J.C. Farman, B.G. Gardiner, & J.D. Shanklin, Large Losses of Total Ozone in Antarctica Reveal Seasonal ClOx/NOx Interaction, 315 Nature 207–10 (1985), https://dx.doi.org/10.1038/315207a0 [hereinafter Farman et al. 1985].

9. Nat. Res. Def. Council v. Env’t Prot. Agency, 464 F.3d 1 (D.C. Cir. 2006).

10. Oreskes & Conway 2010, supra note 2.


The Science Flowchart

Science is complex, iterative, dynamic, and social. It emerges out of the work of fallible people, shaped by their unique backgrounds, perspectives, and motivations, engaged in different activities in different orders as they come up with new explanations, and test and retest those ideas.11 Through this process, biases are identified and corrected, inaccurate ideas are eventually rejected, and discrepancies are negotiated and resolved. This practice does lead to scientific consensus, but over many years, investigations, and interactions, not in the context of a single study. These key points are illustrated by the science flowchart (Figure 1), which more accurately represents the process of science than does the five-step scientific method frequently found in textbooks.12

This flowchart can help us frame scientific inquiry into a wide range of topics. It applies to the natural sciences, as well as social sciences and engineering research. It applies to both qualitative and quantitative research conducted in laboratories, in the field, in clinical settings, and through the examination of historical records and other datasets. It applies to research conducted by academics, engineers, government workers, industry employees, and experts hired by litigation teams. Any line of inquiry that aims to establish objective, reliable knowledge based on evidence can be examined using this lens. While such knowledge-building activities do not follow a prescribed step-by-step process, they do have some traits in common. At some point, in some way, someone gets an idea (often called a hypothesis) about how to solve a problem, explain a phenomenon, or answer a question; at some point, in some way, that hypothesis is tested against evidence; at some point, in some way, the evidence and hypothesis are scrutinized by others with relevant expertise; and if the hypothesis holds up to this scrutiny through multiple rounds of testing, people will use it—to solve problems, make decisions, and/or build more knowledge.

If mapped onto the science flowchart, our example of the discovery of the link between CFC production and the hole in the ozone layer would produce a winding path that circles back on itself, proceeding through the testing phase multiple times. A map of the investigations establishing that cigarettes cause lung cancer would follow a different path, as would a map of social science and psychological research examining factors that impact eyewitness recall.13 When a judge must evaluate

11. For example, see David L. Hull, Science as a Process: An Evolutionary Account of the Social and Conceptual Development of Science (1988) [hereinafter Hull 1988]; Kitcher 1995, supra note 5; Helen E. Longino, Science as Social Knowledge: Values and Objectivity in Scientific Inquiry (1990), https://doi.org/10.2307/j.ctvx5wbfz [hereinafter Longino 1990]; & Wimsatt 2007, supra note 5.

12. University of California Museum of Paleontology, Understanding Science (2022), https://perma.cc/MAB9-NYWL [hereinafter UCMP 2022].

13. Each of these lines of research has, at some point, been applied in the courtroom. See Nat’l Rsch. Def. Council v. Env’t Prot. Agency, 464 F.3d 1 (D.C. Cir. 2006); United States v. Philip Morris USA Inc., 566 F.3d 1095 (D.C. Cir. 2009); and Sanders v. City of Chi. Heights, No. 13 C

Figure 1. Science is complex, iterative, dynamic, and social. It does not proceed according to a linear, step-by-step process. Source: © UC Museum of Paleontology Understanding Science, www.understandingscience.org.

whether expert scientific testimony is scientifically sound, it is not a simple matter of ensuring that a particular five-step process was followed. Instead, the judge’s task requires a deeper examination of the available evidence and the methods by which it was obtained, as well as an assessment of how the community of experts in the area has evaluated or would evaluate the evidence and reasoning in question.

Widely accepted scientific knowledge is built over the course of many iterations through this flowchart—a process that often takes years. For simplicity’s

0221 (N.D. Ill. Aug. 18, 2016). For more on eyewitness identification, see Thomas D. Albright & Brandon L. Garrett, Reference Guide on Eyewitness Identification, in this manual.


sake, judges might wish to be presented only with scientific testimony based on mature research programs, in which many studies by different groups have been conducted, evidence has been scrutinized from different viewpoints, and some level of consensus has been reached. But, of course, disputed facts in litigation may hinge on questions of science that are still emerging and for which evidence is lacking, or that are so specific that new research must be conducted to inform the litigation. Such testimony may necessarily rely on one or just a few tests that have not yet received broad scrutiny by the relevant scientific community, and it may be that the scientific investigations that would best inform the litigation have not been performed; neither situation renders inadmissible testimony based on the well-conducted and informative (though not conclusive) studies that have been performed. Despite our reliance on an example in which scientific consensus was eventually achieved, we emphasize that scientific consensus is not a requirement for testimony to be admissible, as described in the section below and in The Admissibility of Expert Testimony in this manual. If two experts present conflicting scientific testimony, and the testimony of each expert is grounded in proper scientific methodology, the court may admit the testimony of both and allow the trier of fact (i.e., jury or judge) to resolve the conflict.

This reference guide examines the progression of science from speculation to wide acceptance, as illustrated by the CFC/ozone example, to assist with judges’ evaluation of the admissibility of scientific evidence and develop their understanding of the epistemology of science.14 Subsequent reference guides will discuss the strengths and weaknesses of individual studies and lines of investigation in various scientific disciplines relevant to the judiciary.

The Science Flowchart and Admissibility

Federal Rule of Evidence 702 outlines the findings a judge must make to admit expert testimony over an objection.15 According to Rule 702, such testimony is

14. Jasanoff 2005, supra note 5, laments the epistemological sophistication Daubert would seem to require to recognize “good science,” as well as judges’ lack of training in this area—a challenge that we hope this manual will help remedy.

15. While the testimony of expert witnesses identified by the parties is the most common way that science makes its way into the courts, it is not the only way. Court-appointed experts may perform a variety of functions and services. They may serve as witnesses themselves under Federal Rule of Evidence 706; advise the court on assessment of evidence that might, for example, be relevant for admissibility decisions; help mediate settlements; and/or provide background information to the judge or jury. In litigation that involves scientific evidence, the use of such independent scientific experts may help address potential challenges to reaching a just resolution that stem from aspects of the process of science and the process of litigation highlighted herein. These include the complexities of the process of science outlined above; the importance of community analysis and


admissible when the judge finds that (1) the witness is qualified to offer the testimony, (2) the witness’s expertise will help determine questions of fact, (3) the testimony is based on sufficient facts or data, (4) the testimony results from reliable principles and methods, and (5) the witness has reliably applied those principles and methods to the facts of the case. In Daubert v. Merrell Dow Pharmaceuticals, Inc., the Supreme Court found that expert scientific testimony need not be “generally accepted” to be admitted under Rule 702; instead, the Court instructed federal judges to exercise a “gatekeeping” role regarding scientific evidence, admitting for consideration only such evidence that is grounded in an appropriate “scientific methodology.”16 The Court indicated that an appropriate scientific methodology often includes the process of formulating hypotheses and conducting experiments to test the validity of the hypotheses. The Court then provided a set of illustrative criteria suggestive of the sorts of considerations judges must make in determining whether the proposed testimony meets the standard of an appropriate scientific methodology.17 The factors selected as examples by the Court include:

  1. whether it can be and has been tested;
  2. whether it has been subjected to peer review and publication;
  3. whether it has a known error rate;
  4. the existence and maintenance of standards and controls; and
  5. whether the theory or technique employed by the expert is generally accepted in the scientific community.

The Ninth Circuit identified another relevant consideration relating to Daubert: whether the research was conducted independently of the particular litigation or was developed expressly to support the proposed testimony. Subsequent

feedback in sorting through scientific explanations as described in the section titled “Science as a Human and Community Endeavor” below; the time all this takes to play out; the wide variety of valid scientific methodologies (described in the section titled “Scientific Methodologies and Data” below); the fact that individual scientists are human and inherently biased (described in the section titled “Bias” below); the depth of the subject matter knowledge that may be required to understand and evaluate scientific evidence; and of course the stakes that each party has in the testimony of their proffered expert witnesses and potential biases introduced in selecting those witnesses. Resources are available to assist judges in this process of identifying and securing a court-appointed expert—for example, the CASE program of the American Association for the Advancement of Science. For more on the services performed by court-appointed experts, see American Association for the Advancement of Science, Court Appointed Scientific Experts: A Handbook for Experts Version 3.0 (2002), https://perma.cc/5FHD-GKQL, and Daniel L. Rubinfeld & Joe S. Cecil, Scientists as Experts Serving the Court, 147 Daedalus 152–63 (2018), https://doi.org/10.1162/daed_a_00526.

16. 509 U.S. 579 (1993).

17. For more on Daubert and admissibility, see Liesa L. Richter & Daniel J. Capra, The Admissibility of Expert Testimony, in this manual.


decisions by the Supreme Court further refined the standards for scientific and technical evidence.18

These legal standards for admissibility under Daubert can be mapped onto the elements of the flowchart, as illustrated by Figure 2. The central region of the flowchart outlines the logic of testing hypotheses against evidence, the first of the Daubert factors outlined above. Testing is essential to both the process of science and Daubert. If a hypothesis or testimony is not supported by or derived from evidence in some form, it cannot be based on scientific methodologies. The process of testing leads to two more of Daubert’s illustrative factors. The known or potential rate of error for scientific techniques or processes (the third Daubert factor) can be derived statistically from sufficient rounds of testing and data collection. The appropriate standards and controls required by a technique or process (the fourth Daubert factor) are likewise established in this way. Error rates, standards, and controls most clearly impact the consideration of studies that involve analytic and assessment techniques, such as sobriety tests, forensic identification techniques, and DNA analysis.

Figure 2. Relationships among the Daubert factors and the process of science. Source: Image reproduced and adapted under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) license (see https://creativecommons.org/licenses/by-nc-sa/4.0/). Original image © UC Museum of Paleontology Understanding Science, www.understandingscience.org.

18. These standards were then codified through amendments to Federal Rule of Evidence 702. For more on Federal Rule of Evidence 702, see Liesa L. Richter & Daniel J. Capra, The Admissibility of Expert Testimony, in this manual.


The flowchart’s “Community analysis and feedback” area highlights peer review and publication (the second Daubert factor) as an important element of scientific knowledge building. For many areas of science, peer-reviewed journal articles are the usual means through which the expert scientific community assesses hypotheses and evidence (see the section titled “Peer Review” below). However, other scenarios are also possible. For example, an engineering technique or piece of computer code might be scrutinized through direct use by the relevant community to see whether it performs as expected. Research conducted solely for litigation may be scrutinized by the judge, other expert witnesses, and a jury.19 Finally, it is through interactions within “Community analysis and feedback” and multiple iterations through the “Testing ideas” region that hypotheses, theories, and techniques gain widespread acceptance (the final Daubert factor; see the section titled “Achieving Scientific Consensus” below). And of course, the judge’s decisions on the admissibility of scientific evidence represent one of the ways that benefits (the lower left area of the flowchart) are derived from the process of science: helping resolve disputes through litigation.

Textbook portrayals of a five-step scientific method evoke a scientist in an academic institution driven by curiosity alone.20 In contrast, the upper area of the science flowchart and the “Benefits and outcomes” region illustrate the variety of factors that motivate scientific investigations. Engineering research may be driven by the need to solve a particular problem, and as Kumho Tire Co. v. Carmichael held, the provenance of such specialized knowledge can be meaningfully compared to the standards and methodologies of good science.21 Career advancement, prestige, and funding are likewise incentives for many scientists.22 And as described in The Admissibility of Expert Testimony, in this manual, scientific research may also be conducted for the sole purpose of testifying at trial.

The fact that a study was undertaken in the service of litigation does not itself make the research unscientific or unreliable.23 All scientists have some sort of motivation for their work, and this does not preclude scientific knowledge

19. Jasanoff 2005, supra note 5, describes this role of the judiciary in helping produce certain sorts of scientific knowledge, acknowledging the non-independence of science and the law.

20. McComas 1998, supra note 4.

21. 526 U.S. 137 (1999).

22. Hull 1988, supra note 11; Kitcher 1995, supra note 5; Longino 1990, supra note 11; Michael Strevens, The Role of the Priority Rule in Science, 100 J. of Phil. 55–79 (2003), https://doi.org/10.5840/jphil2003100224; and Wimsatt 2007, supra note 5.

23. In the 1995 remand of Daubert (43 F.3d 1311 (9th Cir. 1995)), Judge Alex Kozinski wrote that scientific investigations performed in the service of litigation should be more stringently evaluated for admissibility, suggesting that they may not be reliable, a perspective that has been criticized by some scholars. For example, see Sheila Jasanoff, Representation and Re-Presentation in Litigation Science, 116 Env’t Health Persps. 123–29 (2008), https://doi.org/10.1289/ehp.9976; and Leslie I. Boden & David Ozonoff, Litigation-Generated Science: Why Should We Care? 116 Env’t Health Persps. 117–22 (2008), https://doi.org/10.1289/ehp.9987, which are largely consistent with the broad view of science as a human and community endeavor presented in this reference guide. Indeed, Jasanoff attributes

building, so long as biased methodologies and interpretations are avoided. However, research conducted for the purpose of testifying is unlikely to have had the opportunity to be reviewed and scrutinized by the relevant scientific community. Though imperfect (see section titled “Peer Review” below), this process of community scrutiny can still help to identify and overcome biases that may shape a scientific investigation, as well as assess its methodological and deductive quality (see section titled “Science as a Human and Community Endeavor” below). Because of this, judges may be placed in the position of applying a higher level of scrutiny to testimony based on research developed for litigation than to testimony based on established research programs conducted for other purposes, which have already iterated through the flowchart multiple times. For example, survey research is an important line of evidence in trademark infringement litigation,24 and this research may be so specific in its intent (e.g., whether the term beanies is perceived as a brand name for a toy25) that there is little reason to perform it except in service of litigation. Testimony based on such survey research is often determined to be admissible, but may not be if biases or methodological problems are identified in the study design.26 In short, testimony based on research conducted solely for litigation may or may not be scientifically reliable and admissible, and making this determination without the vetting ordinarily performed by the scientific community may place a higher burden on judges.

Key Traits of Science

Judges might wish for a single marker that distinguishes between science and pseudoscience, good and bad science, accepted and speculative science. Unfortunately, we have no easy answers. These are all the end points of continua, and multiple factors determine where on the spectrum a particular unit of knowledge or research falls. Contemporary thinking suggests that the boundary between science and nonscience is best identified by the process by which knowledge is generated, and

this denigration of litigation-derived science to “idealized, misleading, or misinformed assumptions about the scientific method,” which this reference guide aims to clarify and correct.

24. See Shari Seidman Diamond et al., Reference Guide on Survey Research, in this manual.

25. Ty Inc. v. Softbelly’s, Inc., 353 F.3d 528 (7th Cir. 2003).

26. Rebecca Kirk Fair & Laura O’Laughlin, Ensuring Validity and Admissibility of Consumer Surveys, ABA Groups Practice Points (2017), https://perma.cc/2LHD-7NE2. For more on the design of methodologically sound surveys, see Shari Seidman Diamond et al., Reference Guide on Survey Research, in this manual.

especially by how the community engaged with that process behaves.27 Key aspects of this process and these behaviors are outlined below.

Science Investigates the Natural World and Natural Explanations

Communities engaged in scientific endeavors aim to build accurate, natural explanations for how the world works. In this context, the term natural refers to any element of the physical universe—whether made by humans or not. Thus, the natural world includes matter, the forces that act on matter, energy, distant galaxies, the constituents of the biological and physical world (e.g., the ozone layer), humans, human society, and the products of that society (e.g., CFCs). As described above, the goal of building explanations does not imply that, to be scientific, research must be without practical or economic end. Both applied research, which is undertaken with the goal of solving a particular problem, and pure or basic research, which has the primary aim of building scientific knowledge, improve our understanding of the natural world.

The investigation of supernatural explanations—that is, explanations that rely on forces beyond the observable universe—is not a part of modern science.28 The exclusion of supernatural explanations from science was recognized in Kitzmiller v. Dover Area School District, in which parents of school-age children brought suit against their local school board over the requirements that Intelligent Design be taught in schools and that teachers read a disclaimer that cast evolution as shaky science.29 The court found that Intelligent Design is not science in part because it “violates the centuries-old ground rules of science by invoking and permitting supernatural causation.”30 The philosophical and historical roots of this exclusion are deep, but a simple, practical explanation clarifies this necessity: Science is able to build reliable knowledge because it is based on evidence and observations, and supernatural explanations are generally not testable against evidence, as explained below.

27. Hull 1988, supra note 11; Philip Kitcher, Believing Where We Cannot Prove, Abusing Science: The Case Against Creationism 30–54 (1983) [hereinafter Kitcher 1983]; and UCMP 2022, supra note 12.

28. Massimo Pigliucci & Maarten Boudry, Philosophy of Pseudoscience: Reconsidering the Demarcation Problem (2013).

29. 400 F. Supp. 2d 707 (M.D. Pa. 2005).

30. The court’s additional reasons for finding that Intelligent Design (ID) is not science were (1) that ID proponents falsely argue that any challenge to evolutionary theory necessarily supports ID and (2) that the scientific community has investigated and rejected ID proponents’ challenges to evolutionary theory.

Science Investigates Testable Hypotheses

Communities engaged in scientific endeavors work with testable hypotheses. For a hypothesis to be testable, it must, by itself or in conjunction with other hypotheses, generate specific predictions—a set of observations that one could expect to make if the hypothesis were true and/or a set of observations that would be inconsistent with the idea and lead one to believe that it is not true. If an explanation is equally compatible with all possible observations, then it is not testable and hence, not within the reach of science. This is frequently the case with supernatural explanations. For example, a central proposition of Intelligent Design, that a complex anatomical structure (e.g., an eye) was designed by an intelligent, supernatural entity, does not make any predictions about the anatomy or distribution of the structure among species.31 We cannot know how or why a supernatural entity would design an eye in a particular way, so any observation we might make about the anatomical structure is compatible with this explanation. Evidence cannot be used to investigate such propositions, and hence, they are outside the realm of science.

The logic of testing hypotheses is illustrated in the central area of Figure 1. For example, the idea that smoking causes lung cancer is testable and leads to a wide variety of predictions: that rates of lung cancer will be higher among smokers than among a comparable group of nonsmokers; that lung cancer rates will increase in populations where smoking becomes common; that exposing non-human animals to compounds in cigarettes will lead to higher rates of cancer; and many more. Similarly, the hypothesis that CFCs deplete the ozone layer led to many predictions: that certain chemical reactions occur in the atmosphere; that the distribution of CFCs and other chemicals in the atmosphere will match the predictions of models; and that the ozone layer will thin as CFCs accumulate. The same logic applies to testing hypotheses much more limited in scope. For example, the narrow hypothesis that CFCs persist in the atmosphere can be tested simply by checking for them in places far from where they were produced.32 For an example drawn from the courtroom, consider litigation surrounding whether Thermos Co.’s trademark was infringed upon by another company’s use of the term thermos. The narrow hypothesis that thermos is perceived as a generic term for an insulated vacuum bottle led to the prediction that people who want to purchase an insulated vacuum bottle from a store would ask for a “thermos.”33

In the context of testing hypotheses, the term prediction refers to a logical consequence of a hypothesis, not necessarily what will happen in the future. Scientific predictions often involve reexamining records of the past. For

31. Elliott Sober, What Is Wrong With Intelligent Design?, 82 Q. Rev. Biology 3–8 (2007), https://doi.org/10.1086/511656.

32. This hypothesis was tested and described as part of Lovelock et al. 1973, supra note 6, although performing that test was not the main intent of the investigation.

33. King-Seeley Thermos Co. v. Aladdin Indus., 321 F.2d 577 (2d Cir. 1963).

example, key hypotheses about connections between greenhouse gases and Earth’s temperature, which have dramatic implications for life on Earth today, make predictions that are best tested, in part, by examining ice and oceanic sedimentary cores that reveal Earth’s deep history.34

Science Responds to Evidence

Science builds knowledge based on evidence; this is how the community ensures that the explanations it generates approach accuracy. As a community, scientists actively seek evidence to test their hypotheses, and ultimately accept, give up, or modify hypotheses as warranted by the evidence. Just as Rule 702(d) directs courts to assess whether there is a disconnect between an expert’s methods and principles, and the conclusions reached, reliable scientific knowledge building depends on there being a logical connection between what the evidence shows and what explanatory hypotheses are accepted. Ignoring evidence is not an option in science. For example, when research revealed the hole in the ozone layer to be much larger than any of the models predicted, scientists rushed to explore additional explanations and modifications of the key hypothesis that would help make sense of this new evidence.35

Evidence is not the only arbiter of whether a hypothesis is accepted by the scientific community. Other considerations, like breadth and coherence, also matter. Furthermore, the community’s response to evidence or engagement with testing a hypothesis may be slower or hindered if dogma or other social factors work against it, as described in the section titled “Achieving Scientific Consensus” below. But evidence is the most important determinant of acceptance in science. Explanations that are not supported by evidence are eventually rejected. Hypotheses that are protected from testing or that are allowed to be tested by only one group of investigators with a vested interest in the outcome are not a part of good science.

Science Does Not Prove Hypotheses

Because science responds to evidence, scientific knowledge is always open to refinement, revision, or even rejection if warranted. Philosophers of science call this property fallibilism.36 No matter how much evidence supports or refutes it, a

34. Valérie Masson-Delmotte et al., Information from Paleoclimate Archives, in Climate Change 2013: The Physical Science Basis. Contribution of Working Group I to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change 383–464 (2013).

35. E.g., Susan Solomon et al., On the Depletion of Antarctic Ozone, 321 Nature 755–58 (1986), https://doi.org/10.1038/321755a0.

36. Susan Haack, Inquiry and Advocacy, Fallibilism and Finality: Culture and Inference in Science and the Law, 2 L., Probability & Risk 205–14 (2003), https://doi.org/10.1093/lpr/2.3.205.

hypothesis cannot be absolutely proven true or false.37 Even widely accepted scientific explanations, like the link between CFCs and ozone depletion, could be rejected if new evidence or a better-supported hypothesis comes along. This has happened at many points in the history of science. For example, Newton’s classical mechanics is a powerful predictive tool that was relied upon by physicists for 200 years and is still used to generate reliable predictions in many situations. But when Einstein’s theory of relativity proved more closely aligned with the evidence in more situations than Newtonian mechanics, scientists accepted that this new framework was a broader and more powerful one. At a smaller scale, the idea that one misfolded protein can cause others to misfold and generate disease (i.e., the prion hypothesis) was once regarded as beyond far-fetched and rejected by biologists; however, as evidence accrued showing prions to be proteins responsible for transmissible diseases such as mad cow disease and scrapie, dogma was overridden and scientists accepted this expanded conception of proteins and disease.38 As new scientific facts are established and inadequately supported claims are overturned, judges can expect these slow-moving ripples of change to play out at trial. For example, bitemark evidence may be increasingly excluded as research establishes its fundamental limitations.39

Despite its tentative nature, accepted scientific knowledge is reliable. Such explanations generate predictions that hold true in many different contexts and at many different scales, allowing us to figure out how entities in the natural world are likely to behave and how we can harness that understanding to solve problems and dispense justice. For example, Molina and Rowland’s understanding of chemistry allowed them to predict that CFCs were damaging the ozone layer long before this was observed. Because of this scientific knowledge, we were able to make policy changes and ultimately reverse damage to the ozone layer. We have good reason to trust accepted scientific explanations: They allow us to make accurate predictions about the future and intervene in the present. Even Newtonian mechanics, though subsumed by relativity at an explanatory level, is still used at a practical level for a wide variety of applications, such as building bridges and predicting the movement of satellites and asteroids.

37. Formal proofs are central to mathematics, a field that provides critical tools for scientists but that is nonetheless distinct from scientific disciplines that rely on evidence from the natural world to assess hypotheses. Mathematical proofs accept a set of axioms as true and demonstrate that a particular conclusion is logically implied by those assumptions. While the sciences described here incorporate logic and inference, their reliance on evidence and observations to verify conclusions means that hypotheses are fallible in a way that mathematical statements are not.

38. Claudio Soto, Prion Hypothesis: The End of the Controversy?, 36 Trends in Biochemical Scis. 151–58 (2011), https://doi.org/10.1016/j.tibs.2010.11.001 [hereinafter Soto 2011].

39. See Valena E. Beety et al., Reference Guide on Forensic Feature Comparison Evidence, in this manual, for more on bitemark evidence.

Science Is Carried Out by a Community that Holds Members to Norms

Communities engaged in scientific endeavors generally adhere to a set of practical behavioral standards, known as norms, which can change over time. There is no one group tasked with ensuring adherence to scientific norms, but because they are recognized as important by a wide variety of scientific institutions, scientists who are found to violate them knowingly are likely to be impeded from publishing, receiving grants, and advancing their careers (see section titled “Misconduct” below for more on this). The following list summarizes some important, current scientific norms:

  • Seek relevant information: Because science is cumulative and scientists are in the business of seeking broad explanations, scientists need to know what others before them have learned about a system to form hypotheses that cohere with the many lines of available evidence. Science rarely starts from scratch. For example, upon hearing how widespread CFCs were, Mario Molina first turned to the published work of other scientists to see what chemical processes might break down CFCs in the lower atmosphere before they reach the ozone layer.
  • Scrutinize ideas and evidence: In science, questioning methods and looking for other explanations for evidence does not necessarily signal that a hypothesis is wrong or a study was poorly executed. For example, the prion hypothesis was once viewed suspiciously by many biologists, who required particularly convincing evidence to accept an idea that seemed at odds with how proteins were known to behave.40 Such skepticism is part of the normal practice of science and helps steer it in the direction of accuracy. Indeed, scientists are generally motivated to scrutinize closely even well-established ideas and evidence because of the esteem that the community awards those who overturn accepted science and contribute truly new and fruitful explanations.
  • Openly communicate findings: Scientists are expected to share their hypotheses, methods, and findings honestly and openly to allow others to evaluate and build on them. When Rowland and Molina uncovered evidence about the longevity of chlorine nitrate in the atmosphere that threatened their hypothesis, they published the evidence for others to evaluate.41 For examples of how violation of this norm can lead to or factor into litigation, see section titled “Misconduct” below.

40. Soto 2011, supra note 38.

41. F. Sherwood Rowland et al., Stratospheric Formation and Photolysis of Chlorine Nitrate, 80 J. Physical Chemistry 2711–13 (1976) [hereinafter Rowland et al. 1976].

  • Identify and avoid bias: Because scientists are human, it is impossible for them to be completely objective and unbiased in their work (see section titled “Science as a Human and Community Endeavor” below). However, scientific progress depends on scientists aspiring toward this ideal by trying to employ fair tests and methods, and to evaluate ideas on the merits of the evidence. For example, the scientists who first reported the hole in the ozone layer over Antarctica strove for objectivity in several ways. They got their data from carefully calibrated instruments, directed other scientists to information about how the machines were calibrated, and presented data with generous error bars to accommodate subsequent corrections to their observations.
  • Assimilate evidence: When faced with evidence contradicting their hypothesis, scientists may decide to conduct more tests, may revise or reject the idea, or may consider alternative ways to explain the evidence. But ultimately, scientific explanations are sustained by evidence and cannot be propped up—or denied—if the evidence is clear. For example, as evidence mounted in favor of the prion hypothesis, even skeptics were convinced.
  • Provide appropriate credit: Scientists are expected to cite their sources scrupulously. Credit is an important measure of impact within the scientific community. It also provides a sort of paper trail, through which another scientist can evaluate the methods, assumptions, and reasoning that a particular study is based on. Rowland and Molina’s original paper on the CFC/ozone link included 31 citations outlining where their ideas and evidence originated.42
  • Abide by official ethical guidelines: Research institutions, funding agencies, publishers, scientific societies, and legislation regulate what scientific research can be done and how. In general, these guidelines are intended to ensure that scientific work is of high quality, is performed in ethical ways, and benefits society. Such guidelines vary depending on the subject, application, scope, and setting of research. For example, a survey-based study designed to contribute to generalizable knowledge about human perception will generally need to be approved by an institutional review board (IRB) tasked with protecting the interests and rights of survey participants before the research can commence.

When the work of scientific experts aligns with these norms, judges can be more confident that their testimony stems from reliable scientific methodologies.

42. Molina & Rowland 1974, supra note 7.

Nonscience, Pseudoscience, Bad Science, and Wrong Science

The characteristics above can help us identify research that is clearly scientific, but there are different ways for an endeavor to be unscientific. Some lines of inquiry are simply outside the realm of science and do not pretend to be otherwise. For example, science cannot make moral judgments. Science can help us assess what segment of the population finds euthanasia acceptable, understand how people reason about euthanasia as they near the end of their lives, and learn about the emotional effects of euthanasia,43 but science cannot tell us if euthanasia is morally right or wrong.

Pseudoscience, on the other hand, employs the trappings of science but fails to meet several key scientific standards. For example, homeopathy is a type of alternative medicine that arose in the 18th century out of legitimate concerns about the efficacy of mainstream medicine at the time. It is based on the idea that “like cures like”—that ingesting a substance that causes a certain bodily effect in a healthy person (e.g., chills) can cure a disease that produces the same symptom. Homeopathy uses Latin scientific names for the substances in its treatments, has adopted the vocabulary of mainstream medicine (e.g., vaccine), and often packages and labels products in ways that are reminiscent of mainstream pharmaceuticals—all of which might be interpreted as somehow scientific. However, many evidence-based tests of homeopathic remedies have been carried out and turned up no data suggesting that they perform better than placebos. Similarly, no evidence of a natural mechanism that might underlie the claims made by homeopaths has been found. Many panels, committees, and councils tasked with assessing the scientific consensus on homeopathy have concluded that it has no scientific basis and is, instead, pseudoscience.44 In Allen v. Hyland’s, Inc., the plaintiffs argued that Hyland’s made false claims about the efficacy of its homeopathic remedies.45 The defendants’ expert witnesses were allowed to testify, a decision that has been criticized as a failure of the court’s scientific gatekeeping function

43. For example, see Joachim Cohen et al., European Public Acceptance of Euthanasia: Socio-Demographic and Cultural Factors Associated with the Acceptance of Euthanasia in 33 European Countries, 63 Soc. Sci. & Med. 743 (2006), https://doi.org/10.1016/j.socscimed.2006.01.026; Ronit D. Leichtentritt & Kathryn D. Rettig, Meanings and Attitudes Toward End-of-life Preferences in Israel, 23 Death Stud. 323–58 (1999), https://doi.org/10.1080/074811899200993; and Nikkie B. Swarte et al., Effects of Euthanasia on the Bereaved Family and Friends: A Cross Sectional Study, 327 BMJ 189 (2003), https://doi.org/10.1136/bmj.327.7408.189.

44. For example, see National Health and Medical Research Council, NHMRC Information Paper: Evidence on the Effectiveness of Homeopathy for Treating Health Conditions (2015).

45. No. CV 12–1150 DMG (MANx) (C.D. Cal. Aug. 16, 2016); see Robert G. Knaier, Homeopathy on Trial: Allen v. Hyland’s, Inc. and a Failure of Evidentiary Gatekeeping, 57 Jurimetrics J. 361–96 (2017).

under Daubert, and the jury found for the defendants, highlighting the challenges of detecting pseudoscience and assessing scientific consensus.

Then there is “bad” science: specific investigations that are within the realm of science (focus on natural explanations, work with testable ideas, etc.) but are unreliable because they have not been executed in ways that meet the community’s standards for scientific behavior—perhaps because the researchers selectively reported data and so did not openly and honestly communicate findings. Some analyses suggest that cases of bad science are on the rise because of systemic incentives that reward quantity of publications and certain types of results over quality.46 A fuller framing of this concern is provided in the section below titled “Replication and the ‘Replication Crisis.’”

Unsurprisingly, bad science often leads to wrong science—that is, acceptance of hypotheses that are inaccurate. Even investigations that meet all the standards described above can wind up supporting incorrect explanations. But because science involves retesting hypotheses in different ways, contexts, and combinations, inaccurate ideas will ultimately be identified as such. This is part of the normal way in which science works, but it does take time, often many years, for scientists to sort through explanations and identify those that consistently lead to accurate predictions, new insights, and improved understanding.

Sometimes entire fields or lines of inquiry are denounced as not scientific or pseudoscientific, and such arguments may be leveraged at trial. However, as we’ve seen, science can address a vast array of topics. A finer-grained analysis may be required to assess which aspects of a body of knowledge are backed by scientific evidence and which are not. For example, acupuncture is a technique of traditional Chinese medicine that has been used for more than 2,000 years and is based on the idea of Qi (or “Chi”), a type of life-giving energy that circulates through the body in special channels. No scientific evidence supports the existence of Qi or the channels through which it circulates. If we accept the traditional view that Qi is a mystical force, then this explanatory mechanism is supernatural and outside the realm of science. However, sound scientific investigations have found acupuncture to be useful for treating a variety of symptoms, and scientific research is beginning to investigate the biological basis of these effects.47 At the same time, proponents sometimes overstate these findings, and practitioners may employ terminology and tools that patients associate with medicine (often for good reason), making it difficult to distinguish which aspects of acupuncture are supported by scientific research and which are pseudoscientific. However, labeling all of acupuncture as nonscience ignores reliable scientific evidence and knowledge.

46. For an overview of the concern, causes, and proposed solutions, see National Academies of Sciences, Engineering, and Medicine, Reproducibility and Replicability in Science (2019) [hereinafter NASEM].

47. For example, see Tony Y. Chon & Mark C. Lee, Acupuncture, 88 Mayo Clinic Proc. 1141–46 (2013), https://doi.org/10.1016/j.mayocp.2013.06.009.

Science as a Human and Community Endeavor

Science is carried out not by objective automatons but by people, shaped by their unique backgrounds and motivations, as well as by the changing norms and values of the societies in which they are embedded, all of which influence the course of science. The identities, backgrounds, and experiences of the practitioners of science affect their selection of research topics, hypotheses, and research methods, and their interpretations of results and evidence. Such influences are apparent in a wide range of scientific fields but are famously exemplified by shifts in primatology as more women entered the field in the 1970s. These scientists did not make the same assumptions that their male predecessors had made and so observed and investigated behaviors and interactions previously overlooked and undocumented, revealing, for example, that female primates have elaborate sex lives and manipulate male behavior.48 Of course, the fact that scientists are fallible humans also means that they are vulnerable to bias and capable of misconduct, as described in the sections below.

The human practitioners and institutions of science likewise respond to events in and values of society around them in various ways. For example, the 1970s energy crisis led to a surge of alternative energy research and reorganizations of funding mechanisms for this research.49 To cite an even broader

48. For a few examples of how scientists’ identities, backgrounds, and experiences impact their science, across several disciplines, including in the field of primatology, see Donna Haraway, Primate Visions: Gender, Race, and Nature in the World of Modern Science (1989); Andrew Gary Darwin Holmes, Research Positionality—A Consideration of Its Influence and Place in Qualitative Research—A New Researcher Guide, 8 Shanlax Int’l J. of Educ. 1–10 (2020), https://doi.org/10.34293/education.v8i4.3232; Rembrand Koning, Sampsa Samila, & John-Paul Ferguson, Who Do We Invent For? Patents by Women Focus More on Women’s Health, but Few Women Get to Invent, 372 Science 1345–48 (2021), https://doi.org/10.1126/science.aba6990; Diego Kozlowski et al., Intersectional Inequalities in Science, 119 Proc. of the Nat’l Acad. of Sci. USA e2113067119 (2022), https://doi.org/10.1073/pnas.2113067119 [hereinafter Kozlowski et al., 2022]; D. N. Lee, Taking it Personally, 311 Sci. Am. 47 (2014), https://doi.org/10.1038/scientificamerican1014-47; Douglas Medin, Carol D. Lee, & Megan Bang, Particular Points of View, 311 Sci. Am. 44–45 (2014), https://doi.org/10.1038/scientificamerican1014-44; H. Richard Milner, IV, Race, Culture, and Researcher Positionality: Working through Dangers Seen, Unseen, and Unforeseen, 36 Educ. Researcher 388–400 (2007), https://doi.org/10.3102/0013189X07309471; Mathias Wullum Nielsen et al., One and a Half Million Medical Papers Reveal a Link Between Author Gender and Attention to Gender and Sex Analysis, 1 Nature Hum. Behav. 791–96 (2017), https://doi.org/10.1038/s41562-017-0235-x; Cassidy R. Sugimoto et al., Factors Affecting Sex-Related Reporting in Medical Research: A Cross-Disciplinary Bibliometric Analysis, 393 Lancet 550–59 (2019), https://doi.org/10.1016/S0140-6736(18)32995-7; Caitlin Wilson, Gillian Janes, & Julia Williams, Identity, Positionality and Reflexivity: Relevance and Application to Research Paramedics, 7 Brit. Paramedic J. 43–49 (2022), https://doi.org/10.29045/14784726.2022.09.7.2.43.

49. Alice L. Buck, History of the Energy Research and Development Administration, No. DOE/ES-0001 USDOE Assistant Secretary for Management and Administration (1982).

example, while the earliest examples of scientific peer review date back hundreds of years, scholarship has traced the rise of peer review as a central pillar of scientific publication and funding—as well as its perception as a critical hallmark of “good” science—to the Cold War. That conflict instigated an expansion of public monetary support for scientific research and, as society’s anxieties about American scientific competitiveness began to ease, led to a tension between public accountability and scientists’ personal desires for autonomy, respect, and financial support.50 Individual scientists and the overarching institutions of science are shaped by society and, in turn, produce scientific knowledge that goes on to affect society through its applications and implications, creating the feedback loop between “benefits and outcomes” and other parts of the process of science illustrated in Figure 1. Both scientific institutions and scientists are deeply embedded within and interconnected with broader society.

Social interactions within the scientific community are critical to building scientific knowledge. We saw this in the CFC/ozone example as one researcher’s conference presentation sparked Molina and Rowland’s first investigation of CFCs, which, in turn, led to a cascade of studies by different research groups. Scientists often work collaboratively, and today more and more science is done by large research groups and multi-institution consortia, reflecting the increasing complexity of scientific questions being investigated and the specialized knowledge needed to carry out those investigations. The scientific community performs other key services as well. One of the most important of these is scrutinizing the evidence and ideas of other scientists.

Scientific scrutiny often involves the following activities, which help ensure the reliability of the scientific knowledge produced:

  • an evaluation of methods, considering potential biases and oversights,
  • reconsideration of the hypothesis and how to interpret the evidence relevant to it, and
  • appraisal of alternative hypotheses that may also explain the results.

Scientists engage in these activities at many points, including when serving as peer reviewers for a publication, when reviewing research for their own information, and through questions and discussions at conferences. In their published articles, scientists generally preserve an account of how they have applied this scrutiny to others’ work and to their own research described in the article. For example, a key paper that helped show that proteins are the infectious agents in prion diseases documents the careful and often laborious reasoning required

50. Melinda Baldwin, Scientific Autonomy, Public Accountability, and the Rise of “Peer Review” in the Cold War United States, 109 Isis 538–58 (2018) [hereinafter Baldwin 2018].

to fully scrutinize a hypothesis and the relevant evidence.51 In that paper, the author critiques other researchers’ methods of purifying the infectious agent and explains why they might not be useful; considers fourteen different previously proposed alternative hypotheses regarding the nature of the infectious agent in prion diseases; outlines the many different lines of evidence relating to these hypotheses; considers modifications of these hypotheses that might still account for the available data; describes where the author’s findings differ from those of other investigators; explains how one set of data the author collected is consistent with multiple hypotheses; acknowledges when clearly relevant factors could not be taken into account in the author’s calculations; and describes where more research needs to be done because of known problems with certain assays. The application of such a high degree of scrutiny reflects the goal of producing accurate explanations for the natural world despite the fundamentally fallible nature of scientific knowledge. When research relevant to a trial has not yet been scrutinized by a community with the appropriate technical expertise, a judge may be placed in the position of providing or requesting this scrutiny.

Ultimately, through such scrutiny, scientists decide what ideas, results, and methods are worth building on and retesting in the same or some other context. Because scientists are unique, societally situated, and fallible humans, the scientific community can better perform the functions listed above, as well as many others, when scientists represent the diversity of the societies in which science is embedded.52 A diverse scientific community balances biases, investigates areas of inquiry relevant to all parts of society, and explores more innovative ideas for addressing the challenges it sets for itself.53 Science is making progress toward equitable inclusion, albeit slowly.54

51. Stanley B. Prusiner, Novel Proteinaceous Infectious Particles Cause Scrapie, 216 Science 136–44 (1982), https://doi.org/10.1126/science.6801762.

52. Here, the term diversity includes racial and gender identity, of course, but also many other facets of identity and background, including culture, religion, age, sexual orientation, disability, incarceration history, class, and more.

53. For evidence that diversity fosters the investigation of diverse questions of relevance to society, see Travis A. Hoppe et al., Topic Choice Contributes to the Lower Rate of NIH Awards to African-American/Black Scientists, 5 Sci. Advances 1–12 (2019), https://doi.org/10.1126/sciadv.aaw7238, and Kozlowski et al., 2022, supra note 48. For evidence that diversity fosters innovation, see Bas Hofstra et al., The Diversity-Innovation Paradox in Science, 117 PNAS 9284–91 (2020), https://doi.org/10.1073/pnas.1915378117. For more on the mechanisms behind this connection, see Patrick Grim et al., Diversity, Ability, and Expertise in Epistemic Communities, 86 Phil. Sci. 98–123 (2019), https://doi.org/10.1086/701070.

54. For more on changes in the diversity of the scientific workforce in the U.S., see Pew Rsch. Ctr., STEM Jobs See Uneven Progress in Increasing Gender, Racial, and Ethnic Diversity (2021). For a global picture, see Fred Guterl, Where Are the Data?, 311 Sci. Am. 40–41 (2014).

Peer Review

Peer-reviewed journal articles are the most important way that scientists convey their ideas and evidence to the scientific community, offering them up for scrutiny. Indeed, in Kitzmiller v. Dover Area School District, the court found that Intelligent Design could not be considered an accepted scientific theory in part because its proponents had not published research in peer-reviewed journals. That is not to suggest that all testimony at trial must be based on peer-reviewed research, but that information required to be addressed in the science classroom should meet this minimum standard. Daubert, of course, includes peer review as one of five nonexclusive factors judges may consider in assessing the scientific validity of expert testimony.

In the process of peer review, editors at a journal, typically themselves scientists, receive a submission and first decide if it is appropriate for publication in that journal: whether it is sound enough, on topic, and of sufficient importance. If they determine that it is, they send it to other scientists with expertise on the topic of the article—the peer reviewers—who are not known to have a conflict of interest: for example, scientists who are neither direct collaborators of the paper’s authors nor identified by the authors as potentially vested in the publication decision. The peer reviewers may recommend rejection or acceptance of the paper outright or may request changes, starting a back-and-forth among the authors, reviewers, and editors until the editors make a final decision on acceptance or rejection. Peer review and publication are time-consuming, usually involving many months between submission and publication. The process can also be highly competitive. The journal Science accepted just 6.4% of the articles it received in 2021.

When the institution of peer review was adopted by the scientific community, it was not intended to distinguish credible science from that which is inaccurate or badly executed, though the process is now often viewed as serving that function.55 Nevertheless, peer review is an important gatekeeper to the official record of science, ensuring that published works at least purport to meet minimum standards of quality and objectivity. In this way, it is reasonable to consider whether scientific knowledge that might impact a trial has been published in a peer-reviewed journal. However, peer-reviewed publication is a far from surefire indicator of reliable science. Here are some of the reasons.

  • Peer-reviewed journals vary in their standards and criteria for publication. Some are much more lax about methodological rigor than others, and often these differences are readily apparent only to members of the expert scientific communities themselves. Judges may be tempted to rely on a journal’s impact factor, a statistic that indicates how often articles published in that journal are cited and hence loosely conveys its prestige within a field, as an indication of methodological rigor. However, prestige is not a simple corollary of quality or correctness. A journal may have high standards for publication but a lower impact factor because of its topic or the type of articles it publishes (e.g., review articles are likely to garner more citations). Furthermore, journals can use various strategies to artificially boost their impact factor.56 To understand the quality of a particular specialized journal, a judge may need to rely on experts within the field.

55. Baldwin 2018, supra note 50.
  • Most but not all of the material that is published in peer-reviewed journals is original research. In addition to research articles, such journals publish commentary, including editorials, that may or may not be peer reviewed and that may express opinions and perspectives not immediately derived from evidence, and identifying these commentaries as such can sometimes present a challenge to nonspecialists.
  • Recognizing the weight that peer-reviewed publications are given, sometimes regardless of the quality of the evidence they present, proponents of pseudoscientific or rejected lines of inquiry have begun to establish peer-reviewed journals oriented toward promoting a particular viewpoint, not objectively weighing different lines of evidence.
  • Peer reviewers usually cannot detect scientific fraud. For example, widely cited research investigating the cause of Alzheimer’s disease published in the prestigious peer-reviewed journal Nature in 2006 allegedly includes fraudulent, manipulated images.57 When scientific fraud is discovered, it may result in retraction (see section titled “Correction and Retraction” below), but this process of discovery and resolution takes time. The validity of the 2006 Alzheimer’s disease paper’s results was not brought into question until 2021, when an unaffiliated scientist was commissioned to investigate the research by an attorney for investors.58 Then in 2022, Nature’s editors attached a note to the publication cautioning readers that concerns had been raised and that an investigation was ongoing. The paper was retracted on June 24, 2024.
  • Many investigations that appear in peer-reviewed journals may reach an incorrect conclusion, which is revealed only when further testing is applied to the hypothesis. This is a normal part of the process of science.

56. Douglas N. Arnold & Kristine K. Fowler, Nefarious Numbers, 58 Notices Am. Mathematical Soc’y 434–37 (2011), https://doi.org/10.48550/arXiv.1010.0278.

57. Sylvain Lesné et al., A Specific Amyloid-β Protein Assembly in the Brain Impairs Memory, 440 Nature 352–57 (2006), https://doi.org/10.1038/nature04533 [hereinafter Lesné et al. 2006].

58. Charles Piller, Blots on a Field?, 377 Science 358–63 (2022), https://doi.org/10.1126/science.add9993.

  • Despite efforts to eliminate it, bias may influence the peer review of individual papers, as well as editors’ decisions about publishing those papers. Some have argued that the process may be conservative and favor established hypotheses and less innovative approaches.59 Others have suggested that the process currently favors certain types of positive results, even if they turn out to be incorrect.60 And studies have reported different findings in terms of whether historically oppressed groups experience bias in the peer-review process.61

Rowland and Molina’s CFC hypothesis was first published in Nature, which aims to publish original work that is of widespread interest and importance across science. Rowland and Molina’s work met both of these requirements and ultimately led to a widely accepted scientific hypothesis. One might think that if a journal like Nature publishes a paper, its hypotheses are more likely correct, at least in comparison with hypotheses published in less-competitive journals. However, because such journals require that the work they publish be of widespread interest and importance, in fact, the papers that appear in them may be more likely to concern “risky” hypotheses that advance bold explanations challenging established ideas or that leverage new techniques still undergoing development. In addition, “everyday” science, which applies widely accepted methods in routine ways, is unlikely to warrant publication in preeminent journals. Journal prestige is thus not a clear indicator of scientific reliability. We saw that Rowland and Molina’s hypothesis was basically correct when it was published in Nature, but it still underwent many refinements and modifications afterward, particularly regarding the mechanisms involved in CFCs’ depletion of the ozone layer.

Some new research may be on track to undergo peer review but be so recent that this has not yet taken place. Such research is often shared in the form of preprints, fully drafted articles that include all the features of scientific journal articles but have not yet been accepted to a peer-reviewed journal for publication.62 Preprints are often shared via online repositories (often with names of the format arXiv, bioRxiv, medRxiv, etc.). Such repositories may accept or

59. Kyle Siler, Kirby Lee, & Lisa Bero, Measuring the Effectiveness of Scientific Gatekeeping, 112 PNAS 360–65 (2014), https://doi.org/10.1073/pnas.1418218112.

60. For example, see Paul E. Smaldino & Richard McElreath, The Natural Selection of Bad Science, 3 Royal Soc’y Open Sci. 160384 (2016), https://dx.doi.org/10.1098/rsos.160384.

61. For a few examples, see Donna K. Ginther et al., Race, Ethnicity, and NIH Research Awards, 333 Science 1015–19 (2011), https://doi.org/10.1126/science.1196783; Markus Helmer et al., Gender Bias in Scholarly Peer Review, 6 eLife e21718 (2017), https://doi.org/10.7554/eLife.21718; Patrick S. Forscher et al., Little Race or Gender Bias in an Experiment of Initial Review of NIH R01 Grant Proposals, 3 Nature Hum. Behav. 257–64 (2019), https://doi.org/10.1038/s41562-018-0517-y; and Flaminio Squazzoni et al., Peer Review and Gender Bias: A Study on 145 Scholarly Journals, 7 Sci. Advances 1–11 (2021), https://doi.org/10.1126/sciadv.abd0299.

62. See Oya Y. Rieger, Preprints in the Spotlight: Establishing Best Practices, Building Trust (2020), https://doi.org/10.18665/sr.313288.

reject submissions, but the process behind these determinations may not be disclosed and is not standardized in the way that peer review typically is. Most investigators who submit to preprint repositories intend for the work eventually to undergo peer review and publication, but not all succeed. The fact that an article is called a preprint does not mean that publication is imminent. Preprints are beginning to make their way into the courts. For example, in a recent case, an incarcerated person moved to be granted compassionate release based on the risks he faced if he were to contract Covid-19.63 The court denied the motion in part because the plaintiff had been vaccinated, mitigating his risk, and cited several lines of evidence supporting this view, including a preprint examining the efficacy of the Moderna vaccine against different strains of SARS-CoV-2.64

Furthermore, some technical research relevant to trials, particularly that undertaken for the purpose of litigation and that conducted within industry to support functions of narrow interest, will not have had the opportunity to undergo peer review because of its provenance and intent. As addressed in The Admissibility of Expert Testimony in this manual, the fact that research has not undergone peer review does not make testimony based on it inadmissible; however, it may place a higher burden on the experts to explain their methods—and on the judge to assess the validity of this explanation.

Bias

As described above, scientists hold each other to a set of norms that help science build accurate explanations. However, scientists are human beings and, while they strive toward objectivity, they also bring their unique backgrounds and experiences to bear on their work. This invigorates problem solving, sparks new ideas, and benefits science in many ways. But it also means that the unconscious and conscious biases we all hold can influence the course of science and that scientists may interpret the same data in different ways.65 The norms of scientific behavior, for example the expectations of scrutiny and open communication, as well as processes involving the scientific community like the institution of peer review, serve the function of helping identify and correct biases. Scientific objectivity thus lies in a diverse community of inquirers, not individual scientists.

63. United States v. Singh, 525 F. Supp. 3d 543 (M.D. Pa. 2021).

64. Kai Wu et al., mRNA-1273 Vaccine Induces Neutralizing Antibodies Against Spike Mutants from Global SARS-CoV-2 Variants, bioRxiv https://perma.cc/SU3B-9ZCZ.

65. Heather E. Douglas, Science, Policy, and the Value-Free Ideal (2009), https://doi.org/10.2307/j.ctt6wrc78; Hugh Lacey, Is Science Value Free? (2004), https://doi.org/10.4324/9780203983195; and Miriam Solomon, Social Empiricism (2007).

In examining the methodologies and conclusions of expert testimony, the source of funding for the research is a valid consideration, though not in and of itself a reason to exclude testimony. Much evidence suggests that funding source is correlated with study outcome.66 For example, in pharmaceuticals, there is a strong tendency for industry-sponsored trials to favor the industry’s product.67 Several possible, non–mutually exclusive explanations may underlie this general effect, and they are not all nefarious—e.g., perhaps drug companies mainly fund clinical trials when internal research already strongly supports a compound’s efficacy. However, biased study design or biased interpretation directly introduced by the funder or by a potentially unconscious sense of quid pro quo on the part of the researcher is also a possibility.

Many potential sources of bias can be subtle and/or hard to detect. Within this manual, the reference guides on forensic science, statistics and research methods, survey research, and epidemiology outline many ways that methods and analytic techniques can subtly bias research outcomes and, hence, testimony. And of course, research sponsored by a defendant or plaintiff that fails to support the sponsor’s interest or that turns out to support the interests of the other party may simply not be introduced at trial.68 In other cases, the influence of sponsorship is obvious. For example, after Molina and Rowland’s hypothesis began to gain traction, a CFC-industry-backed group sponsored a speaking tour by a meteorologist who considered the hypothesis “utter nonsense”—an approach that seemed to be recognized by the media for what it was, a public relations stunt.69

That is not to suggest that government- or nongovernmental organization (NGO)-sponsored research is necessarily free from bias. Individuals, academic

66. For reviews, see Sheldon Krimsky, Do Financial Conflicts of Interest Bias Research? An Inquiry into the “Funding Effect” Hypothesis, 38 Sci., Tech., & Hum. Values 566–87 (2012), https://doi.org/10.1177/0162243912456271; and David B. Resnik & Kevin C. Elliott, Taking Financial Relationships into Account When Assessing Research, 20 Accountability Rsch. Policies & Quality Assurance 184–205 (2013), https://doi.org/10.1080/08989621.2013.788383.

67. Sergio Sismondo, Pharmaceutical Company Funding and Its Consequences: A Qualitative Systematic Review, 29 Contemp. Clinical Trials 109–13 (2008), https://doi.org/10.1016/j.cct.2007.08.001.

68. On the other hand, failing to conduct or report on a study that a litigant could have conducted may be viewed with suspicion, as noted in Eagle Snacks, Inc. v. Nabisco Brands, Inc., where the court observed that “failure of a trademark owner to run a survey to support its claims of brand significance and/or likelihood of confusion, where it has the financial means of doing so, may give rise to the inference that the contents of the survey would be unfavorable. . . .” 625 F. Supp. 571 (D.N.J. 1985).

69. The Council on Atmospheric Sciences sponsored the tour by British scientist Richard Scorer. Lydia Dotto & Harold Schiff, The Ozone War (1978) [hereinafter Dotto & Schiff 1978]; Edward A. Parson, Protecting the Ozone Layer: Science and Strategy (2003) [hereinafter Parson 2003]; and Walter Sullivan, Scientist Doubts Spray Cans Imperil Ozone Layer, N.Y. Times (July 8, 1975).

institutions, and other organizations may also have conflicts of interest (COIs) that are not immediately apparent—for example, through stock ownership. Government-sponsored researchers may feel motivated to secure their next grant with a result that will impress or that aligns with an agency’s interests or policy positions. In an example that goes beyond mere bias, the potentially fraudulent Alzheimer’s research described above was funded by a government agency, the National Institutes of Health, and was carried out by researchers mainly associated with universities.70 Moreover, at the time of publication of that paper, two of the authors were listed as inventors on a patent application, submitted by the public university that employed them, stemming from the research. Clearly, receiving sponsorship from a government agency and being situated in an academic setting is no guarantee that research is untainted. All research is potentially influenced by bias, and every funder of research has the potential to introduce a source of bias; the evidence cited above indicates that the link between funding source and study outcome is especially strong for industry-sponsored research. For this reason, disclosure of funding sources and COIs is generally a requirement of peer-reviewed publication. Molina and Rowland, for example, noted in their original paper that it was funded by the U.S. Atomic Energy Commission.71 Some scientific institutions have COI policies designed to discourage research that they view as more likely to be biased, and many have COI-disclosure policies designed to encourage scientists to take sponsorship into account when evaluating each other’s work. An examination of an investigation’s funding source may help judges be alert to potential biases and assess their likely direction.

Misconduct

Beyond bias, scientific misconduct can occur when a scientist does not fairly evaluate other scientists’ work, does not honestly report methods and results, does not fairly assign credit, or does not work within the ethical guidelines of the community.72 These deceitful practices are penalized and discouraged in various ways, including with career curtailment, funding bans, fines, and potentially prison time. Scientific misconduct, as well as honest mistakes, can lead to the acceptance of inaccurate hypotheses. Such errors can be corrected as ideas are reexamined in different contexts, including in the course of a trial. For example, in Schueneman v. Arena Pharms., Inc.,73 the court found that a pharmaceutical company had engaged in misrepresentation when it claimed that animal studies showed its

70. Lesné et al., 2006, supra note 57.

71. Molina & Rowland 1974, supra note 7.

72. For more, see Charles Gross, Scientific Misconduct, 67 Ann. Rev. of Psych. 693–711 (2016), https://doi.org/10.1146/annurev-psych-122414-033437.

73. 840 F.3d 698 (9th Cir. 2016).

drug to be noncarcinogenic, when in fact the company was conducting a study that linked the drug to cancer in lab rats, a case in which scientific results were not honestly reported. Another well-known example of scientific misconduct on trial involves Takata Corporation submitting fraudulent data on the performance of its airbag inflators to automakers. Takata pled guilty to wire fraud in 2017, nearly 20 years after its internal research had shown that its inflators did not perform to required specifications.74

Scientific misconduct can be difficult to detect; there are no simple red flags that a judge could identify that would indicate misconduct.75 Instead, misconduct may be reported by someone with firsthand knowledge of the environment in the research group, detected by applying specialized tools for image analysis to publications, or noticed by subject matter experts who find a discrepancy or anomaly in reported data or cannot replicate a result. An allegation of misconduct will generally lead to an extensive investigation and may take many years to resolve and correct. A judge or jury might justifiably be concerned that a scientist who has engaged in documented misconduct may be an unreliable source of expert testimony.

Correction and Retraction

After peer review and publication, a scientific paper may be corrected if a small error that does not invalidate the main findings of the paper is caught—for example, an incorrect row in a data table, missing text, or a missing citation. In such cases, a publisher will generally update the version of record available online and associate a correction notice with the document so that readers have a record of what changes have been made and why. Guidelines and policies for error corrections may vary across journals and are generally intended to maintain the validity of the scientific record, though it has been argued that journals and editors may not have made sufficient efforts in this regard.76 The fact that a paper has had a correction made should not be taken to indicate that the study is of poor quality. Instead, it suggests that the article has received additional scrutiny after publication and that the authors and publishers have made public efforts to address concerns

74. In re Takata Airbag Prods. Liab. Litig., 193 F. Supp. 3d 1324 (S.D. Fla. 2016).

75. The Office of Research Integrity of the U.S. Department of Health and Human Services has identified five red flags of research misconduct, but all are observations that would only be made by other scientists working in the same field or by other workers with a professional relationship to the scientist suspected of misconduct. Office of Research Integrity, Possible Red Flags of Research Misconduct (2016), https://perma.cc/JA9J-XKTE.

76. Lonni Besançon et al., Correction of Scientific Literature: Too Little, Too Late!, 20 PLoS Biology e3001572 (2022), https://doi.org/10.1371/journal.pbio.3001572; Ambar Castillo, Mistakes Happen in Research Papers. But Corrections Often Don’t, STAT (Jan. 10, 2023), https://perma.cc/TQX7-3YXW.

raised by that scrutiny, which can be viewed as a strength. Corrections to scientific papers are rare, and their rate has remained relatively constant since the 1970s.77

On the other hand, if evidence arises that the entire basis of a published study is invalid, the paper may be retracted. Articles may be retracted because of a significant flaw that changes the conclusions of the paper (e.g., faulty equipment used to collect data or an error in code used to analyze the data) or because of misconduct of many varieties—plagiarism, fraudulent data, ethics violations, or manipulation of the peer-review process.78 Theoretically, retraction means removal from the scientific record. In practice, however, the article is likely to remain available in libraries and online, tagged with a notice of retraction meant to signal to readers that the journal no longer backs the content of that article. One reason these papers remain available is the cumulative nature of science: Researchers may need to review retracted papers to see if the errors within them have propagated into other work. Because such articles remain accessible, retracted research may continue to garner citations improperly.79 While the number and frequency of retractions from scientific journals have been on the rise, multiple analyses suggest that this is largely due to improved oversight and new tools to detect plagiarism and image manipulation.80 Because studies are retracted when they are found not to emerge from sound scientific methodologies, retracted science should be disallowed as a basis for expert witness testimony.

Within science, retractions may be viewed as a blot on one’s record of scholarship, and the scientist behind a retracted study may experience stigma from the scientific community.81 Should the same be true in the courtroom? Retraction is not the same as scientific misconduct, in which there is an intent to deceive. While scientific misconduct is the most frequent cause of retraction, at least in the life sciences, it is by no means the only cause.82 The circumstances that can lead to

77. Daniele Fanelli, Why Growing Retractions Are (Mostly) a Good Sign, 10 PLoS Med. e1001563 (2013), https://doi.org/10.1371/journal.pmed.1001563 [hereinafter Fanelli 2013].

78. William Bülow et al., Why Unethical Papers Should Be Retracted, 47 J. Med. Ethics (2021), https://doi.org/10.1136/medethics-2020-106140.

79. Tzu-Kun Hsiao & Jodi Schneider, Continued Use of Retracted Papers: Temporal Trends in Citations and (Lack of) Awareness of Retractions Shown in Citation Contexts in Biomedicine, 2 Quantitative Sci. Stud. 1144–69 (2022), https://doi.org/10.1162/qss_a_00155.

80. Jeffrey Brainard & Jia You, What a Massive Database of Retracted Papers Reveals About Science Publishing’s ‘Death Penalty’, Science (Oct. 25, 2018), https://doi.org/10.1126/science.aav8384; Fanelli 2013, supra note 77; and Ferric C. Fang, R. Grant Steen & Arturo Casadevall, Misconduct Accounts for the Majority of Retracted Scientific Publications, PNAS Early Edition (2012) [hereinafter Fang et al. 2012].

81. This stigma may be detrimental to scientific knowledge building because it discourages the reporting of honest errors; for example, see Jaime A. Teixeira da Silva & Aceil Al-Khatib, Ending the Retraction Stigma: Encouraging the Reporting of Errors in the Biomedical Record, 17 Rsch. Ethics 251–59 (2021), https://doi.org/10.1177/1747016118802970.

82. Fang et al. 2012, supra note 80.

retraction include honest mistakes (which may initially be known only to the researcher who made them) that are reported out of respect for the knowledge-building function of science. In such cases, acknowledgment of the error might reasonably be perceived as a sign of integrity and commitment to developing accurate scientific explanations. For example, in 2006, pharmaceutical scientist Geoffrey Chang and colleagues retracted five papers that relied upon a flawed computer program, a mistake they owned up to and corrected.83 Chang then continued his research, making valuable contributions to our understanding of microbial resistance, malaria, and crop development, with no indication that his new findings should be discounted because of a previous error. However, a pattern of retracted research stemming from a single researcher or group and tied to allegations of misconduct or bad science can suggest that those practitioners are not adhering to the norms of the scientific community, making their unretracted research suspect as well.

Retraction does not occur if the hypothesis a study supports is simply found to be incorrect by subsequent research. The methods and data of that study may still hold valuable insights for researchers, who can go back to it to try to understand why a different conclusion was reached in that case. This is a normal part of the self-correcting process of science.

Scientific Expertise

Judges may need to evaluate the qualifications and expertise of individual scientists to help determine whether to admit scientific testimony. On what basis can this judgment be made? Scientific experts are respected and active participants in their scientific communities, usually contributing to knowledge development as researchers and certainly scrutinizing the work of others as consumers of the research literature. Just as there is no single trait that separates science from nonscience, there is no single necessary and sufficient condition that would qualify a scientist as an expert on a particular topic in the eyes of other scientists. Instead, judges may evaluate a cluster of traits that a scientist is likely to attain in the process of contributing to a field and participating in the scientific community. These include:

  • holding an advanced degree, usually a Ph.D., in a field closely related to the area of purported expertise,
  • holding a position in which reviewing the latest research on the topic is required (e.g., having a specialized medical practice, an editorial position at a scientific journal, or a research career),
  • holding leadership positions in relevant scientific societies or institutions, or having other markers of professional esteem by the community,
  • having published peer-reviewed research on the topic, and
  • developing a track record, often measured in citation metrics, of publications that are well respected by colleagues and contribute to knowledge development.

83. Greg Miller, A Scientist’s Nightmare: Software Problem Leads to Five Retractions, 314 Science 1856–57 (2006), https://doi.org/10.1126/science.314.5807.1856.

Citation metrics, or citation impacts, are statistics that indicate how often a particular academic work, author, or journal has been cited. These metrics range from simple tallies to more complex statistics, such as the h-index, which incorporates information about the distribution of citations across an author’s publication record.
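To make the h-index concrete, here is a minimal illustrative sketch (the citation counts are invented for the example): an author has index h if h of their papers have at least h citations each.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited paper first
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank          # this paper still clears the bar
        else:
            break             # remaining papers are cited even less
    return h

# Hypothetical record: per-paper citation counts for one author.
print(h_index([52, 18, 10, 7, 5, 4, 1, 0]))  # prints 5
```

Note that two authors with the same total citation count can have very different h-indices, which is why the metric is described above as reflecting the distribution of citations across a publication record rather than a simple tally.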

Lacking a few of the traits in the list above does not disqualify an individual as a scientific expert, but lacking all or most of them should lead one to question whether the proposed expert is indeed an active, expert member of the relevant scientific community. Sherwood Rowland and Mario Molina had accumulated all of these markers of expertise by the end of their careers, of course, but Molina was an early career scientist at the time the CFC/ozone hypothesis was published, with only a few publications and a Ph.D. Had he been proposed as an expert witness on the ozone layer at that time, a judge might reasonably have questioned his expertise.

It is also worth evaluating the closeness of a scientist’s disciplinary expertise to the scientific topic on which expert testimony will be delivered. Doctorates and honorifics abound, and promoters of pseudoscientific ideas or hypotheses with little evidence supporting them may leverage achievements in other areas to create the impression of relevant expertise. For example, Richard Scorer, a well-known scientist, was promoted as an expert on the CFC/ozone hypothesis by an industry-backed group aiming to discredit the hypothesis.84 Scorer had a Ph.D., had received scientific honors, and had conducted research on air pollution; however, he had no expertise in atmospheric chemistry or the ozone layer and had published no peer-reviewed articles on this topic. Scorer was, in fact, an expert in some related disciplines, but lacked the specialized knowledge needed to fairly evaluate the CFC/ozone hypothesis in light of the available evidence and research. The problem of scientists with legitimate expertise in one field weighing in on a scientific question outside their area of expertise is a pernicious one that has affected public acceptance of science and policy on issues such as climate change and tobacco exposure.85 This scenario may play out in courts as judges identify and disallow testimony from scientists whose expertise is legitimate but not relevant to the specific scientific issue at hand. Many of the later reference

84. Dotto & Schiff 1978, supra note 69; Parson 2003, supra note 69.

85. Oreskes & Conway 2010, supra note 2.

guides in this manual provide guidance on qualifications for experts in particular disciplines.

Testing Hypotheses

At the most basic level, scientists test their ideas by figuring out what predictions a hypothesis generates and comparing those predictions to evidence. Hypotheses are supported when the evidence matches the predictions and contradicted when it does not.86 Observations are often made with the help of sophisticated tools and may be analyzed with statistical techniques to discern patterns. Other observations are entirely qualitative and nonnumeric, taking the form of case studies or interviews, for example. Here, we review some overarching approaches to testing hypotheses that cut across all disciplines and address common concerns relating to the interpretation of data from such tests.

Scientific Methodologies and Data

The methodologies used in different scientific disciplines are so varied that it might seem a challenge to identify any fundamental similarities. How can it be that investigations as diverse as measuring ozone levels over Antarctica, exposing mice to cigarette smoke tars, and surveying consumers about their perception of the word thermos all serve the same fundamental purpose of testing hypotheses? In fact, scientific methods used to test hypotheses can be grouped in a few common ways, and understanding these can help in evaluating the strengths of individual studies. These attributes include:

  • whether or not the research involves an intentional manipulation of conditions (i.e., Is the research experimental or nonexperimental in nature?),
  • what types of data the investigation produces (i.e., Did the research generate quantitative data, qualitative data, or both?), and
  • whether or not the hypotheses being examined are reflected in a mathematical model.

There are no right, wrong, better, or worse answers to the above questions. Instead, these attributes help determine the types of hypotheses and approaches best suited to gather evidence, as described below. Note that these characteristics vary relatively independently of each other and of scientific discipline: Experimental or nonexperimental methods can be used to collect quantitative or

86. Michael Strevens, The Knowledge Machine: How Irrationality Created Modern Science (2020).

qualitative data, with or without incorporating modeling, and these methodologies are applied broadly across the many topics that science investigates. Later reference guides in this manual explain how these methodological approaches are used in various disciplines.

Experimental and Nonexperimental Methods

An experiment involves intentionally manipulating some factor in a system to learn how that manipulation affects the outcome.87 A controlled experiment seeks to keep variables other than the experimental manipulation the same in order to isolate the cause of any change. A control group is a comparison condition or group that does not receive the experimental manipulation; comparison against it increases confidence that a change observed in the experimental group is caused by the manipulation and not some other factor. For example, experiments examining whether the drug Bendectin causes birth defects, the basis of the Daubert litigation, compared experimental groups of animals given Bendectin during pregnancy to control groups of animals not given the drug.88 Experiments can be performed in a lab or in a natural setting. For example, economic decision making may be investigated by bringing participants into a psychology lab and observing them playing a game, or by conducting a field experiment at a market or swap meet.89

A randomized controlled trial is an experimental approach that randomly assigns participants to the experimental or control group so that other variables are balanced across the groups. It is most commonly used in medical trials but can be applied more broadly as well—for example, to examine the effectiveness of a kindergarten curriculum at reducing bullying.90 Randomized controlled trials can be very convincing and are often perceived as the gold standard in medicine.91 However, they are just one line of evidence among many.
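The core mechanic of random assignment can be sketched in a few lines; the participant labels below are hypothetical placeholders, not part of any real study design.

```python
import random

def random_assignment(participants, seed=None):
    """Shuffle participants and split them into two equal-sized groups."""
    rng = random.Random(seed)        # fixed seed makes the split reproducible
    pool = list(participants)        # copy so the caller's list is untouched
    rng.shuffle(pool)
    half = len(pool) // 2
    return pool[:half], pool[half:]  # (experimental group, control group)

# Hypothetical trial with eight participants.
experimental, control = random_assignment([f"P{i}" for i in range(1, 9)], seed=1)
```

Because chance, not the researcher, decides who lands in which group, characteristics such as age or health status tend to balance out across the groups as the sample grows, which is what licenses the causal comparison between them.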

87. Nancy Cartwright, Nature’s Capacities and Their Measurement (1989).

88. For an example, see Rochelle W. Tyl et al., Developmental Toxicity Evaluation of Bendectin in CD Rats, 37 Teratology 539 (1988), https://doi.org/10.1002/tera.1420370603. For an overview, see Joseph Sanders, From Science to Evidence: The Testimony on Causation in the Bendectin Cases, 46 Stanford L. Rev. 1–86 (1993), https://doi.org/10.2307/1229235.

89. For example, see Werner Güth & Oliver Kirchkamp, Will You Accept Without Knowing What? The Yes-No Game in the Newspaper and in the Lab, 15 Experimental Econ. 656–66 (2012), https://doi.org/10.1007/s10683-012-9319-7; and Steven D. Levitt & John A. List, Field Experiments in Economics: The Past, the Present, and the Future, 53 Eur. Econ. Rev. 1–18 (2009), https://doi.org/10.1016/j.euroecorev.2008.12.001.

90. For example, see Adele Diamond et al., Randomized Control Trial of Tools of the Mind: Marked Benefits to Kindergarten Children and Their Teachers, 14 PLoS ONE e0222447 (2019), https://doi.org/10.1371/journal.pone.0222447.

91. For example, see In re Bextra & Celebrex Mktg. Sales Practices & Prod. Liab. Litig., 524 F. Supp. 2d 1166 (N.D. Cal. 2007).

Controlled experiments are often misperceived as the pinnacle of scientific evidence, with all other forms of testing ranked as substandard in some way. Experiments are a useful way to collect evidence, but many important questions cannot be addressed through experiment. Some aspects of the natural world are not manipulable (e.g., distant stars) and hence cannot be studied with direct experiments. In other cases, it would be unethical to perform a direct experiment on the study system—for example, randomly assigning some pregnant people to receive Bendectin to assess its safety or intentionally introducing a pollutant to a sensitive ecosystem to determine its effect.

Nonexperimental methods, which involve no intentional manipulation of conditions, are often critical in gathering evidence to evaluate important hypotheses. These methods still allow us to generate predictions from a hypothesis and make observations to test those predictions. For example, we cannot experiment on stars to test hypotheses about which elements occur within them, but we can test those ideas by figuring out what wavelengths of light the presence of different elements would generate and monitoring the stars’ emission spectra.92

Nonexperimental research describes or documents a phenomenon. We might imagine a biologist sketching a bacterium viewed through a microscope, but of course, complex machinery, custom-designed instruments, and detailed measurements may also be involved. Analyses to test drinking water for contaminants, detect radiation coming from deep space, or measure consumer perception of a particular brand (e.g., surveys to support trademark litigation) fall into this category. Often nonexperimental studies involve making some sort of comparison among observations to detect patterns and associations. For example, a key study supporting the CFC/ozone link documented ozone measurements over Antarctica and compared results obtained between 1957 and 1984, showing a pattern of dramatic decline.93 This convincing evidence was obtained not through an experiment but through observation and comparison over time. Similarly, many nonexperimental epidemiologic studies informed the Daubert litigation, some that compared birth outcomes for mothers who had taken Bendectin to those for mothers who had not (cohort studies) and some that compared Bendectin exposure between those with a birth defect and those without (case-control studies).

Some effects we might wish to study experimentally cannot be examined in that context because of ethical or logistical considerations. However, scientists can leverage situations in which such changes occur naturally, without an intentional manipulation on the part of the researcher, to gather relevant data in a nonexperimental setting. For example, studying the connection between years

92. For example, see Noriyuki Matsunaga et al., Identification of Absorption Lines of Heavy Metals in the Wavelength Range 0.97-1.32 μm, 246 Astrophysical J. Supp. Series 10 (2020), https://doi.org/10.3847/1538-4365/ab5c25.

93. Farman et al. 1985, supra note 8.

of parental schooling and children’s well-being presents nearly insurmountable experimental challenges. However, researchers have taken advantage of variation in the rollout of government educational programs and policies to find comparison groups that mimic the desired experimental setup, contrasting outcomes from children whose parents received extra years of schooling with those from closely matched communities that received the program later.94 More recently, in another comparative, nonexperimental study, researchers used data from school districts that lifted mask mandates at different times to examine the on-the-ground effect of masking on transmission of SARS-CoV-2.95

Many explanations can be tested through both experimental and nonexperimental methods. Each offers advantages. A laboratory experiment may allow a researcher to control many factors and make a strong case for causality; however, such experiments may not be able to closely mimic the complex environment outside the laboratory and so are vulnerable to the argument that the causal mechanism at work in the experimental setup is not important where it matters most—in the messy real-world conditions found in places like hospitals, schools, and natural ecosystems. On the other hand, a nonexperimental, comparative study of a field setting may be clearly grounded in those real-world conditions but leave open questions about confounding causal factors. For example, medical studies using nonhuman animal models may provide convincing evidence of the carcinogenic effects of a substance in the model organism but not necessarily in people, while epidemiologic evidence of cancer rates in people is clearly relevant to human cancers but may be consistent with multiple hypotheses as to the cause of the cancer (see the reference guides on toxicology and epidemiology, in this manual, for further considerations). The nonexperimental masking study referenced above addressed some complaints about lab-based mask research not reflecting real-world variation in the types of masks worn in schools and the consistency with which they are worn, but the study was nevertheless criticized by opponents for not being a randomized controlled trial.96

Whenever possible, triangulating across a body of evidence based on multiple methodologies is the best approach to evaluating the scientific support for a hypothesis. The scientific community does not fully exclude consideration of a particular type of evidence relevant to a hypothesis unless it is clearly compromised methodologically or there are other a priori reasons for thinking it

94. Shin-Yi Chou et al., Parental Education and Child Health: Evidence from a Natural Experiment in Taiwan, 2 Am. Econ. J.: Applied Econ. 63–91 (2010), https://doi.org/10.1257/app.2.1.33. For more on the complex statistical models that can be used to analyze data from studies such as this one, see David H. Kaye and Hal S. Stern, Reference Guide on Statistics and Research Methods, in this manual.

95. Tori L. Cowger et al., Lifting Universal Masking in Schools—Covid-19 Incidence among Students and Staff, 387 New Eng. J. Med. 1935–46 (2022), https://doi.org/10.1056/NEJMoa2211029.

96. Roni Caryn Rabin, Masks Cut Covid Spread in Schools, Study Finds, N.Y. Times (Nov. 10, 2022), https://perma.cc/JVJ5-LUTX.

irrelevant. This same issue, assessing the value of the many lines of evidence relevant to a particular question (e.g., if and how nonexperimental epidemiologic studies and experimental animal studies should inform deliberation in toxic tort litigation), has presented challenges for the courts and led to decisions about admissibility that have generated controversy.97 In science, the available evidence (some of which may come from other research programs not designed to test the hypothesis under consideration) is evaluated as a body, along with the strengths, weaknesses, and caveats relating to each type of data, an approach which, some scholars have argued, the judiciary has not always followed.98 When a particular experimental or nonexperimental methodology has been applied to a question multiple times, special analytic techniques like meta-analyses may be used to interpret this body of evidence (see section titled “Systematic Reviews and Meta-analyses” below).

Within scientific methodologies, blinding generally refers to concealment of comparison group membership so that this knowledge does not bias the study outcome. In a double-blind clinical trial, neither the researcher nor the participants know which group is receiving a placebo and which is receiving the experimental intervention until after data collection is complete. Triple blinding refers to additionally concealing this information from the data analysts. In non-comparative studies, blinding may refer to concealment of the purpose or sponsor of the research from participants and those implementing the protocol (see the Reference Guide on Survey Research, in this manual). Blinding can be helpful in many investigations, experimental or not, in which human perception might tip the outcome in a particular direction—for example, in handwriting analysis and in proficiency testing to determine a lab’s ability to perform a particular

97. For an overview, see Margaret A. Berger, What Has a Decade of Daubert Wrought?, 95 Supp. 1 Am. J. Pub. Health s59–s65 (2005) [hereinafter Berger 2005]. For a small sample of discussion relevant to this issue, see Gary Edmond & David Mercer, Litigation Life: Law-Science Knowledge Construction in (Bendectin) Mass Toxic Tort Litigation, 30 Soc. Stud. Sci. 265–316 (2000); Victoria Sutton & Brie DeBusk Sherwin, Toxicological Animal Studies Disparate Treatment as Scientific Evidence, 2 J. Animal & Env’t L. (2011), https://perma.cc/V2U3-SXKV; and Raymond Richard Neutra, Carl F. Cranor, & David Gee, The Use and Misuse of Bradford Hill in U.S. Tort Law, 58 Jurimetrics 127–62 (2018) [hereinafter Neutra et al. 2018]. For argument on the opposing side, see, for example, Dije Ndreu Comment, Keeping Bad Science Out of the Courtroom: Why Post-Daubert Courts Are Correct in Excluding Opinions Based on Animal Studies from Birth-Defects Cases, 36 Golden Gate U. L. Rev. 459 (2006); and Amanda Hungerford Note, Back to Basics: Courts’ Treatment of Agency Animal Studies after Daubert, 110 Colum. L. Rev. 70–113 (2010).

98. Some scholars have raised concerns that the courts have on occasion unfairly dismissed numerous individual lines of evidence as being flawed or insufficiently conclusive and concluded that evidence is lacking, when in fact the body of evidence, taken as a whole, points to a clear conclusion. For more, see discussion of Milward v. Acuity Specialty Products Group, Inc.; see also Liesa L. Richter & Daniel J. Capra, The Admissibility of Expert Testimony, in this manual; Berger 2005, supra note 97; and Steve C. Gold, A Fitting Vision of Science for the Courtroom, 3 Wake Forest J.L. & Pol’y 1 (2013).

technique (see the Reference Guide on Forensic Feature Comparison Evidence, in this manual).

Qualitative Data, Quantitative Data, and Mixed Methods

Quantitative data are numeric; qualitative data are nonnumeric, taking the form of, for example, written descriptions, audio recordings, and video footage. Both are essential forms of evidence within science, just as they are in a trial. Qualitative data are often misperceived as soft, less useful than quantitative data, and the domain of the social sciences. This view overlooks many applications of qualitative data within the natural sciences—for example, to visually identify biological species, minerals, and strata. Neither are quantitative data absent from the social sciences, as illustrated by numerous examples in the Reference Guide on Survey Research, in this manual. Furthermore, nuanced qualitative data from social science, psychology, and behavioral science are particularly valuable for providing context and meaning for quantitative results, as well as for suggesting hypotheses for further investigation. For some analyses, qualitative data may be validly aggregated and converted to quantitative data through a coding scheme. Mixed-methods research uses a combination of qualitative and quantitative techniques within a single investigation to detect patterns and form and test hypotheses.

When research is quantitative in nature, scientists often use common numeric standards to evaluate the outcomes of their tests:

  • Evaluating hypotheses: In Bayesian approaches, scientists attempt to calculate the probability that a hypothesis is true given the data available. Such approaches begin by choosing a prior probability distribution (“priors”) over mutually exclusive hypotheses. These priors are based on previous research, background theories, or even statistical indifference between possibilities. Then, as observations are made or experiments conducted, a new probability distribution, called the posterior distribution, is calculated using Bayes’s theorem. As observations accumulate, the posterior distribution increasingly concentrates on the hypotheses best supported by the evidence. Thus, Bayesian approaches allow scientists to calculate the probability that a hypothesis is true given the evidence they have observed. In contrast, in the commonly used p-value approach, scientists compare a test hypothesis (e.g., that drug X is effective) to a null hypothesis (e.g., that there is no difference in cure rates between those who took drug X and those who took a placebo). Scientists then calculate the probability of observing a difference between conditions (e.g., in the cure rates of patients taking drug X and those taking a placebo) at least as extreme as the one found, if the null hypothesis were true. For more details, see the Reference Guide on Statistics and Research Methods, in this manual. In these cases, scientists often use a probability of 0.05 as the threshold of significance, indicating that the null hypothesis would generate a result as or more extreme than the observed data only 5% of the time; a result below this threshold provides support for the test hypothesis. For example, in Zoloft product liability litigation, the court excluded expert testimony (arguing that Zoloft caused birth defects in the children of mothers taking the drug) because the testimony was based on trends—patterns in data that do not reach the level of statistical significance—leaving a strong possibility that any association between Zoloft and birth defects was due to chance alone (the null hypothesis).99 When considering testimony and evidence that relies on p-values, it is important to keep in mind a few caveats:
    1. A p-value lower than 0.05 does not prove that a null hypothesis is false. It is strong evidence, but there is a small chance that the difference observed could be the result of chance alone.
    2. Using a low p-value (e.g., 0.05) as a criterion for significance sets a high bar for rejecting the null hypothesis, minimizing the chance of getting a false positive—as a hypothetical example, minimizing the chance that we conclude that Zoloft use in pregnancy leads to more birth defects when, in fact, there is no relationship between Zoloft and birth defects. When the p-value is higher than the cutoff criterion and the null is not rejected (and note that the lower our cutoff criterion, the more likely this is to happen), it might be tempting to conclude that the null hypothesis must be true (e.g., that there is no relationship between Zoloft and birth defects). But this is erroneous. Not having enough evidence to establish a relationship is not the same as having strong evidence that there is no relationship. We should also be concerned about false negatives—that is, concluding that Zoloft use in pregnancy has no impact on birth defects in the hypothetical case that Zoloft use does in fact lead to more birth defects—which could have drastic implications for toxic tort litigation. The likelihood of false negatives can be assessed through an examination of statistical power (see paragraph (3) below).
    3. Depending on its design and the underlying system, a study may have limited ability to reject a null hypothesis (i.e., detect a difference) at the significance cutoff criterion. Power refers to a test’s ability to reject a null hypothesis that is indeed false. A study with high statistical power minimizes the chance of false negatives. Statistical power increases with the sample size of the study and with the effect size one wishes to detect (see the “Evaluating impact” bulleted paragraph below). Well-designed studies have sufficient power to detect the differences of interest, but it may not be apparent when a test lacks power (see the Reference Guide on Statistics and Research Methods, in this manual, for more details), and it is possible to leverage underpowered studies to argue for no causal effect when one does in fact exist.100
    4. When several separate studies find nonsignificant trends, meta-analyses (see section titled “Systematic Reviews and Meta-analyses” below) may be used to examine what the data taken as a whole say, as any individual test may have limited statistical power for the reasons described above.
    5. The 5% cutoff value for significance is a convention, not a number that emerges out of nature or statistical theory, and scientists have their own debates about how p-values should be used.101 There is nothing special about 0.05 other than the fact that it is an a priori criterion; clearly 0.04 or 0.06 are not very different. As described in the section titled “Assessment of Evidence” below, scientists may take into account the consequences of accepting or rejecting a particular hypothesis and adjust the design of their tests, or their threshold for action or acceptance, to guard against harm.

99. In re Zoloft (Sertraline Hydrochloride) Prods. Liab. Litig., 26 F. Supp. 3d 449 (E.D. Pa. 2014).
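The interplay among the 0.05 threshold, false positives, and statistical power can be made concrete with a short simulation. The sketch below is purely illustrative: the defect rates, sample sizes, and the use of a simple pooled two-proportion z-test are assumptions for demonstration, not figures from the Zoloft litigation.

```python
import math
import random

def two_prop_pvalue(x1, n1, x2, n2):
    """Two-sided p-value for H0: both groups share one underlying rate
    (pooled two-proportion z-test, normal approximation)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0
    z = abs(x1 / n1 - x2 / n2) / se
    return math.erfc(z / math.sqrt(2))  # P(|Z| >= z) under the null

def power(n_per_group, p_control, p_exposed, alpha=0.05, sims=1000, seed=42):
    """Estimated power: the fraction of simulated studies that reject H0
    at level alpha when the effect (p_exposed != p_control) is real."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(sims):
        x_exp = sum(rng.random() < p_exposed for _ in range(n_per_group))
        x_ctl = sum(rng.random() < p_control for _ in range(n_per_group))
        if two_prop_pvalue(x_exp, n_per_group, x_ctl, n_per_group) < alpha:
            rejections += 1
    return rejections / sims

# Hypothetical rates: 1% background outcome rate vs. 2% in the exposed group.
# A small study usually misses this real effect; a larger one finds it far more often.
small_study_power = power(200, p_control=0.01, p_exposed=0.02)
large_study_power = power(1000, p_control=0.01, p_exposed=0.02)
```

Failing to reject the null in the small study here would be weak evidence of "no relationship": the study simply lacks the power to detect an effect of this size, which is the concern about power raised above.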
  • Evaluating impact: While p-values and Bayesian approaches attempt to answer the question “Is there a relationship between variables?” effect sizes answer the question “How strong is this relationship?” P-values can generally be used to interpret the significance of a relationship in the same way across studies, but effect size can be communicated using many different statistics. Effect sizes are an important consideration at trial because they provide an indication of how much of an outcome can be attributed to a particular causal agent. For example, congenital cardiovascular defects in neonates can be caused by a variety of factors and occur relatively frequently even when mothers did not take an antidepressant during pregnancy. In the Zoloft case mentioned above, one of the studies cited by the memorandum opinion found that a different antidepressant, fluoxetine, had an adjusted odds ratio of 4.47, meaning that mothers who took fluoxetine during early pregnancy were estimated to be 4.47 times more likely to have a baby with a congenital cardiovascular anomaly than mothers who did not take the antidepressant during early pregnancy.102 Causal relationships with a weak effect are difficult to detect without very large sample sizes. In the Zoloft litigation, the plaintiffs’ expert’s testimony would have argued that multiple, nonsignificant associations between Zoloft use and birth defects indicated a causal relationship. The testimony was excluded because these results were consistent with a weak causal relationship (a small effect size), one that is “so weak that one cannot conclude that the risk is greater than that seen in the general population.”103

100. See Neutra et al. 2018, supra note 97, for discussion of the judiciary’s awareness of power when evaluating statistical significance.

101. For example, see Daniel J. Benjamin et al., Redefine Statistical Significance, 2 Nature Hum. Behav. 6–10 (2018), https://doi.org/10.1038/s41562-017-0189-z; John P.A. Ioannidis, The Importance of Predefined Rules and Prespecified Statistical Analyses: Do Not Abandon Significance, 321 JAMA 2067–68 (2019), https://doi.org/10.1001/jama.2019.4582; Ronald L. Wasserstein, Allen L. Schirm & Nicole A. Lazar, Moving to a World Beyond “p < 0.05”, 73 (sup. 1) Am. Statistician 1–19 (2019), https://doi.org/10.1080/00031305.2019.1583913.

102. Orna Diav-Citrin et al., Paroxetine and Fluoxetine in Pregnancy: A Prospective Multicentre, Controlled, Observational Study, 66 Brit. J. Clinical Pharmacology 695–705 (2008), https://doi.org/10.1111/j.1365-2125.2008.03261.x.
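An odds ratio like the 4.47 figure above is computed from a 2×2 table of exposure versus outcome. The counts below are invented for illustration (they are not data from the cited study), and the interval uses the standard large-sample approximation on the log odds ratio:

```python
import math

def odds_ratio_with_ci(a, b, c, d):
    """Odds ratio for a 2x2 table with its approximate 95% confidence interval.
    a = exposed cases,   b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases."""
    or_ = (a / b) / (c / d)
    log_se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - 1.96 * log_se)
    hi = math.exp(math.log(or_) + 1.96 * log_se)
    return or_, lo, hi

# Hypothetical counts: 20 of 100 exposed had the outcome vs. 5 of 100 unexposed.
or_, lo, hi = odds_ratio_with_ci(20, 80, 5, 95)  # odds ratio = 4.75
```

A wide interval that still excludes 1 signals a real but imprecisely estimated association; an interval spanning 1 is consistent with no association at all.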
  • Evaluating estimates: In science (and in contrast to their lay meanings), the terms uncertainty and error refer to the variability of a set of data that is intended to estimate a single number. Uncertainty and error are generally expressed as a range, within which we are confident that, if the study were repeated, the new result would fall. Scientists often use a 95% confidence interval for this purpose. For example, the researchers who first documented the ozone hole compiled daily measurements of total ozone over Halley Bay, Antarctica, taken over more than two decades. The measurements varied substantially from one day to the next. To communicate the overall pattern, the researchers estimated the maximum error bounds on their measurements and showed that the trend of decreasing ozone levels over time was much larger than any potential errors in the measurements. No matter where within the error range the true ozone values fell, the pattern they’d identified would still be exhibited.

Note that the 95% and 5% cutoffs are somewhat arbitrary, and a higher degree of confidence might be required if more certainty were desired—for example if an impactful policy decision depended on the conclusion. Similar, cross-disciplinary, standardized cutoff values for effect sizes do not exist because of the many different statistics used to communicate effect size and because interpreting these statistics is highly dependent on the system under consideration and for what purposes one needs to interpret associations. For more on these statistics, see the Reference Guide on Statistics and Research Methods, in this manual.
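The logic of the ozone example (a trend larger than the scatter in the measurements) can be sketched with synthetic data. Nothing below comes from the Halley Bay record; the numbers are made up to show how a 95% confidence interval is computed and compared:

```python
import math
import random
import statistics

def mean_ci95(values):
    """Sample mean with an approximate 95% confidence interval
    (normal approximation: mean +/- 1.96 standard errors)."""
    m = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return m, m - 1.96 * se, m + 1.96 * se

rng = random.Random(0)
# Synthetic "daily ozone" readings in arbitrary units: noisy day to day,
# but the later period's true level is substantially lower.
early = [rng.gauss(300, 15) for _ in range(200)]
late = [rng.gauss(250, 15) for _ in range(200)]

m1, lo1, hi1 = mean_ci95(early)
m2, lo2, hi2 = mean_ci95(late)
# The two intervals do not overlap, so the decline exceeds the measurement scatter.
```

No matter where within the intervals the true values fall, the downward pattern remains, which is the same argument the ozone researchers made with their error bounds.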

Importantly, a low p-value, large effect size, or narrow confidence interval found in a particular study should not be taken to indicate scientific consensus. These statistical measures help communicate the results of an isolated investigation, and scientific consensus is built over the course of many investigations and analyses (see section titled “Achieving Scientific Consensus” below).

Modeling

The term model is used in many different ways in science. However, as a research method of modern science, modeling almost always refers to developing and testing a mathematical model. Some models are relatively simple and thus analytically tractable; others are so complex that they are built as pieces of computer code. One can think of a model as a surrogate, a simplified representation of a real-world system.104 That system could be any part of the natural world, from how gases move in the atmosphere to how social media influencers arise.105

103. Memorandum Opinion at 9, In re Zoloft Prods. Liab. Litig., No. 2:12-MD-2342 (E.D. Pa. 2014), ECF No. 979.

Models generally accept multiple different input values, or parameters, and then generate a set of predictions, which are usually, though not always, quantitative in nature. For example, models of Earth’s atmosphere and climate106 accept input about the shape of terrain, vegetation cover, ice cover, atmospheric makeup, and many other factors. The models integrate mathematical rules that simulate how gases circulate over the surface of the Earth, how sunlight is reflected from various surfaces, how water moves from land surfaces into the atmosphere, and more. And they generate predictions about future states of the Earth system: temperature ranges, precipitation levels, frequency of extreme weather events, etc. Essentially, a model presents an argument: If these input values represent the starting point of the system, and if the system works according to the rules implemented in the model’s code or equations, then we’d expect the system to have this predicted state in the future.

Models can and must be tested against evidence, just like hypotheses. Their predictions can be compared to observations to see if the model seems to be a fair representation of the system. If a model’s predictions do not match key elements of our observations or evidence, we must reevaluate our input parameters and/or the sufficiency and accuracy of the model.107 The atmospheric models that were used to test Rowland and Molina’s hypothesis incorporated many different units of scientific knowledge, which were themselves supported by other lines of evidence. While the models were not perfect representations of the Earth system, they were close enough to generate many predictions that held true of the atmosphere in general, giving scientists confidence that the models were useful approximations of actual mechanisms. And when Rowland and Molina’s hypothesis was incorporated into those models, they generated two key predictions that were borne out in observations: that CFCs should be abundant in the lower atmosphere but rare in the upper atmosphere, a prediction that was immediately verified with evidence from sensors placed on aircraft and balloons, and that the ozone layer would become thinner over time, a prediction that was verified many years later.108

104. Michael Weisberg, Simulation and Similarity: Using Models to Understand the World (2013) [hereinafter Weisberg 2013].

105. For example, see Penelope Maher et al., Model Hierarchies for Understanding Atmospheric Circulation, 57 Rev. Geophysics 250–80 (2019), https://doi.org/10.1029/2018RG000607; and Nicolò Pagan et al., A Meritocratic Network Formation Model for the Rise of Social Media Influencers, 12 Nature Commc’ns 6865 (2021), https://doi.org/10.1038/s41467-021-27089-8.

106. For example, see June-Yi Lee et al., Future Global Climate: Scenario-Based Projections and Near-Term Information, in Climate Change 2021: The Physical Science Basis 553–672 (2021), https://doi.org/10.1017/9781009157896.006.

107. Weisberg 2013, supra note 104.

If a model is supported and seems to be a good representation of the target system, we can use it to answer “what if” questions: What would have happened in 50 years if we had continued CFC production at 1974 rates? How would changing a social network’s recommendation system affect influencers’ ability to spread information? If a hazardous chemical is used in a continuous action air freshener, what dose would an individual confined mostly to the home inhale daily? If a business had continued to operate instead of being barred from doing so, what profits and value would it have accrued during that period of closure? Predictions from widely accepted models may be considered evidentiary and used to build other arguments—for example, about the cause of a disease or injury. The Reference Guide on Exposure Science, in this manual, explains the types of models that are used to estimate the exposures to and environmental fates of chemicals, and the Reference Guide on the Estimation of Economic Damages, in this manual, explains the types of financial models used to estimate damages.
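The "if these inputs, if these rules, then this prediction" structure of a model can be illustrated with a deliberately tiny example. The one-box accumulation model below is a hypothetical sketch with made-up units and parameters, not any model actually used in the ozone literature: a substance is emitted at a constant rate and removed in proportion to how much is present.

```python
import math

def simulate(emission_rate, lifetime_years, years, dt=0.1, c0=0.0):
    """One-box model: dC/dt = emission_rate - C / lifetime_years,
    stepped forward with simple Euler integration."""
    c = c0
    for _ in range(int(years / dt)):
        c += (emission_rate - c / lifetime_years) * dt
    return c

# Prediction under continued emissions (hypothetical units and rates):
continued = simulate(emission_rate=1.0, lifetime_years=50, years=50)

# "What if" run: emissions stop after 10 years; the accumulated burden then decays.
after_10 = simulate(emission_rate=1.0, lifetime_years=50, years=10)
halted = simulate(emission_rate=0.0, lifetime_years=50, years=40, c0=after_10)
```

Comparing runs like these to observations is how a model is tested; rerunning them with altered inputs is how it answers counterfactual questions.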

Correlation and Causation

While the often-stated maxim that correlation does not imply causation is true, in fact, correlation is the only means that we have of establishing causation in science.109 The reason we accept that smoking causes lung cancer is a series of very convincing correlations: between increases in population-level cigarette consumption and lung cancer rates; between exposure of lab animals to cigarette smoke tars and the development of tumors; between smoking and the presence of precancerous cells in the lungs; etc.110 Experiments are valuable for establishing causation because they allow us to rule out many confounding variables, but interpreting experimental evidence still involves looking for correlations between experimental manipulations and outcomes. When many correlations line up in different tests, all linked by a logical natural explanation, scientists are likely to accept a causal relationship.111 In our ozone layer example, when correlations were shown between the presence of ice crystals and more rapid ozone-depleting chemical reactions in a lab environment, between the prevalence of icy polar clouds and the degree of ozone depletion in the stratosphere, and between the predictions of models that incorporated icy clouds and observations of the atmosphere, scientists accepted that polar stratospheric clouds were part of the cause of the extreme ozone loss they observed in Antarctica.112 Correlations are the basis of essential evidence both in science and at trial.

108. For example, see A. L. Schmeltekopf et al., Measurements of Stratospheric CFCl3, CF2Cl2, and N2O, 2 Geophysical Rsch. Letters 393–96 (1975), https://doi.org/10.1029/GL002i009p00393; and Farman et al. 1985, supra note 8.

109. Judea Pearl, Causality (2d ed. 2009).

110. Robert N. Proctor, The History of the Discovery of the Cigarette-Lung Cancer Link: Evidentiary Traditions, Corporate Denial, Global Toll, 21 Tobacco Control 87–91 (2011), https://doi.org/10.1136/tobaccocontrol-2011-050338.

111. For considerations in inferring causation in the field of epidemiology, see discussion of the Bradford-Hill criteria for causation in Steve C. Gold et al., Reference Guide on Epidemiology, in this manual. For an analysis of potential judicial interpretations of Bradford-Hill, see Neutra et al. 2018, supra note 97.

112. For example, see Susan Solomon, Progress Towards a Quantitative Understanding of Antarctic Ozone Depletion, 347 Nature 347–54 (1990), https://doi.org/10.1038/347347a0.

Assumptions and Auxiliary Hypotheses

Assumptions are often perceived as a liability in logically reaching a sound conclusion, but in fact all scientific tests depend on making assumptions. Even testing a simple hypothesis about the presence of lead in drinking water involves making many assumptions about how the samples were collected, how the analytic equipment was calibrated, and of course, the accuracy of the stores of scientific knowledge that go into the field of optical emission spectroscopy, which is used to determine the elemental constituents of metals. One might wonder about trusting any scientific evidence if it all relies on assumptions. In science, assumptions are treated as auxiliary hypotheses, which can themselves be tested, a process that is part of the fabric of scientific knowledge building. By testing hypotheses and auxiliary hypotheses in different combinations, using different methods, science can triangulate between lines of evidence, homing in on the accuracy of a particular idea, independent of the assumptions that underlie its tests.113

For example, the sorts of atmospheric models that were used to examine the link between CFCs and ozone depletion are complex and themselves represent hundreds of auxiliary hypotheses—about how molecules move and diffuse through the atmosphere, about how solar radiation penetrates the atmosphere, about the strength of solar radiation throughout the day, and so forth—each of which had some justification and/or evidence supporting it.114 When Molina and Rowland discovered that chlorine nitrate was longer lived than previously reported and so could interact with chlorine atoms from CFCs, they published this information so that it could be added to atmospheric models.115 Several groups were involved in testing the Rowland-Molina hypothesis, and up to this point, their models’ predictions had largely agreed. But once the researchers added the chlorine nitrate reactions to their models, different groups got conflicting results. Some models even predicted an increase in ozone. The discrepancy was traced to a particular auxiliary hypothesis. Those models predicting a net increase in ozone had one thing in common: They all used the approximation that the sun shines at its average intensity all the time, instead of varying throughout the day.116 Correcting this simplifying auxiliary hypothesis brought these models into agreement with the other models and with the actual atmospheric measurements of chlorine nitrate. Incorrect assumptions can be identified as such because of the iterative nature of the process of science.

113. Carl G. Hempel, Philosophy of Natural Science (1966).

114. For an introduction to these models and examination of how they have developed over time, see Paul N. Edwards, History of Climate Modeling, 2 WIREs Climate Change 128–39 (2011), https://doi.org/10.1002/wcc.95.

115. Rowland et al. 1976, supra note 41.

Replication and the “Replication Crisis”

Scientists aim for their tests to be replicable—so that, for example, two studies examining consumers’ perception of the “Thermos” brand name would yield concordant results if carried out in similar contexts.117 The goal of replicability comes with science’s job of uncovering the mechanisms by which the natural world operates; these mechanisms are consistently at work, and so, if a study’s finding cannot be replicated, it suggests that our understanding of the mechanism underlying the result or our testing methodologies are insufficient. In practice, many studies are not repeated step by step in an exact replication of the original, and those that are replicated exactly may not be published.118 Instead, hypotheses are often tested in slightly different contexts using varied methods, a process that helps establish the accuracy of the idea and further illuminates its scope and applicability. In our ozone layer example, scientists’ use of multiple atmospheric models, some of which made simplifying hypotheses about the strength of solar radiation throughout the day and some of which did not, led to a better understanding of ozone depletion and of the sensitivity of the models, which are used for other purposes, like weather and climate predictions.

In recent years, concerns have arisen that many published scientific studies are unreplicable and that the hypotheses they purport to support may in fact be incorrect.119 This concern is often termed “the replication crisis,” and we will use that terminology for consistency.120 The identification of many unreplicable studies raises valid questions about whether science’s reward structure promotes low-quality and misleading practices. Individual scientists and scientific institutions have responded with proposals for incentivizing exact replication studies, incentivizing the reporting of negative or nonsignificant results, encouraging transparent reporting, diversifying peer review, and changing aspects of science’s reward structure that seem to be to blame for the apparent proliferation of unreplicable studies.121 Concrete steps toward these goals include, for example, expanding the use of platforms where researchers can preregister a planned investigation, describing hypotheses to be examined, methods used, and planned analyses. Preregistration may help reduce the likelihood that researchers will fail to report null results or mine their data for significant results (which, at a 0.05 threshold, will occur by chance alone 5% of the time, as described above). Preregistration is standard practice for medical clinical trials, and it is becoming more common in other fields, especially psychology. However, it remains to be seen to what extent, if any, preregistration will improve replicability; research on this question is still in progress.122 Nevertheless, preregistration is just one of many initiatives that scientific institutions are pursuing to help address the replication crisis, reflecting their commitment to encouraging high-quality research.123

116. F. S. Rowland, John E. Spencer & Mario J. Molina, Estimated Relative Abundance of Chlorine Nitrate among Stratospheric Chlorine Compounds, 80 J. Phys. Chem. 2713–15 (1976), https://doi.org/10.1021/j100565a020.

117. In different disciplines and contexts, the terms replicability, reproducibility, and repeatability are used in inconsistent ways. Here, we use the term replication to mean a study carried out with the intent of mimicking another to the closest degree possible.

118. See NASEM, supra note 46, and Fiona Fidler & John Wilcox, Reproducibility of Scientific Results, in The Stanford Encyclopedia of Philosophy (Edward N. Zalta ed., 2021).

119. See NASEM, supra note 46, and more recent publications such as Timothy M. Errington et al., Investigating the Replicability of Preclinical Cancer Biology, 10 eLife e71601 (2021), https://doi.org/10.7554/eLife.71601.
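The data-mining worry can be demonstrated directly: run many tests on data where the null hypothesis is true by construction, and roughly 5% come out "significant" anyway. The sketch below uses a large-sample z-test on the difference of means; all numbers are arbitrary.

```python
import math
import random
import statistics

def mean_diff_pvalue(a, b):
    """Two-sided p-value for equal means (large-sample z approximation)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = abs(statistics.fmean(a) - statistics.fmean(b)) / se
    return math.erfc(z / math.sqrt(2))

rng = random.Random(7)
n_tests, false_positives = 200, 0
for _ in range(n_tests):
    a = [rng.gauss(0, 1) for _ in range(50)]
    b = [rng.gauss(0, 1) for _ in range(50)]  # same distribution: no real effect
    if mean_diff_pvalue(a, b) < 0.05:
        false_positives += 1
# false_positives lands near 0.05 * n_tests, i.e., about 10 of the 200 tests.
```

This is why unreported "failed" analyses matter: a researcher who quietly runs many comparisons and reports only the significant ones has, in effect, manufactured false positives.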

While there is no question that changing science to promote higher-quality research is worthwhile and will make science more efficient at building accurate knowledge about the natural world, the fact that there is room for improvement in the process of science does not necessitate distrust of hypotheses that have gained widespread acceptance in the scientific community and about which consensus has been achieved. Indeed, some have argued that the problem of unreplicated results may not be any worse today than it has been over the history of Western science and that, in any case, this problem is not impeding progress at building new, accurate knowledge about the natural world.124

120. However, as discussed herein, describing the situation as a “crisis” implies that it is a new situation and a catastrophic problem for scientific progress, which does not seem to be the case.

121. For examples of proposed changes, see Marcus R. Munafò et al., A Manifesto for Reproducible Science, 1 Nature Hum. Behav. 0021 (2017), https://doi.org/10.1038/s41562-016-0021; and Piers D. L. Howe & Amy Perfors, An Argument for How (and Why) to Incentivize Replication, 41 Behav. & Brain Scis. e135 (2018), https://doi.org/10.1017/S0140525X18000705, as well as NASEM, supra note 46.

122. For some research on this question, see A. Claesen, S. Gomes, F. Tuerlinckx & W. Vanpaemel, Comparing Dream to Reality: An Assessment of Adherence of the First Generation of Preregistered Studies, 8 R. Soc. Open Sci. 211037 (2021), https://doi.org/10.1098/rsos.211037; and Christopher Allen & David M. A. Mehler, Open Science Challenges, Benefits and Tips in Early Career and Beyond, 17 PLoS Biology e3000246 (2019), https://doi.org/10.1371/journal.pbio.3000587. For discussion of why preregistration might not work as planned, see Kai Kupferschmidt, More and More Scientists Are Preregistering Their Studies. Should You?, Science (2018), https://doi.org/10.1126/science.aav4786.

123. Max Korbmacher et al., The Replication Crisis Has Led to Positive Structural, Procedural, and Community Changes, 1 Commc’ns Psych. 1–13 (2023), https://doi.org/10.1038/s44271-023-00003-2.

The replication crisis is a concern about individual investigations, not scientific consensus. Not all published ideas will be replicated, but neither are they all worth trying to replicate. Hypotheses that seem fruitful, explanatory, and potentially important are useful for understanding the world and so will be tested in different ways, by different people. In science, the fact that a particular study has not been precisely replicated does not mean it is invalid. Nor is a successful replication attempt necessarily an indication that the idea being tested is correct. Only ideas that have held up to repeated testing in different ways, having built up a body of supporting evidence, are likely to gain wide acceptance within science. However, judges are frequently asked to consider evidence that is far from achieving consensus. The replication crisis makes clear that peer review alone cannot ensure that the conclusions of published studies are actually correct, highlighting the responsibility judges bear in evaluating the validity of the methodologies that contributed to a particular piece of research.

Achieving Scientific Consensus

Much of the preceding text, as well as subsequent reference guides, will assist judges in evaluating the methods behind evidence relating to emerging or contentious science. Here, we examine the opposite end of the continuum of certainty: scientific consensus or widespread acceptance, the fifth factor that Daubert guides judges toward in their consideration of the reliability of expert testimony—though we emphasize that consensus is not a requirement for admissibility. While widespread acceptance provides a strong indicator of the reliability of scientifically acquired knowledge, there is no surefire way of assessing which hypotheses have reached this level of acceptance. No regular polls of scientists assess scientific consensus, although on some issues with significant societal importance and controversy, like climate change, such polls may be carried out.125 Sometimes consensus conferences or panels are convened or consensus reports are written to assess the state of the field and/or to better communicate important conclusions to the public and policy makers. For example, several organizations, such as the Community Preventive Services Task Force and the Cochrane Collaboration, focus on producing reports that assess scientific consensus on medical questions, and the Intergovernmental Panel on Climate Change has, since 1990, produced reports assessing the scientific consensus on climate change. Similarly, the National Research Council and National Academies regularly produce consensus reports on a variety of scientific issues, including topics as diverse as cryptography, policing outcomes, and the health effects of electric and magnetic fields.126 Returning to our ozone example, in 2003 the World Meteorological Organization issued a report synthesizing scientific knowledge on ozone depletion to which 275 disciplinary experts contributed.127 Such reports are valuable indicators of scientific consensus, but many scientific topics in question at a trial are unlikely to have such definitive documentation of acceptance to which to turn. Figure 3 summarizes some of the signals that may indicate scientific consensus has been reached, as well as signs that insufficient evidence has accumulated for the hypothesis to reach widespread acceptance. The link between CFCs and ozone depletion began at the left side of this diagram—as a single peer-reviewed study—and over the course of 15 years, accumulated so much evidence and withstood so many rounds of scrutiny that it has now achieved widespread acceptance and the highest level of certainty science has to offer, placing it at the far right side of the diagram.

124. See Daniele Fanelli, Is Science Really Facing a Reproducibility Crisis, and Do We Need It To?, 115 PNAS 2628–31 (2018), https://doi.org/10.1073/pnas.1708272114; and Richard M. Shiffrin, Katy Börner & Stephen M. Stigler, Scientific Progress Despite Irreproducibility: A Seeming Paradox, 115 PNAS 2632–39 (2018), https://doi.org/10.1073/pnas.1711786114.

125. For examples, see Peter T. Doran & Maggie Kendall Zimmerman, Examining the Scientific Consensus on Climate Change, 90 Eos 22–23 (2009), https://doi.org/10.1029/2009EO030002; and Neil Stenhouse et al., Meteorologists’ Views About Global Warming: A Survey of American Meteorological Society Professional Members, 95 Bull. Am. Meteorological Soc’y 1029–40 (2014), https://doi.org/10.1175/BAMS-D-13-00091.1.

Figure 3. Indicators of scientific consensus. Scientific consensus is formed based on a body of evidence, not a single study or investigation.

126. National Research Council, Possible Health Effects of Exposure to Residential Electric and Magnetic Fields (1997), https://doi.org/10.17226/5155; NASEM, Cryptography and the Intelligence Community: The Future of Encryption (2022), https://doi.org/10.17226/26168; NASEM, Policing to Promote the Rule of Law and Protect the Population: An Evidence-Based Approach (2022), https://doi.org/10.17226/26217.

127. World Meteorological Organization, Scientific Assessment of Ozone Depletion: 2002, Global Ozone Research and Monitoring Project—Report No. 47 (2003).


The path to consensus generally involves the accumulation of multiple lines of converging evidence, as studies are conducted, published, scrutinized, and iterated on, bringing more and more of the community into agreement.128 This process works most effectively when it is carried out by a diverse group of scientists adhering to the expectations and norms outlined above. Even once broad consensus has been achieved, it is unlikely to be completely unanimous. Because scientific explanations are inherently fallible and science values skepticism, it is typical for a few in the community to remain unconvinced by even extremely compelling evidence. In fact, in a state of consensus, having a few holdouts can help maintain a degree of vigilance among the persuaded, which helps ensure the reliability of accepted knowledge. When a judge is presented with a disagreement among scientific experts, it is reasonable to seek clarification on how representative of the scientific community the two views are. Is this a case of truly unsettled science, where methodologically sound studies have been carried out and scrutinized, but open questions and disagreements remain because the hypothesis has not yet been thoroughly investigated, in which case, each party may have comparable claims on scientific reliability? Or is it a case of relatively settled science, in which one party has recruited experts who interpret the evidence differently than most of the community?129 Note that public perception of the certainty of a scientific concept or hypothesis may differ from the actual stage of consensus building within the scientific community. This sometimes occurs as a result of strategic manipulation from stakeholders who stand to be harmed if the public were to understand the true state of scientific consensus surrounding the hypothesis, as has occurred with, for example, the health effects of tobacco, ozone depletion, and climate change.130

Despite there being a culture of skepticism in science, consensus is often reached over time. Hypotheses are more likely to gain widespread acceptance and achieve scientific consensus if they:

128. Mary Jo Nye, Molecular Reality: A Perspective on the Scientific Work of Jean Perrin (1972).

129. For example, see Allen v. Hyland’s, Inc., which allowed testimony from experts in homeopathy, a set of rejected hypotheses about medical causation, and from a scientist noted for holding fringe views in the scientific community. See Kristin Shrader-Frechette, Conceptual Analysis and Special-Interest Science: Toxicology and the Case of Edward Calabrese, 177 Synthese 449–69 (2010), https://doi.org/10.1007/s11229-010-9792-5; and Jan Beyea, Lessons to be Learned From a Contentious Challenge to Mainstream Radiobiological Science (the Linear No-Threshold Theory of Genetic Mutations), 154 Env’t Res. 362–79 (2017), https://doi.org/10.1016/j.envres.2017.01.032.

130. For example, see Michael E. Mann, The New Climate War: The Fight to Take Back Our Planet (2021); Naomi Oreskes, The Scientific Consensus on Climate Change, 306 Science 1686 (2004), https://doi.org/10.1126/science.1103618; Oreskes & Conway 2010, supra note 2; and Geoffrey Supran, Stefan Rahmstorf & Naomi Oreskes, Assessing ExxonMobil’s Global Warming Projections, Science 153–62 (2023), https://doi.org/10.1126/science.abk0063.

  • have been tested and garnered support many times in many ways,
  • help explain disparate and previously unexplained observations,
  • can explain our observations more closely than competing hypotheses can,
  • can be broadly applied, and
  • are consistent with well-established hypotheses in neighboring fields.

Systematic Reviews and Meta-analyses

Two types of publication can be particularly useful in understanding and assessing the lines of evidence relevant to a hypothesis across multiple investigations. Systematic reviews attempt to synthesize and summarize the current state of research on a topic. In them, the authors delineate both the criteria that studies must meet for inclusion in the review and the methods that will be used to assess the studies. Meta-analyses extend this process by using statistics to analyze data from multiple studies.131 Systematic reviews and meta-analyses are used across the broad array of topics that science seeks to understand—from that which can be tightly controlled (e.g., a meta-analysis of randomized controlled trials of antimanic treatments) to complex and confounded social issues (e.g., whether school-choice programs boost student achievement).132 For example, in the years after Rowland and Molina published their hypothesis, scientists worked to understand how the depletion of the ozone layer might affect life on Earth. One meta-analysis examined 62 field-based studies designed to assess how an increase in ultraviolet-B radiation, which the ozone layer helps filter out, might affect plant life, concluding that the effects on plants were likely to be subtle.133 Meta-analyses are an important line of scientific evidence for the courts, but, as with all scientific evidence, require the rigorous application of reliable methods, as recognized in In re Paoli R.R. Yard PCB Litigation, in which the exclusion of a contested meta-analysis was overturned.134
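The core statistical move in a basic meta-analysis can be illustrated with a short sketch. The fixed-effect, inverse-variance model shown below is only one of several standard approaches (random-effects models are common when studies differ substantially), and the study numbers here are invented purely for illustration:

```python
import math

def fixed_effect_meta(effects, std_errors):
    """Combine per-study effect estimates using inverse-variance weights
    (the fixed-effect meta-analysis model: more precise studies count more)."""
    weights = [1.0 / se**2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

# Three hypothetical studies of the same effect (illustrative numbers only):
effects = [0.30, 0.10, 0.22]      # per-study effect estimates
std_errors = [0.15, 0.08, 0.05]   # per-study standard errors

pooled, pooled_se = fixed_effect_meta(effects, std_errors)
print(f"pooled effect = {pooled:.3f}, pooled standard error = {pooled_se:.3f}")
```

Because the pooled standard error shrinks as studies are added, a well-conducted meta-analysis can estimate an effect more precisely than any single contributing study, which is one reason courts so often encounter meta-analyses in causation disputes.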

Assessing the state of scientific consensus often requires some degree of subject matter expertise and evaluation, which could be vulnerable to personal biases. Meta-analyses do not entirely sidestep these criticisms, as such analyses

131. Many guidelines for systematic reviews and meta-analyses appear in the peer-reviewed literature; however, these are generally oriented toward specific disciplines, making it difficult to cite a publication that applies broadly across the sciences.

132. Aysegül Yildiz et al., Efficacy of Antimanic Treatments: Meta-Analysis of Randomized, Controlled Trials, 36 Neuropsychopharmacology 375 (2011), https://doi.org/10.1038/npp.2010.192; Huriya Jabbar et al., The Competitive Effects of School Choice on Student Achievement: A Systematic Review, 36 Educ. Pol’y 247–81 (2022), https://doi.org/10.1177/0895904819874756.

133. Peter S. Searles, Stephen D. Flint, & Martyn M. Caldwell, A Meta-Analysis of Plant Field Studies Simulating Stratospheric Ozone Depletion, 127 Oecologia 1–10 (2001), https://doi.org/10.1007/s004420000592.

134. 916 F.2d 829 (3d Cir. 1990).

are impacted by an array of assumptions and methodological approaches, about which practitioners may disagree.135 For this and other reasons, some sociologists of science have explored approaches to assessing consensus that examine emergent properties of consensus building and that can be applied without an evaluation of the subject matter and evidence at hand, and without polling scientists or convening expert panels. For example, Shwed and Bearman examined patterns of citations within the scientific literature to assess the timing of consensus formation around the hypotheses that smoking causes cancer, that coffee does not cause cancer, and that vaccines do not cause autism.136 This approach has also been leveraged to assess scientific consensus on the question of whether children raised by same-sex parents have similar outcomes to children raised in other family settings,137 an analysis motivated by legal debate on the issue.138

Not Necessarily Consensus

A few easy-to-grasp (and therefore tempting) criteria should not be taken as indicators of scientific consensus that a hypothesis is correct. These deceptive heuristics include:

  • Being called a law: The terminology used to refer to a scientific explanation—whether hypothesis, theory, law, or model—does not indicate, in and of itself, that an idea is accurate. “Mere” hypotheses may have garnered much support and may be widely accepted, while some scientific laws have been overturned or are understood to have many exceptions. For example, Lamarck’s second law stated that traits acquired during an individual’s lifetime (e.g., muscular strength developed by lifting weights) will be passed down to one’s offspring—an idea that is at odds with most of modern genetics.
  • Statistical significance: A statistically significant result (e.g., a low p-value) or high confidence in a particular measurement does not, on its own, indicate scientific consensus, particularly in the context of a single study. For example, a 2012 study found a small but statistically significant association between exposure to

135. Jop de Vrieze, Meta-Analyses Were Supposed to End Scientific Debates. Often, They Only Cause More Controversy, Science 18 (Sep. 18, 2018), https://perma.cc/H39R-XTXN.

136. Uri S. Shwed & Peter S. Bearman, The Temporal Structure of Scientific Consensus Formation, 75 Am. Socio. Rev. 817–40 (2010), https://doi.org/10.1177/0003122410388488.

137. jimi adams & Ryan Light, Scientific Consensus, the Law, and Same Sex Parenting Outcomes, 53 Soc. Sci. Rsch. 300–310 (2015), https://dx.doi.org/10.1016/j.ssresearch.2015.06.008.

138. In oral arguments in Hollingsworth v. Perry, Justice Scalia stated, “... there’s considerable disagreement among sociologists as to what the consequences of raising a child in a single-sex family, whether that is harmful to the child or not.” (Transcript of oral argument at 19, Hollingsworth v. Perry, 570 U.S. 693 (2013), https://www.supremecourt.gov/oral_arguments/argument_transcripts/2012/12-144_5if6.pdf).

    Zoloft in utero and major congenital malformations.139 But other studies have reached different conclusions on this question, and a later meta-analysis that included the 2012 study found no significantly increased risk for major congenital anomalies associated with Zoloft use during pregnancy.140 Despite an earlier statistically significant result, scientific evidence appears to be converging around there being no or minimal elevated risk for these birth defects as a result of a pregnant person’s Zoloft use. While one’s confidence in the conclusions of a lone study should always be tempered, if a particular method, measure, or analytic technique has been established over the course of multiple studies and is widely accepted by the scientific community, then one might well feel highly confident about the reliability of a significant result obtained through a particular application of that approach, bearing in mind that statistically significant results necessarily occur by chance alone some of the time.
  • Peer-reviewed publication: This is an important milestone along the road to achieving consensus, but of course, many hypotheses published in peer-reviewed journals are ultimately rejected by the scientific community and do not make it very far in that journey. For example, before Watson and Crick published their ideas about the structure of DNA, a peer-reviewed publication proposed the now-rejected hypothesis that DNA is a three-stranded molecule.141
  • Journal prestige, scientific eminence, or high citation metrics alone: When hypotheses published in journals with high standards for evidence and stringent peer-review practices go on to receive many citations because other scientists are building on the research, that is an indication that the scientific community is moving toward broad acceptance of a hypothesis; however, any one of these factors in isolation is a poor indicator of consensus. The fact that an article has high citation metrics, was published in a prestigious journal (often loosely indicated by the journal’s impact factor), or has a famous author is not by itself a reason to conclude that the study’s findings are widely accepted scientific facts. The vagaries, inconsistencies, and biases of publishing are discussed above, and plenty of renowned scientists have been wrong in prestigious journals. The proposal for the three-stranded DNA helix was authored by Linus Pauling, winner of the Nobel Prize in Chemistry, in the Proceedings of the

139. Espen Jimenez-Solem et al., Exposure to Selective Serotonin Reuptake Inhibitors and the Risk of Congenital Malformations: A Nationwide Cohort Study, 2 BMJ Open e001148 (2012), https://doi.org/10.1136/bmjopen-2012-001148.

140. Shan-Yan Gao et al., Selective Serotonin Reuptake Inhibitor Use During Early Pregnancy and Congenital Malformations: A Systematic Review and Meta-Analysis of Cohort Studies of More than 9 Million Births, 16 BMC Med. 205 (2018), https://doi.org/10.1186/s12916-018-1193-5.

141. Linus Pauling & Robert B. Corey, A Proposed Structure for the Nucleic Acids, 39 Proc. of the Nat’l Acad. of Sci. USA 84–97 (1953), https://doi.org/10.1073/pnas.39.2.84.

    National Academy of Sciences, for example. Furthermore, women, people of color, other historically oppressed groups, and non-Western people have been, and in many cases continue to be, unfairly dismissed and so may lack the recognition their work deserves, including prestigious appointments and publications.142 Ultimately, evidence, not reputation, determines scientific acceptance, although these other factors are likely to influence how long it takes the community to reach consensus and who receives credit for developing the knowledge.

Because scientific consensus emerges from community-level processes driven by scrutiny, additional testing, and application of a hypothesis, the simple indicators listed here are not, in isolation, signals of consensus.
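The caution above about statistical significance—that significant results necessarily occur by chance alone some of the time—can be made concrete with a small simulation. The sketch below is illustrative only: the 0.05 threshold, the sample size, and the use of a simple z-test are assumptions chosen for clarity, not a prescription for any particular field:

```python
import math
import random

def z_test_p_value(sample, null_mean=0.0, sigma=1.0):
    """Two-sided p-value for a z-test of the sample mean against null_mean,
    assuming a known standard deviation sigma."""
    n = len(sample)
    z = (sum(sample) / n - null_mean) / (sigma / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability

random.seed(42)
trials, alpha, n = 10_000, 0.05, 30
false_positives = 0
for _ in range(trials):
    # Draw data for which the null hypothesis is TRUE (the mean really is 0)...
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    # ...yet some runs will still look "significant" at the 0.05 level.
    if z_test_p_value(sample) < alpha:
        false_positives += 1

print(f"false-positive rate: {false_positives / trials:.3f}")  # close to alpha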

Myths About Science

Above, we described and debunked some common myths about science, including the following.

Myth: There is a single scientific method that all scientists follow.

Fact: The process of science is nonlinear and dynamic.

Myth: The institution of peer review ensures that all published papers are sound and dependable.

Fact: The conclusions that many peer-reviewed articles reach will ultimately turn out to be incorrect.

Myth: Without an experiment, a scientific investigation cannot be rigorous or reliable.

Fact: Nonexperimental evidence is critical in evaluating many scientific ideas.

Myth: Qualitative data are “soft” and relatively unimportant.

Fact: Qualitative data are essential to providing context and meaning to quantitative results, especially within social science, and are key to generating hypotheses in many disciplines.

A few other common misconceptions about science warrant unpacking.

Myth: Hard sciences are more rigorous and scientific than so-called soft sciences.

142. See Esther A. Odekunle, Dismantling Systemic Racism in Science, 369 Science 780–81 (2020), https://doi.org/10.1126/science.abd7531 and references therein.

Fact: Both the classic hard sciences (physics, chemistry, biology, geology, and astronomy) and the social sciences, which are sometimes portrayed as soft, can be approached with well-designed, rigorous investigations and develop reliable explanations for phenomena.143

As later reference guides in this manual make clear, many questions dealing with economics, decision making, and human psychology are legitimate topics of scientific investigation and employ rigorous, reliable methodologies. These fields, like the traditional natural sciences, employ a variety of experimental, nonexperimental, qualitative, and quantitative methodologies, and judges can apply similar considerations regarding the process of science, as outlined in this reference guide, to all of them. Of course, the more complex a study system is, the more difficult it is to control variables and untangle causal mechanisms. Furthermore, tightly controlling a study system can make it more difficult to apply those results to real-world problems. Many systems of concern in the social sciences, earth sciences, and biology are enormously complex, posing methodological challenges; however, because scientific questions in these fields are also of pressing importance to human well-being, scientists go to great lengths to meet and overcome such challenges.

Myth: Science undergoes periodic scientific revolutions, or paradigm shifts, in which an old understanding of a natural system is replaced by a new one, and the two provide such different worldviews that it is not possible to make comparisons of evidence, or perhaps even communicate, across the two perspectives.

Fact: Science moves forward incrementally, as well as in large leaps. These changes occur because evidence has accumulated, and the new view makes sense of more evidence more coherently than did the old one.

Philosopher Thomas Kuhn famously conceptualized large-scale changes in accepted scientific knowledge as paradigm shifts.144 According to Kuhn, the history of science can be divided into times of normal science, when scientists add to and elaborate on an accepted scientific theory, such as Newton’s classical mechanics, and briefer periods of revolutionary science, when there is a switch to a new theory or paradigm—in this case, Einstein’s theory of relativity, which explains everything that classical mechanics did and more. Kuhn argued that new and old paradigms were incommensurable, though he later retreated from this view somewhat. Philosophers and historians of science generally agree that many details of Kuhn’s concept of paradigm shifts do not align well with how changes

143. Larry V. Hedges, How Hard Is Hard Science, How Soft Is Soft Science? The Empirical Cumulativeness of Research, 42 Am. Psych. 443–55 (1987), https://doi.org/10.1037/0003-066X.42.5.443.

144. Thomas S. Kuhn, The Structure of Scientific Revolutions (1962).

in scientific explanations actually happened.145 While Kuhn’s ideas are not a broadly accepted account of scientific change among modern philosophers, sociologists, and historians of science, they were influential in many ways, for example by focusing attention on theory change within science and the potential role of the resolution of evidentiary anomalies in this process.

Myth: Science moves forward only by falsifying, rejecting, or disproving hypotheses, leaving the “last hypothesis standing” as the most likely to be correct.

Fact: Science can neither prove nor disprove hypotheses, and scientists evaluate hypotheses based on both supporting and refuting evidence.

This misconception is based on the idea of falsification, philosopher Karl Popper’s influential account of scientific justification, which suggests that all science can do is reject, or falsify, hypotheses and that science cannot find evidence that supports one idea over others.146 Falsification is a popular philosophical doctrine, especially with scientists, and apparently with judges as well, as the Daubert court explained, “Scientific methodology today is based on generating hypotheses and testing them to see if they can be falsified [emphasis added]; indeed, this methodology is what distinguishes science from other fields of human inquiry.”147 However, falsification is not a complete or accurate picture of how scientific knowledge is built. First of all, reviewing the logical arguments presented in any scientific journal will reveal that evidence can and does play a role in supporting particular hypotheses over others, not just in ruling some ideas out, as implied by the doctrine of falsification. Furthermore, philosophers of science now recognize that science cannot once-and-for-all, absolutely prove any idea to be false or true.148 Even if all the available evidence lines up for or against a particular hypothesis, science always allows for the outside possibility that we are missing something—for example, that one of our assumptions about the system is incorrect and so our tests are not measuring what we think they are, that our interpretation of the evidence is shaded by societal norms and that future generations will see the results differently, or that there is a subtle bug in the code of a common statistical package that is leading everyone in the same wrong direction. Those are certainly very, very unlikely possibilities, but they are not strictly impossible; hence, the essential fallibility of scientific knowledge. Of course, this does not

145. Kitcher 1995, supra note 5.

146. Karl R. Popper, Conjectures and Refutations: The Growth of Scientific Knowledge (1963).

147. See Daubert, 509 U.S. 579 at 593 (citing Michael D. Green, Expert Witnesses and Sufficiency of Evidence in Toxic Substances Litigation: The Legacy of Agent Orange and Bendectin Litigation, 86 Nw. U. L. Rev. 643 (1992)).

148. Godfrey-Smith 2003, supra note 3; Kitcher 1983, supra note 27; and Kitcher 1995, supra note 5.

mean that all scientific hypotheses are equally likely; in fact, quite the opposite is true. The whole business of science is sorting through explanations and collecting evidence to try to find those that are most expansive and powerful and are supported by the widest variety of evidence. Despite the fact that scientists can never reach absolute proof or disproof, there are plenty of hypotheses that scientists are remarkably certain are either true or untrue, and these ideas are extraordinarily unlikely to ever be overturned. Nonscientists justifiably place their trust in this same set of reliable, accepted hypotheses every time they rely on the myriad technologies, policies, and other applications that derive from them.

Myth: If a scientific idea is called a hypothesis or a theory, this indicates that the explanation is uncertain, whereas laws are nearly irrefutable scientific ideas.149

Fact: Hypotheses, theories, and laws are scientific explanations that differ in breadth, not in level of support, and these terms are not consistently used across scientific disciplines or throughout history.

In everyday language, the words hypothesis and theory usually refer to educated guesses or ideas that we are quite uncertain about, while laws are viewed as rigid mechanisms that have few exceptions. However, the meaning of these words differs within modern science, where hypotheses are generally understood to be potential explanations for natural phenomena regardless of the extent to which the explanations have been investigated or the amount of evidence supporting or refuting them; the term law is often used to refer to a statement (often a mathematical statement) about how observable phenomena are related; and theories are understood to be deep explanations that apply to a broad range of phenomena and that may integrate many hypotheses and laws. Any individual hypothesis, theory, or law may have accumulated strong evidence supporting or refuting it—or little in either direction. Adding further confusion, use of these terms is not standardized, and they have gained popularity in different scientific disciplines and at different points in history. Thus, we have Newton’s laws, which are still widely applicable but were subsumed at an explanatory level by Einstein’s now accepted theory of relativity; Lamarck’s laws, now rejected, their explanatory position occupied by the theory of evolution; and the once-heretical prion hypothesis, now settled science and the basis for the 1997 Nobel Prize in Physiology or Medicine. In practice, judges should set little store by the terminology used to refer to a unit of purported scientific knowledge and focus instead on the methodologies and evidence that underlie it.

Myth: Scientists are completely objective in their evaluation of scientific ideas and evidence.

149. McComas 1998, supra note 4.

Fact: Scientists’ work is influenced by their biases, motivations, backgrounds, and experiences.

Scientists may be motivated to different degrees by curiosity, by the desire to solve a problem or gain recognition or funding, or by personal financial interests. These factors and many other conscious and unconscious biases shape the questions scientists ask, how they investigate these questions, how they interpret evidence, and how they evaluate and credit the work of others. However, because scientists are expected to, incentivized to, and generally do strive for objectivity and because of the ongoing community scrutiny of scientific findings from many different viewpoints, such biases are attenuated and ultimately corrected—but this does take time. In the context of a trial concerning as-yet unsettled science, judges can reasonably weigh potential sources of bias, particularly when research has shown this to influence results, as in the case of funding bias.

Myth: Research that comes out of universities is pure and unbiased by financial and other considerations.

Fact: All research is vulnerable to bias.

The same arguments about bias that apply to individual scientists also apply to research institutions of all sorts. Universities, government agencies, organizations, and industries all have complex internal and external incentive systems that shape how they operate and carry out research. Universities may earn revenues from patents, and many have offices devoted to handling intellectual property. Such conflicts of interest can be relevant in assessing potential sources of bias at trial.

Science and the Law

Science and the law are sometimes seen to be clashing cultures, but in fact they have many overlapping properties and interests.150 Both are oriented toward discovering truths and facts. Both are complex endeavors, involving many interacting individuals and institutions that, while guided by overarching ideals, are nonetheless embedded in and shaped by the cultures and values of society and of their institutions. Both endeavors are influenced by the backgrounds, experiences, biases, and values of their human practitioners but have mechanisms and guidelines designed to help them achieve their aims and identify truths. Here, we outline further similarities and differences between science and the law that may help judges bridge these two cultures and make useful connections between them.

150. Jasanoff 2005, supra note 5.

Assessment of Evidence

Both science and law rely on evidence. Judges use scientific evidence to strengthen their ability to understand the factual basis of a dispute to reach a fair and just resolution of a conflict. Judges must make preliminary assessments of scientific information presented as evidence at trial to ensure that information meets appropriate standards of scientific integrity for consideration by a jury. Scientists, on the other hand, use evidence to make judgments about the accuracy of hypotheses.

Both institutions put individuals in the position of weighing the often-incomplete evidence to come to a conclusion about what the likely truth is. In the courtroom, judges and juries must make decisions based on the current state of the scientific evidence presented during a trial. This distinct terminus to deliberation is a requirement of the law but is not generally a part of science. In most cases, scientists can and will reserve judgment on a hypothesis if warranted, waiting for more evidence before accepting or rejecting it. This difference between science and the law was acknowledged in the decision on the Zoloft litigation referenced above: “The Court recognizes that the final scientific verdict as to whether Zoloft can cause birth defects may not be delivered for many years. Nevertheless, Plaintiffs chose when to file their cases, and the Court concludes that for the Plaintiffs who have continued to pursue their claims, the litigation gates must be closed.”151

Within the law, the standards for assessing evidence vary by jurisdiction and type of action. Criminal cases, for example, require proof beyond a reasonable doubt to draw the conclusion of wrongdoing, while civil cases require only a preponderance of evidence. As described in The Admissibility of Expert Testimony, in this manual, scientific evidence is admissible if the judge finds it “more likely than not that the expert’s methods are reliable and that they are reliably applied to the facts at hand.” Scientists too may vary their standards for evidence depending on the situation and the implications of the conclusion, but these recalibrations are not standardized across discipline or situation, as is the case in litigation. Scientists may, for example, lower their threshold for action when the investigators perceive a threat of harm, as occurred when Molina and Rowland began lobbying for policy change long before scientific consensus around the connection between CFCs and ozone depletion was achieved.

In the United States, our common-law tradition often relies on a lay jury to assess evidence to resolve disputes related to science. However, in science, only in rare cases is a specific body tasked with assessing the evidence and determining scientific consensus (see the section titled “Achieving Scientific Consensus” above). Instead, consensus is usually an emergent property of interactions within the scientific community. Within science, disputes about the accuracy of

151. In re Zoloft (Sertraline Hydrochloride) Prods. Liab. Litig., 176 F. Supp. 3d 483 (E.D. Pa. 2016).

hypotheses are managed by experts in that field and ultimately depend on the evidence, not the lay public’s perception of the issues. For example, scientific experts have long since reached consensus on the fact that measles, mumps, and rubella (MMR) vaccines do not cause autism; however, this concern still looms large for some of the lay public, continues to make its way into the courts, and may find sympathies within the jury box.

The emphasis in the legal system on resolving legal conflicts within the common-law tradition may require presentation of scientific testimony in ways that are uncommon in the ordinary course of science. For example, the legal system allows parties or the court to select the scientific experts who will testify, advise, or mediate, and parties involved with the case may identify experts they perceive to be sympathetic to their perspective, rather than seeking out the most knowledgeable scientists or scientists whose interpretation of the evidence represents that of most of the relevant scientific community. In addition, in litigation, attorneys and judges determine how and what scientific evidence is presented, potentially limiting the jury’s access to certain information. In contrast, interactions within the scientific community are not tightly regulated in terms of who is permitted to present and assess what evidence and in what ways, although journal editors and peer reviewers do play a role in determining what scientific evidence is available for review through publications. In science, debates relating to scientific evidence generally play out over many years, most notably by interactions among any subject matter experts who opt into the discussion, without imposed limits regarding what information can be used to buttress or refute an argument.

Precedent and Self-Correction

In the United States, precedent is built into our legal system. According to stare decisis, cases with the same facts are expected to be decided in the same way. Though precedent can be overturned, the hierarchical structure of the judiciary makes this occurrence rarer than it would be otherwise, and precedent remains a powerful doctrine in the law. Precedent has a limited role in science, on the other hand. Hypotheses are judged according to the specific evidence relevant to them, and scientists are expected to give up a previously accepted hypothesis if new evidence or interpretations warrant it. This is considered a normal part of good science, and science has built-in mechanisms that allow for self-correction. Science is, of course, done by people, who may cling to ideas or practices, in institutions with cultures that may be slow to change, so science does experience a form of precedent. But because this is not a formal mechanism of science, and indeed conflicts with the idealized modes of reasoning that scientists expect of themselves, precedent is less important in science than in the law. This

Suggested Citation: "How Science Works." National Academies of Sciences, Engineering, and Medicine and Federal Judicial Center. 2025. Reference Manual on Scientific Evidence: Fourth Edition. Washington, DC: The National Academies Press. doi: 10.17226/26919.

difference can contribute to a lag time between a shift in scientific consensus and when this consensus is consistently applied to proceedings in courts. For example, bitemark evidence has little methodologically sound science to support its use (see the Reference Guide on Forensic Feature Comparison Evidence, in this manual), a fact that started to become clear in the early 2000s.152 Yet, bitemark evidence continues to be admitted in some courts.

Conclusion

We’ve presented science as a human endeavor emerging out of a community that has tasked itself with working together to answer a wide variety of questions about the natural and human world through an iterative, self-correcting process. While science is often slow and its hypotheses fundamentally tentative, it is nonetheless a reliable means of generating explanations. On the broadest scale, science helps us understand the way the world is, connects disparate phenomena, and makes accurate and consistent predictions. At a practical level, science helps us solve problems, develop new technologies, make decisions, form policies, and, as this manual illustrates, administer justice. Subsequent reference guides in this manual provide a road map to the scientific findings that are most likely to factor into judicial decisions, guiding judges and juries in their search for truth and in their service to the law.

152. Michael J. Saks et al., Forensic Bitemark Identification: Weak Foundations, Exaggerated Claims, 3 J.L. & Biosciences 538–75 (2016), https://doi.org/10.1093/jlb/lsw045.


Glossary of Terms

auxiliary hypothesis. An assumption made in the context of testing a main hypothesis.

blinding. The concealment of comparison group membership so that this knowledge does not bias the study outcome.

control group. A condition or group that does not receive the experimental manipulation, which serves to increase confidence that the change observed in the experimental group is caused by the manipulation or factor in question.

controlled experiment. An experiment that seeks to keep variables other than the experimental manipulation the same to help isolate the cause of any change.

effect size. A statistic that communicates the strength of an association between variables.

error. In reference to statistics, the potential difference between a computed or measured value and the true value.

experiment. A testing methodology that involves intentionally manipulating some factor in a system to learn how that affects the outcome.

fallibilism. The principle that holds that scientific knowledge is always open to rejection or revision, no matter how much prior evidence supports it.

falsification. A now outdated account of scientific justification, which proposes that science can only prove hypotheses wrong and that scientific acceptance arises through a process of elimination.

hypothesis. Within science, a potential explanation for a natural phenomenon, regardless of the extent to which the explanation has been investigated or the amount of evidence supporting or refuting it, although this term is inconsistently applied and its use is more or less common in different disciplines and at different points in history.

law. Within science, an often-mathematical statement of the relationship among observable phenomena, although this term is inconsistently applied and its use is more or less common in different disciplines and at different points in history.

meta-analysis. A study that uses statistics to analyze data from multiple other studies, aggregating and synthesizing their results.

mixed methods. A research approach that uses a combination of qualitative and quantitative techniques within a single investigation to detect patterns and to form and test hypotheses.

model. Usually, a mathematical representation of hypothesized mechanisms and interactions within a system that can be used to investigate the impact of parameter changes on predicted system outcomes, although this term is inconsistently applied and its use is more or less common in different disciplines and at different points in history.

natural. Concerning any element of the physical universe, whether made by humans or not.

peer review. In science, a standardized process in which journal articles (or other scientific communications, such as funding applications) are vetted by other subject matter experts before acceptance or publication.

placebo. In medical studies, a treatment, medicine, or therapy given to study participants that is not known to have a therapeutic effect on the condition of interest.

power. A measure of a statistical test’s ability to detect an effect that is present.

prediction. A potential outcome of a scientific test that is arrived at by logically reasoning about what we would expect to observe if a hypothesis were true or false.

preprint. A fully drafted scientific journal article that includes all the features of a published article but that has not yet been peer reviewed or accepted by a journal for publication.

p-value. A statistic that gives the calculated probability of observing differences between conditions at least as large as those actually observed if the null hypothesis were true; it is not the probability that the null hypothesis is true.

randomized controlled trial. An experimental design that randomly assigns participants to the experimental or control group to better control variables across the groups.

replication. Herein, a study carried out with the intent of mimicking another to the closest degree possible.

replication crisis. A series of observations, beginning around 2010, suggesting that the results of many published scientific studies are not replicable and their conclusions potentially incorrect.

science. A body of knowledge regarding the natural world and the process for building that knowledge based on evidence acquired through observation, experiment, and simulation.

systematic review. A publication type that attempts to synthesize and summarize the current state of research on a topic.

theory. Within science, a concise, coherent, and predictive explanation for a broad range of natural phenomena that integrates and makes sense of many hypotheses, although this term is inconsistently applied and its use is more or less common in different disciplines and at different points in history.

uncertainty. In reference to statistics, a parameter that indicates the dispersion of values within which the true value is likely to fall.
