Reference Manual on Scientific Evidence: Fourth Edition (2025)

Chapter: Reference Guide on Medical Testimony

Suggested Citation: "Reference Guide on Medical Testimony." National Academies of Sciences, Engineering, and Medicine and Federal Judicial Center. 2025. Reference Manual on Scientific Evidence: Fourth Edition. Washington, DC: The National Academies Press. doi: 10.17226/26919.

Reference Guide on Medical Testimony

JOHN B. WONG, LAWRENCE O. GOSTIN, AND OSCAR A. CABRERA

John B. Wong, M.D., is Vice Chair of Academic Affairs and Chief of the Division of Clinical Decision Making in the Department of Medicine at Tufts Medical Center and Professor of Medicine at Tufts University School of Medicine.

Lawrence O. Gostin, J.D., is Linda D. and Timothy J. O’Neill Professor of Global Health Law and Faculty Director of the O’Neill Institute for National and Global Health Law at Georgetown University Law Center.

Oscar A. Cabrera, Abogado, LL.M., is Deputy Director of the O’Neill Institute for National and Global Health Law and an Adjunct Professor of Law at Georgetown University Law Center.

CONTENTS

Introduction

Medical Testimony Introduction

Medical Versus Legal Terminology

Testimony by Physicians

Medical Expertise and Admissibility

Medical Care

Medical Education and Training

Medical School

Postgraduate Training

Licensure and Credentialing

Continuing Medical Education

Organization of Medical Care

Patient Care

Goals

Diagnostic Process

Medical Decision-Making

Clinical-Reasoning Process

Hypothetico-Deductive Reasoning

Heuristics: Intuitive Pattern Recognition

Diagnostic Reasoning

Symptoms and Signs

Bayes’ Rule

Probabilistic Reasoning

Introduction

Physicians are a common sight in today’s courtroom. A survey of federal judges published in 2002 found that medical and mental health experts constituted more than 40% of testifying experts.1 Medical evidence is a common element in product-liability suits,2 workers’ compensation disputes,3 medical malpractice suits,4 and personal-injury cases.5 Medical testimony may also be critical in certain kinds of criminal cases.6 This reference guide introduces federal and state judges to the types of evidence that physicians use to make judgments, whether as treating physicians or as experts retained by one of the parties in a case. It emphasizes the tools and basic concepts of diagnostic reasoning and clinical decision-making and highlights the challenges of testifying as a medical expert. The “Medical Care” and “Medical Decision-Making” sections of this guide explain in detail the practice of medicine, including medical education and training, the structure and organization of healthcare, the elements of patient care, and, most importantly,

1. Joe S. Cecil, Ten Years of Judicial Gatekeeping Under Daubert, 95 Am. J. Pub. Health S74–S80 (2005), https://doi.org/10.2105/AJPH.2004.044776.

2. See, e.g., In re Bextra & Celebrex Mktg. Sales Pracs. & Prod. Liab., 524 F. Supp. 2d 1166 (N.D. Cal. 2007) (thoroughly reviewing the proffered testimony of plaintiff’s expert cardiologist and neurologist in a products-liability suit alleging that defendant’s arthritis pain medication caused serious cardiovascular injury).

3. See, e.g., AT&T Alascom v. Orchitt, 161 P.3d 1232 (Alaska 2007) (affirming the decision of the state workers’ compensation board and rejecting appellant’s challenges to worker’s experts); Butts v. Dept. of Labor & Workforce Dev., 467 P.3d 231 (Alaska 2020) (reviewing the use of medical evidence to determine compensation).

4. See, e.g., Schneider ex rel. Est. of Schneider v. Fried, 320 F.3d 396 (3d Cir. 2003) (allowing a physician to testify in a malpractice case regarding whether administering a particular drug during angioplasty was within the standard of care); Ellison v. United States, 753 F. Supp. 2d 468 (E.D. Pa. 2010) (holding that oral surgeon’s testimony was relevant and reliable); Hemsley v. Langdon, 909 N.W.2d 59 (Neb. 2018) (holding that the district court did not abuse its discretion by admitting doctor’s expert testimony); Gonzales v. Neb. Pediatric Prac., Inc., 955 N.W.2d 696 (Neb. 2021) (admitting the testimony of a family and emergency-room physician to prove another physician’s misdiagnosis).

5. See, e.g., Epp v. Lauby, 715 N.W.2d 501 (Neb. 2006) (detailing the opinions of two physicians regarding whether plaintiff’s fibromyalgia resulted from an automobile accident with two defendants); Marsh v. Valyou, 977 So. 2d 543 (Fla. 2007) (analyzing expert testimony linking a car accident to fibromyalgia); Britt v. Wal-Mart Stores E., LP, 599 F. Supp. 3d 1259 (S.D. Fla. 2022) (deciding that a surgeon could testify competently as to his opinions on whether or not plaintiff had sustained a permanent injury based on the information disclosed during the course of plaintiff’s treatment).

6. See State v. Price, 171 P.3d 1223 (Mont. 2007) (discussing an assault case in which a physician testified regarding the potential for a stun gun to cause serious bodily harm); People v. Unger, 749 N.W.2d 272 (Mich. Ct. App. 2008) (a second-degree murder case involving testimony of a forensic pathologist and neuropathologist); State v. Greene, 951 So.2d 1226 (La. Ct. App. 2007) (regarding a child-sexual-battery and child-rape case involving the testimony of a board-certified pediatrician); Unger v. Bergh, 742 F. App’x 55 (6th Cir. 2018) (discussing the handling of expert testimony by trial counsel).

the processes of diagnostic reasoning and medical judgment that physicians use to diagnose and treat their patients. Special attention is given to the physician-patient relationship and to the types of evidence that physicians use to make medical judgments. Following the introduction, the section titled “Medical Testimony Introduction” identifies a few overarching theoretical issues that courts face in translating the methods and techniques customary in the medical profession into a form that will serve the court’s inquiry. To make each issue more salient, illustrative examples from case law are offered.

Medical Testimony Introduction

Medical Versus Legal Terminology

Because medical testimony is common in the courtroom generally and is indispensable to certain cases, courts have employed some medical terms in ways that differ from their use by the medical profession. Differential diagnosis, for example, is an accepted method that a medical expert may employ to offer expert testimony that satisfies Federal Rule of Evidence 702.7 In the legal context it refers to a technique “in which a physician first rules in all scientifically plausible causes of plaintiff’s injury, then rules out least plausible causes of injury until the most likely cause remains, thereby reaching a conclusion as to whether defendant’s product caused injury.”8 But in the medical context, differential diagnosis

7. See Liesa L. Richter & Daniel J. Capra, The Admissibility of Expert Testimony, in this manual.

8. Wilson v. Taser Int’l, Inc., 303 F. App’x 708 (11th Cir. 2008) (“[N]onetheless, Dr. Meier did not perform a differential diagnosis or any tests on Wilson to rule out osteoporosis and these corresponding alternative mechanisms of injury. Although a medical expert need not rule out every possible alternative in order to form an opinion on causation, expert opinion testimony is properly excluded as unreliable if the doctor ‘engaged in very few standard diagnostic techniques by which doctors normally rule out alternative causes and the doctor offered no good explanation as to why his or her conclusion remained reliable’ or if ‘the defendants pointed to some likely cause of the plaintiff’s illness other than the defendants’ action and [the doctor] offered no reasonable explanation as to why he or she still believed that the defendants’ actions were a substantial factor in bringing about that illness.’”); Williams v. Allen, 542 F.3d 1326, 1333 (11th Cir. 2008) (“Williams also offered testimony from Dr. Eliot Gelwan, a psychiatrist specializing in psychopathology and differential diagnosis. Dr. Gelwan conducted a thorough investigation into Williams’ background, relying on a wide range of data sources. He conducted extensive interviews with Williams and with fourteen other individuals who knew Williams at various points in his life.”) (involving a capital murder defendant petitioning for habeas corpus offering a supporting expert witness); Bland v. Verizon Wireless, L.L.C., 538 F.3d 893, 897 (8th Cir. 2008) (“Bland asserts Dr. Sprince conducted a differential diagnosis which supports Dr. Sprince’s causation opinion. We have held, ‘a medical opinion about causation, based upon a proper differential diagnosis, is sufficiently reliable to satisfy Daubert.’ A ‘differential diagnosis [is] a technique that identifies the cause of a medical condition by eliminating the likely causes until the most probable cause is isolated.’”) (citations

refers to a set of diseases that physicians consider as possible causes for the symptoms a patient is suffering or signs that a patient exhibits.9 By identifying the likely potential causes of a patient’s disease or condition and weighing the risks of harms and benefits of additional testing or treatment, physicians then try to determine the most appropriate approach—testing, medication, or surgery, for example.10
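The contrast between the two usages can be made concrete with a minimal Python sketch. The candidate names and plausibility scores are hypothetical and chosen only for illustration; neither courts nor physicians assign numeric scores in this way. The legal usage is modeled as elimination down to a single remaining cause, while the medical usage is modeled as a ranked list of conditions that stays open to revision.

```python
# Hypothetical plausibility scores for illustration only; not clinical data.
candidates = {
    "condition A": 0.50,
    "condition B": 0.30,
    "condition C": 0.20,
}


def legal_differential(causes):
    """Legal usage: rule in all plausible causes, then rule out the
    least plausible one at a time until a single cause remains."""
    remaining = dict(causes)  # "rule in" every candidate (works on a copy)
    while len(remaining) > 1:
        least = min(remaining, key=remaining.get)
        del remaining[least]  # "rule out" the least plausible cause
    return next(iter(remaining))


def medical_differential(conditions):
    """Medical usage: the differential is the set of conditions under
    consideration, ordered here from most to least likely."""
    return sorted(conditions, key=conditions.get, reverse=True)


print(legal_differential(candidates))
print(medical_differential(candidates))
```

The first function returns one conclusion; the second returns every candidate still under consideration, which is closer to how a treating physician holds a differential while deciding on testing or treatment.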

Courts have used the term differential etiology interchangeably with differential diagnosis.11 In medicine, etiology refers to the study of causation in disease,12 but differential etiology is a legal invention not used by physicians. In general, both differential etiology and differential diagnosis are concerned with establishing or refuting causation between an external cause and a plaintiff’s condition. Depending on the type of case and the legal standard, a medical expert may testify regarding specific causation, general causation, or both. Keep in mind that physicians typically make decisions about treatment and make

omitted) (stating expert’s incomplete execution of differential diagnosis procedure rendered expert testimony unsatisfactory for Daubert standard); Lash v. Hollis, 525 F.3d 636, 640 (8th Cir. 2008) (“Further, even if the treating physician had specifically opined that the Taser discharges caused rhabdomyolysis in Lash Sr., the physician offered no explanation of a differential diagnosis or other scientific methodology tending to show that the Taser shocks were a more likely cause than the myriad other possible causes suggested by the evidence.”) (finding lack of expert testimony with differential diagnosis enough to render evidence insufficient for jury to find causation in personal-injury suit); Feit v. Great West Life & Annuity Ins. Co., 271 F. App’x 246, 254 (3d Cir. 2008) (“However, although this Court generally recognizes differential diagnosis as a reliable methodology the differential diagnosis must be properly performed in order to be reliable. To properly perform a differential diagnosis, an expert must perform two steps: (1) ‘Rule in’ all possible causes of Dr. Feit’s death and (2) ‘Rule out’ causes through a process of elimination whereby the last remaining potential cause is deemed the most likely cause of death.”) (citations omitted) (ruling that district court was not in error for excluding expert medical testimony that relied on an improperly performed differential diagnosis); Saldana v. Delta Airlines, Inc., 1:19 CIV. 027 (CAK), 2021 WL 4710811 (D.V.I. Oct. 8, 2021) (quoting Feit, also considering that differential diagnosis had been properly performed in the case to form the opinion that the plaintiff had suffered a heart attack as a result of hurrying to catch the connecting flight).

9. Stedman’s Medical Dictionary 531 (28th ed. 2006) (defining differential diagnosis as “the determination of which of two or more diseases with similar symptoms is the one from which the patient is suffering, by a systematic comparison and contrasting of the clinical findings.”).

10. The Concise Dictionary of Medical-Legal Terms 36 (1998) (definition of differential diagnosis).

11. See Proctor v. Fluor Enters., Inc., 494 F.3d 1337 (11th Cir. 2007) (testifying medical expert employed differential etiology to reach a conclusion regarding the cause of plaintiff’s stroke). But see McClain v. Metabolife Int’l, Inc., 401 F.3d 1233, 1252 (11th Cir. 2005) (distinguishing differential diagnosis from differential etiology, with the former closer to the medical definition and the latter employed as a technique to determine external causation); Brown v. Burlington N. Santa Fe Ry. Co., 765 F.3d 765 (7th Cir. 2014) (indicating that differential diagnosis focuses on the identity of an ailment, while differential etiology focuses on the cause of the ailment).

12. Stedman’s Medical Dictionary 675 (28th ed. 2006) (defining etiology as “the science and study of the causes of disease and their mode of operation”). For a discussion of the term etiology in epidemiology studies, see Steve C. Gold et al., Reference Guide on Epidemiology, section titled “Specific Causation,” in this manual.

inferences about the possible diseases (different diagnoses) that may cause the presenting symptoms or signs in order to arrive at an appropriate treatment based on the potential diagnoses, particularly when the diagnosis alters the optimal treatment. For example, stroke symptoms (the disease) can be caused by a variety of etiologies, that is, mechanisms by which the oxygen supply to the brain is disrupted (e.g., a burst brain aneurysm, a blood clot or an atherosclerotic plaque traveling to the brain, or other causes), thereby motivating further diagnostic testing to identify the etiology and treat appropriately. Physicians also use inference to help narrow potential diseases by identifying associated symptoms and risk factors. For example, chest pain is more likely to be due to coronary artery disease in a patient with a history of high blood pressure, high cholesterol, or smoking, or when the pain occurs with exertion. The focus, however, is on diagnosis, typically with residual uncertainty and a weighing of the likelihood of that disease and the benefits and harms of treatment in a patient’s particular health context (age, sex, gender, past medical history, health behaviors and exposures such as smoking, alcohol, occupational hazards, and toxins, and family history). This sometimes requires revisiting the patient’s response to treatment over time and possibly ordering additional tests or treatments in an iterative approach. Most treating physicians are unlikely to be familiar with differential etiology. General causation refers to whether the plaintiff’s injury could have been caused by the defendant or a product produced by the defendant, while specific causation is established only when the defendant’s actions or product actually caused the harm.13 An opinion by a testifying physician may be offered in support of both types of causation.14

Courts also refer to medical certainty or probability in ways that differ from their use in medicine. The standards of “reasonable medical certainty”15 and “reasonable medical probability” are also terms of art in the law that have no analog for a practicing physician.16 As detailed in the “Medical Decision-Making”

13. See Amorgianos v. Nat’l R.R. Passenger Corp., 303 F.3d 256, 268 (2d Cir. 2002).

14. See, e.g., Ruggiero v. Warner-Lambert Co., 424 F.3d 249 (2d Cir. 2005) (excluding testifying expert’s differential diagnosis in support of a theory of general causation because it was not supported by sufficient evidence); Milward v. Rust-Oleum Corp., 820 F.3d 469 (1st Cir. 2016) (establishing that the plaintiffs had the burden of establishing, through expert testimony, general and specific causation, and analyzing the admissibility of an expert witness to establish the latter).

15. See, e.g., the official reporter’s note in Restatement (Third) of Torts: Liability for Physical and Emotional Harm § 28 cmt. e (Am. L. Inst. 2010), which exhaustively reviews the case law and law-review articles on this topic and finds that at least thirty-eight states have concluded that expert testimony otherwise admissible need not be expressed to a “reasonable degree of medical or scientific certainty.”

16. See, e.g., Dallas v. Burlington N., Inc., 689 P.2d 273, 277 (Mont. 1984) (“‘[R]easonable medical certainty’ standard; the term is not well understood by the medical profession. Little, if anything, is ‘certain’ in science. The term was adopted in law to assure that testimony received by the fact finder was not merely conjectural but rather was sufficiently probative to be reliable.”); Laue v. Voyles, CV 13-31-BU-CSO, 2014 WL 12888669 (D. Mont. Apr. 18, 2014) (quoting the same passage from Dallas). This reference guide will not probe substantive legal standards in any detail, but there are substantive differences in admissibility standards for medical evidence between federal

section below, diagnostic reasoning and medical evidence are aimed at recommending the best therapeutic option for a patient. Although most courts have interpreted “reasonable medical certainty” to mean a preponderance of the evidence,17 physicians often work with multiple hypotheses while diagnosing and treating a patient without any “standard of proof” to satisfy.
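A minimal sketch of why the two frames differ, using hypothetical probabilities that are not drawn from any case or study: a physician may act on a leading hypothesis even when its estimated probability falls short of the "more likely than not" threshold that most courts associate with a preponderance of the evidence. Modeling that threshold as a probability above 0.5, and the function name used here, are assumptions made for illustration.

```python
# Hypothetical working hypotheses with illustrative probabilities.
hypotheses = {
    "diagnosis X": 0.40,
    "diagnosis Y": 0.35,
    "diagnosis Z": 0.25,
}

# Assumption: "preponderance" is modeled as a probability above 0.5.
PREPONDERANCE = 0.5


def leading_hypothesis(probs):
    """Return the most likely hypothesis and whether it alone
    exceeds the more-likely-than-not threshold."""
    name = max(probs, key=probs.get)
    return name, probs[name] > PREPONDERANCE


name, meets = leading_hypothesis(hypotheses)
print(name, meets)  # the leading diagnosis need not exceed 0.5
```

With three live hypotheses, the most likely diagnosis can guide treatment while still being less probable than the alternatives combined, which is the gap between clinical practice and a fixed standard of proof.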

Courts also use standard of care as a legal term that varies from state to state. In healthcare, medical-specialty societies use terms such as standards (high degree of certainty), guidelines (varying degrees of certainty), and consensus statements (expert opinion where controversy exists).18 Per the Institute of Medicine (IOM), “Practice Guidelines are systematically developed statements to assist practitioner and patient decisions about appropriate health care for specific clinical circumstances”; “Medical Review Criteria are systematically developed statements that can be used to assess the appropriateness of specific health care decisions, services, and outcomes”; “Standards of Quality are authoritative statements of (1) minimum levels of acceptable performance or results, (2) excellent levels of performance or results, or (3) the range of acceptable performance or results”; and “Performance Measures are methods or instruments to estimate or monitor the extent to which the actions of a health care practitioner or provider conform to practice guidelines, medical review criteria, or standards of quality.”19 In 2011, the Institute of Medicine updated the definition: “Clinical practice guidelines are statements that include recommendations intended to optimize

and state courts. See Robin Kundis Craig, When Daubert Gets Erie: Medical Certainty and Medical Expert Testimony in Federal Court, 77 Denv. U. L. Rev. 69 (1999).

17. See, e.g., Sharpe v. United States, 230 F.R.D. 452, 460 (E.D. Va. 2005) (“It is not enough for the plaintiff’s expert to testify that the defendant’s negligence might or may have caused the injury on which the plaintiff bases her claim. The expert must establish that the defendant’s negligence was ‘more likely’ or ‘more probably’ the cause of the plaintiff’s injury . . .”); W.C. v. Sec’y of Health & Hum. Servs., 100 Fed. Cl. 440 (2011), aff’d, 704 F.3d 1352 (Fed. Cir. 2013) (finding that claimant had not established by preponderant evidence a medical theory causally connecting his significantly worsened multiple sclerosis to his flu vaccination).

18. In Adams v. Lab. Corp. of Am., 760 F.3d 1322 (11th Cir. 2014), the issue was whether a litigation review of Pap smear slides needed to be done “blinded,” with the plaintiff’s specimen reviewed alongside others and with the expert not knowing which specimen was being challenged in the lawsuit. The court pointed out that there were plenty of times in pathology when retrospective reviews were done without such an elaborate “blinded” process, which seemed more suited to making it difficult to pursue litigation. As the Eleventh Circuit found, “As far as we are aware, this is the first time that an industry group has promulgated a set of guidelines that attempts to define and limit the evidence courts should accept when the group’s members are sued. The members of the CAP and ASC have a substantial interest in making it more difficult for plaintiffs to sue based on alleged negligence in their Pap smear screening, and their guidelines do just that.” Id. at 1331.

19. Comm. to Advise Pub. Health Serv. on Clinical Prac. Guidelines, Inst. Med., Clinical Practice Guidelines: Directions for a New Program 8 (Marilyn J. Field & Kathleen N. Lohr eds., 1990), https://doi.org/10.17226/1626. [hereinafter 1990 CAPHSCPG Report]. The Institute of Medicine is now the National Academy of Medicine.

patient care that are informed by a systematic review of evidence and an assessment of the benefits and harms of alternative care options.”20

Statutes and administrative regulations may also contain terms that are borrowed, often imperfectly, from the medical profession. In these cases, the court may need to examine the intent of the legislature and the term’s usage in the medical profession. If no intent is apparent, the court may need to determine whether the medical definition is the most appropriate one to apply to the statutory language. Whether the language is a medical term of art or a question of law will often dictate the admissibility and weight of evidence.21

Testimony by Physicians

Federal Rule of Civil Procedure 26(a)(2) requires various disclosures regarding expert witnesses. In 2010, the expert disclosure requirements were expanded to include disclosure of anticipated testimony by non-retained experts, including treating physicians, even if those opinions were confined entirely to opinions formed during treatment. Law professor Steven Gensler summarizes the disclosure requirements in this way:

In summary, the disclosures that must be made for a treating physician depend on the nature of the testimony he or she will give. Unless the treating physician is going to be limited to testifying about facts in a lay person capacity, the physician must be disclosed as an expert and must provide either the summary disclosures or an expert report. Whether the treating physician must file a written report or is subject only to summary disclosures depends on the role of the expert. If the treating physician’s expert opinions stay within the scope of treatment and diagnosis, then the physician would not be considered “retained” to provide expert testimony and only summary disclosures would be needed. But if a treating physician is going to offer opinions formed outside the course of treatment and diagnosis, then as to those further opinions the physician is being used in a “retained expert” role and the Rule 26(a)(2)(B)’s report requirement will apply to the extent of that further testimony.22

Thus, the level of disclosure required by Rule 26 turns on whether the treating physician confines his or her testimony to facts and opinions in the context of

20. Comm. on Standards for Developing Trustworthy Clinical Prac. Guidelines, Inst. Med., Clinical Practice Guidelines We Can Trust 4 (Robin Graham et al. eds., 2011), https://doi.org/10.17226/13058. [hereinafter 2011 CSDTCPG Report].

21. See, e.g., Coleman v. Workers’ Comp. Appeal Bd. (Ind. Hosp.), 842 A.2d 349 (Pa. 2004) (holding that since the legislature did not define the medical term physical examination, the common usage of the term is more appropriate than the strict medical definition).

22. Steven S. Gensler, 1 Federal Rules of Civil Procedure: Rules and Commentary, Rule 26 (2023).

determining the treatment, or ventures into opinions formed apart from treatment. In practice, parties rarely object on Rule 702 grounds to a doctor’s testimony confined to treatment, even if it includes opinions formed during treatment. But when a treating clinician gives opinions beyond those formed during treatment, such as an opinion on future costs of treatment, Daubert objections are more common.
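The disclosure logic summarized in the quoted passage above can be sketched as a small decision function. This is a simplified illustration, not a statement of the rule: the function name, the two boolean inputs, and the enum labels are hypothetical, and real disputes turn on facts that two flags cannot capture.

```python
from enum import Enum


class Disclosure(Enum):
    """Disclosure levels named in the summary above."""
    NOT_AN_EXPERT = "lay fact testimony; no expert disclosure"
    SUMMARY = "summary disclosure, Rule 26(a)(2)(C)"
    REPORT = "written report, Rule 26(a)(2)(B)"


def required_disclosure(offers_opinions: bool,
                        opinions_beyond_treatment: bool) -> Disclosure:
    """Map the nature of a treating physician's testimony to the
    most demanding disclosure it triggers (hypothetical helper)."""
    if not offers_opinions:
        # Testifying only to facts in a lay-person capacity.
        return Disclosure.NOT_AN_EXPERT
    if opinions_beyond_treatment:
        # Report requirement applies to the extent of those further opinions.
        return Disclosure.REPORT
    # Opinions stay within the scope of treatment and diagnosis.
    return Disclosure.SUMMARY


print(required_disclosure(True, False).value)
```

The function returns only the most demanding level; under the passage quoted above, a physician who offers both kinds of opinion owes a report only as to the opinions formed outside treatment.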

Although testifying medical experts often will need to tailor their opinions in a way that conforms to the legal standard of causation, treating physicians generally are more concerned about accurate diagnosis and treatment and are less familiar with differential etiology. As the section titled “Medical Decision-Making” below will demonstrate, when analyzing the patient’s symptoms and making a judgment based on the available medical evidence, a treating physician may not expressly identify a “proximate cause” or “substantial factor.” For example, in order to recommend treatment, a physician does not necessarily need to determine whether a patient’s lung ailment was more likely the result of a long history of tobacco use or prolonged exposure to asbestos if the optimal treatment is the same. In contrast, when testifying as an expert in a case in which an employee with a long history of tobacco use is suing his employer for possible injuries from asbestos exposure in the workplace, physicians may need to make judgments regarding the likelihood that tobacco, asbestos, or both could have contributed to the injury.23

Physicians may be asked to testify about patients they have never examined or from whom they have never taken a medical history and make assessments of the proximate cause, increased risk of injury, or likely future injuries.24 A doctor may even need to make medical judgments about a deceased litigant.25 Testifying in all such cases requires making judgments that physicians do not ordinarily make in their profession and adapting their opinion to take into account the appropriate legal standard.26

23. Physicians will testify as experts in cases in which the plaintiff’s condition may be the result of multiple causes. In these cases, the divergence between medical reasoning and legal reasoning is very apparent. See, e.g., Tompkin v. Philip Morris USA, Inc., 362 F.3d 882 (6th Cir. 2004) (affirming district court’s conclusion that testimony offered by defendant’s expert regarding the decedent’s work-related asbestos exposure was not prejudicial in a suit against a tobacco company on behalf of plaintiff’s deceased husband); Mobil Oil Corp. v. Bailey, 187 S.W.3d 265 (Tex. Ct. App. 2006) (involving claims from a worker who had a long history of tobacco use that exposure to asbestos increased his risk of cancer).

24. Visual and verbal information about patients suspected of having acute heart disease changed a physician’s estimates of the likelihood of disease by a relative 35% (an absolute change of 18 percentage points from a mean estimate of 51%). See C.P. Friedman et al., Visual Information and the Diagnosis of Chest Pain, 69 Acad. Med. S28–S30 (1994), https://doi.org/10.1097/00001888-199410000-00032.

25. See, e.g., Tompkin, 362 F.3d 882.

26. More than thirty medical specialty societies have issued statements about ethical requirements for their members to testify in court. A prominent example is the American Academy of Orthopedic Surgeons (AAOS), which has an “expert witness affirmation statement,” available at https://perma.cc/H7TA-LC9T.

Medical Expertise and Admissibility

As the “Medical Testimony Introduction” section above suggested, the goal that guides the physician—recommending the best therapeutic options for the patient—means that diagnostic reasoning involves probabilistic judgments concerning several working hypotheses, often considered simultaneously. Legal standards for physician testimony include Federal Rule of Civil Procedure 26(a)(2) requiring various disclosures regarding expert witnesses. The 1993 Advisory Committee Note to Rule 26—when expert disclosure requirements in the absence of a discovery request were added to Rule 26—specifically states that experts as used in Rule 26 are those who testify under Rule 702. Before 2010, the only expert disclosures required of parties were the detailed reports required for experts “retained or specially employed” to give testimony in the case.27 This usually did not include treating physicians, who were testifying because they provided treatment and not because they were hired by a party. In 2010, the expert disclosure requirements were expanded to include non-retained experts who testify at trial.28 This was specifically done to address treating-physician testimony, which normally was not subject to the report requirement but often was pivotal at trial.

Therefore, treating doctors who do more than recite basic facts are giving testimony subject to Rule 702, but the admissibility of their testimony is typically more of an issue—and Daubert challenges are more common—when they provide opinions not formed during the course of their treatment. The “Medical Care” and “Medical Decision-Making” sections below explain in great detail the practice of medicine, including medical education, the structure of healthcare, and, most importantly, the methods that physicians use to diagnose and treat their patients. Special attention is given to the physician-patient relationship and to the types of evidence that physicians use to make medical judgments.

Medical Care

Medical Education and Training

Medical School

The Association of American Medical Colleges (AAMC) oversees accredited U.S. medical schools, as well as 197 Canadian medical schools.29 The Liaison Committee on Medical Education performs the accreditation for AAMC and

27. Fed. R. Civ. P. 26(a)(2)(B).

28. Fed. R. Civ. P. 26(a)(2)(A), (C).

29. Ass’n Am. Med. Colls., Membership, https://perma.cc/49NB-MLXW.

assesses the quality of post-secondary education by determining whether each institution or program meets established standards for function, structure, and performance. All physicians who wish to be licensed must pass the U.S. Medical Licensing Examination Steps 1, 2, and 3.30

In the United States, most physicians are allopathic medical doctors (M.D.s) while 11% are doctors of osteopathy (D.O.s).31 The Commission on Osteopathic College Accreditation accredits osteopathic medical schools. Training is similar to that for allopathic physicians but with “special training in the musculoskeletal system, your body’s interconnected system of nerves, muscles and bones.”32 About 25% of current U.S. physicians are foreign medical graduates who

30. U.S. Medical Licensing Examination, General Common Questions, https://perma.cc/P67R-EESM.
Rule 702 requires an expert to be qualified by “knowledge, skill, experience, training, or education.” Planned Parenthood Cincinnati Region v. Taft, 444 F.3d 502, 515 (6th Cir. 2006) (“The State has not appealed the district court’s order refusing to recognize Dr. Crockett as an expert in the critical review of medical literature. Although that order has not been placed before us, the only reason the district court gave for her ruling was that Dr. Crockett did not have any specific training in the critical review of medical literature beyond the training incorporated in her general medical school and residency training. This ruling ignored Dr. Crockett’s testimony that her residency program at Georgetown University put particular emphasis on training residents in the critical review of medical literature, that she had taught classes on the subject, that she had done extensive reading and self-education on the subject, and that she had critically reviewed medical literature for the FDA. If these qualifications are not sufficient to demonstrate expertise, this court is hard-pressed to imagine what qualifications would suffice.”); Davis v. Houston Cnty., Ala. Bd. of Educ., No. 1:06-CV-953-MEF, 2008 WL 410619, at *4 (M.D. Ala. Feb. 13, 2008), aff’d sub nom. Davis v. Houston Cnty., Ala. Bd. of Educ., 291 F. App’x 251 (11th Cir. 2008) (“The Board has moved to exclude all evidence of Freet’s opinions and conclusions related to the cause of Joshua Davis’s behavior at the football game contained in his deposition as well as Freet’s letter to Malcolm Newman. The Board argues that Freet is not qualified to give expert testimony, and that Plaintiff failed to comply with Fed. R. Civ. 26(a)(2)(B) by not providing a report of Freet’s testimony that includes all of the information required by Rule 26(a)(2)(B). . . . In order to consider Freet’s expert opinions, this Court must find that Freet meets the requirements of Fed. R. Evid. 702. . . . 
Freet is not a medical doctor and never attended medical school. The only evidence of Freet’s qualifications are: approximately five years working for the Department of Veterans Affairs in the vocational rehabilitation program, followed by approximately seven years working in private practice as a ‘licensed professional counselor.’ There is no evidence in the record of Freet’s educational background, or any details of the exact nature of Freet’s work experience.”); Therrien v. Town of Jay, 489 F. Supp. 2d 116, 117 (D. Me. 2007) (“Citing Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579, 113 S. Ct. 2786, 125 L. Ed. 2d 469 (1993) and Rule 702 of the Federal Rules of Evidence, Officer Gould’s first objection is that Dr. Harding does not possess sufficient expertise to express expert opinions about ‘the mechanism and timing of Plaintiff’s injuries.’ This objection is not well taken. Dr. Harding was graduated from Dartmouth College and Georgetown Medical School; he completed a residency in internal medicine, is board certified in internal medicine, and has been licensed to practice medicine in the state of Maine since 1978.”).

31. Am. Osteopathic Ass’n, What Is a DO?, https://perma.cc/Q2AT-5SPX.

32. Id.

include both U.S. citizens and foreign nationals.33 Because educational standards and curricula outside the United States and Canada vary, the Education Commission for Foreign Medical Graduates administers a certification exam to determine eligibility for Accreditation Council for Graduate Medical Education (ACGME) accredited U.S. residency and fellowship programs.34

Postgraduate Training

Most medical school graduates seek additional training through residency programs in their chosen specialty.35 Residencies range from three to seven years at teaching hospitals and academic medical centers. In training, residents care for patients while being supervised mostly by board-certified physicians. They also have opportunities to pursue educational and research activities.36 Physician licensure in many states requires the completion of a residency program accredited by the ACGME, which is responsible for all allopathic and osteopathic graduate medical education programs, including over 13,000 residency and fellowship programs in 182 specialties and subspecialties.37 After residency, eligible physicians can take their board certification examinations (see below), and some opt for additional subspecialty fellowship training.

Licensure and Credentialing

Medical practice acts defining the practice of medicine and delegating enforcement to state medical boards exist for each of the fifty states, the District of Columbia, and the U.S. territories. State medical boards award medical licenses, investigate complaints, discipline physicians who violate the law, and evaluate and rehabilitate physicians.

The hospital credentialing process defines physicians’ scope of practice and hospital privileges, that is, the clinical inpatient and outpatient services and procedures they may provide. The process involves verifying medical education, postgraduate training, board certification, professional experience, state licensure, prior credentialing outcomes, medical-board actions, malpractice, and adverse clinical

33. Ass’n of Am. Med. Colls., 2023 U.S. Physician Workforce Data Dashboard, https://perma.cc/J59U-Q3VZ.

34. Educ. Comm’n for Foreign Med. Graduates, About ECFMG, https://perma.cc/C3N8-9HHX.

35. See Brown v. Hamot Med. Ctr., Civil Action No. 05–32E, 2008 WL 55999 (W.D. Pa. Jan. 3, 2008), aff’d, 323 F. App’x 140 (3d Cir. 2009).

36. See Planned Parenthood Cincinnati Region v. Taft, 444 F.3d 502, 515 (6th Cir. 2006).

37. Accreditation Council for Graduate Med. Educ., About the ACGME, https://perma.cc/P8XQ-8ZX2.

events. Re-credentialing involves an assessment of a physician’s professional or technical competence and performance by evaluating and monitoring the quality of patient care. Licensing and credentialing are identical for allopathic medical doctors and doctors of osteopathy.

The American Board of Medical Specialties (ABMS) provides certification in twenty-four medical boards, forty specialties, and eighty-eight subspecialties38 to maintain and evaluate standards in the profession and to help “physicians and specialists demonstrate their competence and professionalism throughout their careers.”39 Although criteria vary by field, board eligibility requires completing medical school, passing the United States Medical Licensing Examination, finishing an appropriate residency, and undergoing evaluation, generally with computer-based and, in some cases, oral examinations. ABMS Standards for Continuing Certification, effective January 1, 2024, seek to promote integrated, specialty-specific programs by member boards to support an individual physician’s or medical specialist’s (i.e., diplomate’s) continuing professional development.40 In some cases, specialty organizations have developed their own certification process outside of the ABMS (e.g., the American Board of Bariatric Medicine).41 The American Osteopathic Association (AOA) certifies osteopathic physicians in sixteen specialty boards, issues certificates in twenty-seven primary specialties and forty-eight subspecialties,42 and has a continuous certification process.43

Continuing Medical Education

For relicensure, state medical boards require continuing medical education so that physicians can acquire new knowledge and maintain clinical competence. The Accreditation Council for Continuing Medical Education (ACCME) identifies, develops, and promotes quality standards for organizations providing continuing

38. Am. Bd. Med. Specialties, https://perma.cc/4DSJ-48HJ.

39. Although specialization is a hallmark of modern medical practice, courts have not always required that medical testimony come from a specialist. See, e.g., Gaydar v. Sociedad Instituto Gineco-Quirurgico y Planificacion Familiar, 345 F.3d 15, 24–25 (1st Cir. 2003) (“The proffered expert physician need not be a specialist in a particular medical discipline to render expert testimony relating to that discipline.”); Keller v. Feasterville Fam. Health Care Ctr., 557 F. Supp. 2d 671 (E.D. Pa. 2008) (holding that the expert opinion that patient did not have Alzheimer’s disease was admissible in medical malpractice action, even though treating physicians were internal and family-care physicians, and expert was board certified in family medicine, but never received any specialized education, training, or experience in neuropathology, neurodegenerative diseases, or Alzheimer’s disease).

40. Am. Bd. Med. Specialties, Standards for Continuing Certification, https://perma.cc/5SQ2-UMWP.

41. Am. Bd. Obesity Med., Certifying Physicians in the Treatment of Obesity, https://perma.cc/7HUX-ASF6.

42. Am. Osteopathic Ass’n, AOA Board Certification, https://perma.cc/N5ZQ-88DY.

43. Am. Osteopathic Ass’n, Osteopathic Continuous Certification, https://perma.cc/S4E5-EF4F.

medical education for physicians44 to help assure physicians, state medical boards, medical societies, state legislatures, continuing medical education providers, and the public that the education meets certain quality standards. For osteopathic physicians, the AOA Board of Trustees also oversees accreditation for osteopathic continuing medical education sponsors through the Council on Continuing Medical Education (CCME).45

Organization of Medical Care

The delivery of healthcare in the United States is highly decentralized and fragmented.46 Healthcare is provided through clinics, hospitals, managed-care organizations, medical groups, multispecialty clinics, integrated delivery systems, specialty standalone hospitals, imaging facilities, skilled-nursing facilities, rehabilitation hospitals, emergency departments, and pharmacy-based and other walk-in clinics. Transitions in care settings involve multiple handoffs among different healthcare professionals and care providers with the need for accurate, timely, and complete transfer of information. As hospitals increasingly belong to a network or system, physicians too have shifted toward larger practices. For the first time, fewer than half of patient-care physicians were in private practice in 2020; therefore, “[n]o single practice type, ownership structure, or size can or should be considered the typical physician practice.”47 The Covid-19 pandemic has accelerated practice changes including telemedicine visits, home hospitalizations, and ambulatory surgery and procedure centers with their own specialty organizations, accreditation bodies, and state regulatory oversight, besides possible federal certification.

Per W. Edwards Deming, “Every system is perfectly designed to get the result that it does.” As a consequence, the U.S. healthcare system yields a life expectancy 5.9 years shorter than the average of comparable countries in 2021, yet it spends nearly twice as much per capita.48 Driven by social and economic inequities, the healthcare system has substantial disparities by race/ethnicity, but also by “socioeconomic status, age, geography, language, gender, disability

44. Accreditation Council for Continuing Med. Educ., About Accreditation, https://perma.cc/P3UF-QTKT.

45. Am. Osteopathic Ass’n, CME, https://perma.cc/6RBB-765P.

46. Comm. on Quality Health Care Am., Inst. Med., Crossing the Quality Chasm: A New Health System for the 21st Century (2001) [hereinafter 2001 CQHCA Report], https://doi.org/10.17226/10027.

47. Carol K. Kane, Am. Med. Ass’n, Recent Changes in Physician Practice Arrangements: Private Practice Dropped to Less Than 50 Percent of Physicians in 2020, Policy Research Perspectives, https://perma.cc/VQD4-RXX9.

48. Shameek Rakshit, Matthew McGough & Krutika Amin, Peterson-KFF Health System Tracker, How Does U.S. Life Expectancy Compare to Other Countries? (Jan. 30, 2024), https://perma.cc/J8T2-P6Q9.

status, citizenship status, and sexual identity and orientation.” For example, “[b]lack infants in America are now more than twice as likely to die as white infants, a disparity that is wider than it was in 1850, 15 years before the end of slavery.”49

Ever since the Harvard Medical Practice Study found a high incidence of adverse events in hospitalizations, and highly publicized errors (fatal medication overdoses and amputation of the limb on the wrong side) came to light, there have been safety concerns about the organization of medical care. The Harvard study indicated that adverse events occurred in 3.7% of hospitalizations and that 27.6% of adverse events were due to negligence.50 A report by the Institute of Medicine (IOM) estimated that medical errors resulted in as many as 98,000 deaths annually among hospitalized patients.51,52 The report further highlighted system issues leading to safety events: “The decentralized and fragmented nature of the healthcare delivery system (some would say ‘nonsystem’) also contributes to unsafe conditions for patients and serves as an impediment to efforts to improve safety.” While recognizing that “not all errors result in harm,” the report defined safety as “freedom from accidental injury” and specified two types of error: “the failure of a planned action to be completed as intended or the use of a wrong plan to achieve an aim.”53
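The two Harvard percentages quoted above can be combined to gauge how often a hospitalization involved an adverse event attributed to negligence. The short sketch below does only that arithmetic; the function name is illustrative and not drawn from the study.

```python
# Figures quoted in the text for the Harvard Medical Practice Study.
ADVERSE_EVENT_RATE = 0.037   # share of hospitalizations with an adverse event
NEGLIGENT_SHARE = 0.276      # share of those adverse events attributed to negligence


def negligent_adverse_event_rate(event_rate: float, negligent_fraction: float) -> float:
    """Share of all hospitalizations with an adverse event due to negligence."""
    return event_rate * negligent_fraction


rate = negligent_adverse_event_rate(ADVERSE_EVENT_RATE, NEGLIGENT_SHARE)
print(f"{rate:.2%} of hospitalizations")  # roughly 1 in 100
```

On these figures, negligent adverse events occurred in roughly 1% of hospitalizations, which is a much smaller share than the 3.7% overall adverse-event rate.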

The IOM recommended the development of a learning healthcare-delivery system—“a system that both prevents errors and learns from them when they occur. The development of such a system requires, first, a commitment by all stakeholders to a culture of safety and, second, improved information systems.”54

49. Nambi Ndugga & Samantha Artiga, Peterson-KFF Health System Tracker, Disparities in Health and Health Care: 5 Key Questions and Answers (Apr. 21, 2023), https://perma.cc/ZGU7-U9SB; Linda Villarosa, Why America’s Black Mothers and Babies Are in a Life-or-Death Crisis, N.Y. Times Mag., Apr. 11, 2018, https://www.nytimes.com/2018/04/11/magazine/black-mothers-babies-death-maternal-mortality.html.

50. A systematic review and meta-analysis identified 45 studies published between 2000 and 2018 in hospital settings from around the world and found that 10% of hospitalized patients experienced harm, with half of those harms being preventable. See Maria Panagioti et al., Prevalence, Severity, and Nature of Preventable Patient Harm Across Medical Care Settings: Systematic Review and Meta-Analysis, 2019 B.M.J. 366 (Table 1, p.4 of PDF), https://doi.org/10.1136/bmj.l4185. See also Troyen A. Brennan et al., Incidence of Adverse Events and Negligence in Hospitalized Patients—Results of the Harvard Medical Practice Study I, 324 NEJM 370–76 (1991), https://www.nejm.org/doi/full/10.1056/NEJM199102073240604.

51. Comm. on Quality Health Care in Am., Inst. Med., To Err Is Human: Building a Safer Health System (Linda T. Kohn et al. eds., 2000) [hereinafter 2000 CQHCA Report], https://doi.org/10.17226/9728.

52. In 70 studies identified in a systematic review across global medical settings (e.g., primary care to intensive care or surgery) involving 337,025 patients, the overall pooled prevalence of harm was 12% with half of the harms being preventable. Among preventable harms, 12% were severe, e.g., resulting in permanent disability or death. See Panagioti et al., supra note 50.

53. 2000 CQHCA Report, supra note 51, at 3–4.

54. Comm. on Data Standards for Patient Safety, Inst. Med., Patient Safety: Achieving a New Standard for Care 1 (Philip Aspden et al. eds. 2005), https://doi.org/10.17226/10863.

Government and nongovernment institutions have adopted as parts of their mission the assessment and promotion of safety at the healthcare-system level.

Besides patients and their families,55 the work system for healthcare delivery and diagnosis includes organizational characteristics (culture, rules and procedures, leadership, and management), technologies and tools (health information technology and electronic health records), physical environment (layout, distractions, lighting, and noise), and external factors (payment, care-delivery system, legal system, and reporting environment).56 Beyond physicians, medical delivery systems have increasingly incorporated allied health professions, including nurses, nurse practitioners, physician assistants, pharmacists, and therapists. The term clinicians will be used to be inclusive of this expanded diagnostic and care-delivery team.

Patient Care

Goals

The translated classical version of the Hippocratic Oath (c. 400 B.C.), “abstain from whatever is deleterious,” was abbreviated into “first do no harm,” or in Latin, primum non nocere. The IOM describes quality healthcare delivery as “[t]he degree to which health services for individuals and populations increase the likelihood of desired health outcomes and are consistent with current professional knowledge.” The six aims for improving the healthcare system assert that healthcare should be

  • Safe—avoiding injuries to patients from the care that is intended to help them.
  • Effective—providing services based on scientific knowledge to all who could benefit and refraining from providing services to those not likely to benefit (avoiding underuse and overuse, respectively).
  • Patient-centered—providing care that is respectful of and responsive to individual patient preferences, needs, and values and ensuring that patient values guide all clinical decisions.
  • Timely—reducing waits and sometimes harmful delays for both those who receive and those who give care.
  • Efficient—avoiding waste, including waste of equipment, supplies, ideas, and energy.
  • Equitable—providing care that does not vary in quality because of personal characteristics such as gender, ethnicity, geographic location, and socioeconomic status.57

55. Kathryn M. McDonald et al., The Patient Is In: Patient Involvement Strategies for Diagnostic Error Mitigation, 22 B.M.J. Quality & Safety ii33 (2013), https://doi.org/10.1136/bmjqs-2012-001623.

56. Comm. on Diagnostic Error in Health Care, Nat’l Acads. Sci., Eng’g & Med., Improving Diagnosis in Health Care 34 (Erin P. Balogh et al. eds., 2015), https://doi.org/10.17226/21794 [hereinafter 2015 CDEHC Report].

57. 2001 CQHCA Report, supra note 46, at 5–6.

In Crossing the Quality Chasm, the IOM emphasized that care delivery should accommodate individual patient choices and preferences and be customized on the basis of a patient’s needs and values.58 The Charter on Medical Professionalism specifies three fundamental principles: (1) patient welfare or serving the interest of the patient, (2) patient autonomy or empowering patients to make informed decisions, and (3) social justice or fair distribution of healthcare resources.59 When resources are limited, such as when deciding which patient gets an intensive care bed during the Covid-19 pandemic, clinicians face the dilemma of choosing between patient care and societal goals.60

Diagnostic Process

The diagnostic process begins with patients experiencing a health problem that leads to a healthcare system encounter (Figure 1).61 Initial information gathering then leads to information integration and interpretation, in which a set of working diagnoses is formed and revised until sufficient information has been gathered in the clinical reasoning process. The sources of information include the clinical history and interview, physical examination, diagnostic testing, and possible referral and consultation.62

The clinical history and interview identify the chief complaint as the specific symptom that led the patient to seek medical attention. The history of the present illness describes the onset and progression of symptoms over time and may include eliciting the presence or absence of symptoms as “pertinent positives or pertinent negatives” that increase or decrease the likelihood of potential diseases under consideration. The past medical history includes prior illnesses, hospitalizations, surgeries, current medications, drug allergies, and lifestyle habits (smoking, alcohol use, illicit drug exposure, dietary habits, and exercise habits). Family history captures illnesses diagnosed in related family members. Social history includes education, employment, and social relationships. These provide a context for the chief complaint. Finally, the review of systems is a comprehensive inquiry of symptoms from various organ systems.

58. Id. at 8.

59. ABIM Found., ACP-ASIM Found. & Eur. Fed’n Internal Med., Medical Professionalism in the New Millennium: A Physician Charter, 136 Annals Internal Med. 243 (2002), https://doi.org/10.7326/0003-4819-136-3-200202050-00012.

60. Medical Professionalism and the Parable of the Craft Guilds, 147 Annals Internal Med. 809 (Harold C. Sox ed., 2007), https://doi.org/10.7326/0003-4819-147-11-200712040-00015.

61. See generally Davoll v. Webb, 194 F.3d 1116, 1138 (10th Cir. 1999) (“A treating physician is not considered an expert witness if he or she testifies about observations based on personal knowledge, including treatment of the party.”).

62. 2015 CDEHC Report, supra note 56, at 32–41.

Figure 1. The diagnostic process.
Source: From National Academies of Sciences, Engineering, and Medicine, Improving Diagnosis in Health Care 33 (National Academies Press 2015), https://doi.org/10.17226/21794.

Distinct from symptoms that are described by patients, physical-examination findings are considered as signs. Directed physical examination refers to inspecting only the relevant organ systems that may be causing the symptoms.

Based on the set of working diagnoses, diagnostic testing may be ordered to confirm or rule out possible diagnoses. Diagnostic testing is distinguished from surveillance testing or screening because the patient has a symptom whose cause the diagnostic team is seeking to identify for treatment. Depending on residual diagnostic uncertainty, patients may be referred for further consultation with specialists or subspecialists for their diagnostic or therapeutic opinion.

Patients also may seek care to monitor chronic conditions. This places an emphasis on collaborative and continuous care that involves patients, their families, clinicians, long-term care goals and plans, self-management training, and support63 with organizational needs that differ substantially from those necessary for acute episodic complaints. The clinical history and physical examination involve assessing whether the symptoms or signs of that condition have progressed, improved, or stabilized. Surveillance may involve repeating tests at some recommended frequency for a known health risk—for example, a colonoscopy in three years for someone found to have premalignant polyps previously, instead of ten years for someone with a negative colonoscopy.

63. 2001 CQHCA Report, supra note 46, at 27.

Patients may seek preventive health visits. Screening involves testing asymptomatic, otherwise-well patients to find a disease earlier, presumably when it may be more treatable (e.g., cancer) or preventable (e.g., preventing stroke by treating hypertension, a risk factor for stroke). Screening for a population requires that (1) the condition affects quality and length of life in the population; (2) the condition has a sufficiently high incidence or prevalence to justify any risk of harms associated with the test; (3) a preventive or early treatment is available; (4) an asymptomatic period for early detection is present; (5) an accurate, acceptable, and affordable screening test exists; and (6) as applied to a population, screening benefits exceed harms. Screening for disease in asymptomatic, otherwise healthy patients has become widely accepted and promulgated.64 In contrast to diagnostic testing prompted by a complaint from a patient, screening involves apparently healthy individuals without any complaint,65 so “every adverse outcome of screening is iatrogenic and entirely preventable” by not screening, including overdiagnosis and overtreatment (diagnoses that otherwise never would have been made because the condition would never cause symptoms or mortality, resulting in overtreatment with potential adverse effects).66
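The prevalence requirement in criterion (2) can be illustrated with a small calculation of positive predictive value, the probability that a person with a positive screening result actually has the condition. The sketch below is illustrative only: the sensitivity, specificity, and prevalence values are hypothetical assumptions, not figures from the sources cited here.

```python
def positive_predictive_value(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Probability of disease given a positive test, for a given prevalence."""
    true_positives = prevalence * sensitivity
    false_positives = (1.0 - prevalence) * (1.0 - specificity)
    return true_positives / (true_positives + false_positives)


# Hypothetical test characteristics, chosen only for illustration.
SENSITIVITY = 0.90
SPECIFICITY = 0.95

for prevalence in (0.001, 0.01, 0.10):
    ppv = positive_predictive_value(prevalence, SENSITIVITY, SPECIFICITY)
    print(f"prevalence {prevalence:.1%}: positive predictive value {ppv:.1%}")
```

Under these assumed test characteristics, the same test yields mostly false positives when the condition is rare and mostly true positives when it is common. That is one reason low prevalence in a screened population raises the relative weight of screening harms.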

Medical Decision-Making

Uncertainty in defining when a disease is present makes diagnosis, and therefore treatment decisions, difficult: (1) the difference between normal and abnormal is not always well demarcated; (2) many diseases do not progress with certainty (e.g., progression of lobular carcinoma in situ of the breast to invasive breast cancer occurs less than 50% of the time), but rather increase the risk of a poor outcome (e.g., hypertension raises the risk of developing heart disease or stroke); and (3) symptoms, signs, and findings for one disease overlap with others.67 Variation also exists in the ability of physicians to elicit particular symptoms (e.g., in a group of patients interviewed by many physicians, 23% to 40% of the physicians reported cough as being present), observe signs (e.g., only 53% of physicians detected cyanosis—a blue or purple discoloration of the skin resulting from lack of oxygen—when present), or interpret tests (e.g., only 51% of pathologists agreed with each other when examining Pap smear slides with abnormal cells taken from

64. Lisa M. Schwartz et al., Enthusiasm for Cancer Screening in the United States, 291 JAMA 71 (2004), https://doi.org/10.1001/jama.291.1.71.

65. David A. Grimes & Kenneth F. Schulz, Uses and Abuses of Screening Tests, 359 Lancet 881, 881 (2002), https://doi.org/10.1016/S0140-6736(02)07948-5.

66. Id. at 881; William C. Black, Overdiagnosis: An Underrecognized Cause of Confusion and Harm in Cancer Screening, 92 J. Nat’l Cancer Inst. 1280, 1280 (2000), https://doi.org/10.1093/jnci/92.16.1280.

67. David M. Eddy, Variations in Physician Practice: The Role of Uncertainty, 3 Health Affs. 74, 75–76 (1984), https://doi.org/10.1377/hlthaff.3.2.74.

a woman’s cervix to look for signs of cervical cancer).68 And prognosis (response to disease or treatment) with alternative therapies is in many cases uncertain. A report by the Royal College of Physicians puts it this way:

The practice of medicine is distinguished by the need for judgement in the face of uncertainty. Doctors take responsibility for these judgements and their consequences. A doctor’s up-to-date knowledge and skill provide the explicit scientific and often tacit experiential basis for such judgements. But because so much of medicine’s unpredictability calls for wisdom as well as technical ability, doctors are vulnerable to the charge that their decisions are neither transparent nor accountable.69

Note that a key piece of the diagnostic process in Figure 1 involves communication of the diagnosis and the care-path plan for the diagnosis.

Clinical-Reasoning Process

Studies of clinical problem solving suggest that physicians employ combinations of two diagnostic approaches ranging from hypothetico-deductive reasoning (deliberative and analytical, or thinking slow) to heuristic pattern recognition (quick and intuitive, or thinking fast).70

Hypothetico-Deductive Reasoning

In the hypothetico-deductive approach, based on initial information such as age, gender, and chief complaint, clinicians begin to generate an initial list of potential diseases (hypothesis generation) based on their potential to explain the patient’s observed signs and symptoms (Figure 2)71 within the known capacity

68. Id. at 77–78.

69. Royal Coll. Physicians, Doctors in Society: Medical Professionalism in a Changing World xi, https://perma.cc/82RF-6GX6.

70. Jerome P. Kassirer et al., Learning Clinical Reasoning 5–7, 56–88 (2d ed. 2010); Arthur S. Elstein & Alan Schwartz, Clinical Problem Solving and Diagnostic Decision Making: Selective Review of the Cognitive Literature, 324 B.M.J. 729, 729–30 (2002), https://doi.org/10.1136/bmj.324.7339.729; Jerome P. Kassirer & G. Anthony Gorry, Clinical Problem Solving: A Behavioral Analysis, 89 Annals Internal Med. 245 (1978), https://doi.org/10.7326/0003-4819-89-2-245; Geoffrey Norman, Research in Clinical Reasoning: Past History and Current Trends, 39 Med. Educ. 418 (2005), https://doi.org/10.1111/j.1365-2929.2005.02127.x.

71. Steven N. Goodman, Toward Evidence-Based Medical Statistics. 1: The P Value Fallacy, 130 Annals Internal Med. 995, 996 (1999), https://doi.org/10.7326/0003-4819-130-12-199906150-00008.

Figure 2. Medical deductive and inductive inference.
Source: Used with permission of the American College of Physicians, from Steven N. Goodman, Toward Evidence-Based Medical Statistics. 1: The P Value Fallacy, 130 Annals Internal Med. 995, 996 (1999), https://doi.org/10.7326/0003-4819-130-12-199906150-00008; permission conveyed through Copyright Clearance Center, Inc.

limitations in human short-term memory.72 This initial working list of diagnoses provides a context used to evaluate subsequent information. Based on their disease knowledge, clinicians expect the presence or absence of certain symptoms, risk factors, disease course, signs, or test results for each diagnosis (deductive inference).

As further information emerges (Figure 3), those data are evaluated and interpreted for their consistency with the working list of possibilities and whether those data would increase or decrease the likelihood of each possibility (hypothesis refinement). If the data are inconsistent, those diseases may be dropped and other diagnostic possibilities may be considered (hypothesis modification). The information gathering continues iteratively over time, including possible referral to another clinician. The final cognitive step (diagnostic verification) involves testing the validity of the diagnosis for its coherency (consistency with predisposing risk factors, physiological mechanisms, and resulting manifestations), its adequacy (the ability to account for all normal and abnormal findings and the disease time course), and its parsimony (the simplest single explanation as opposed to requiring the simultaneous occurrence of two or more diseases to explain the findings).73

72. Elstein & Schwartz, supra note 70, at 732; George A. Miller, The Magical Number Seven Plus or Minus Two: Some Limits on Our Capacity for Processing Information, 63 Psych. Rev. 81, 81 (1956), https://doi.org/10.1037/h0043158.

73. Kassirer et al., supra note 70, at 6, 8–16, 89–127.

Figure 3. Clinical-reasoning process from chief complaint to therapeutic decisions.
Heuristics: Intuitive Pattern Recognition

Alternatively, heuristics are quick, automatic “rules of thumb” or cognitive shortcuts. In such cases, pattern recognition leads to rapid recognition and a reflexive diagnosis with little cognitive effort.74 For example, a black woman with large shadows from enlarged lymph nodes in her chest X-ray would trigger a presumptive diagnosis of sarcoidosis for many clinicians. The simplifying assumptions involved in heuristics, however, are subject to cognitive biases. For instance, episodic headache, sweating, and a rapid heartbeat form the classic triad seen in patients with a rare adrenal tumor known as a pheochromocytoma that also can cause hypertension. Physicians finding those three symptoms in a patient with hypertension may overestimate the patient’s likelihood of having pheochromocytoma based on representativeness bias, overestimating the likelihood of a less common disease just because case findings resemble those found in that disease.75 Other cognitive errors include availability (overestimating the likelihood of memorable diseases in a subsequent patient because of vivid experience with a prior patient or media attention, and thus underestimating common or routine diseases) and

74. Stephen G. Pauker & John B. Wong, How (Should) Physicians Think?: A Journey from Behavioral Economics to the Bedside, 304 JAMA 1233, 1233–34 (2010), https://doi.org/10.1001/jama.2010.1336.

75. For additional discussion and definition of terms, see section titled “Diagnostic Reasoning” below. Bayes’ rule can be applied as follows: about 100 in 100,000 patients with hypertension have pheochromocytoma; this symptom triad occurs in 91% of patients with pheochromocytoma (sensitivity) and does not occur in 94% of those without pheochromocytoma (specificity), so 6% of those without pheochromocytoma would have this symptom triad. Accordingly, 91 of the 100 individuals with pheochromocytoma (91% times 100) would have this triad, and 5,994 of those without a pheochromocytoma (6% times 99,900) would have this triad. Thus, among the 100,000 hypertensive patients, 6,085 will have the classic triad, suggesting the possibility of pheochromocytoma, but only 91 of the 6,085, or 1.5%, will indeed have pheochromocytoma.


anchoring (insufficient adjustment of the initial likelihood of disease following a positive or negative test).76
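The natural-frequency arithmetic in note 75 can be reproduced with a short sketch. The function name `natural_frequencies` and its parameter names are illustrative choices, not part of the source; the inputs are the figures stated in the note.

```python
def natural_frequencies(population, prevalence, sensitivity, specificity):
    """Return (true positives, false positives, post-test probability)."""
    diseased = population * prevalence
    healthy = population - diseased
    true_pos = diseased * sensitivity          # diseased patients with the finding
    false_pos = healthy * (1 - specificity)    # healthy patients with the finding
    post_test = true_pos / (true_pos + false_pos)
    return true_pos, false_pos, post_test


# Figures from note 75: 100 per 100,000 prevalence, 91% sensitivity, 94% specificity.
tp, fp, ppv = natural_frequencies(100_000, 100 / 100_000, 0.91, 0.94)
print(f"true positives: {tp:.0f}, false positives: {fp:.0f}, post-test: {ppv:.1%}")
```

Even with the classic triad present, the rarity of the tumor keeps the post-test probability near 1.5%, which is the gap that representativeness bias tends to obscure.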

Clinical intuition refers to rapid, unconscious processes that select the pertinent findings from the multitude of available data.77 Such expertise results from multiple exposures to patients with similar symptoms and their final diagnosis, is context sensitive, and cannot always be reduced to cause and effect.78 Cognitive research into the development of expertise suggests two competing hypotheses. In instance- or exemplar-based memory, physicians store scripts or stories of prior recalled case examples—for example, visual information such as that in pathology, dermatology, or radiology—and match new cases to those stories. The alternative prototype memory hypothesis is based on a mental model of disease wherein experts store structured clinical “facts” to create abstractions of patients with the disease. These “prototypes” enable experts to link findings to one another, to connect findings to the possible diagnoses, and to predict additional findings necessary to confirm the diagnosis, even in the absence of prior experience with exactly such a case.79

Physicians typically apply hypothetico-deductive approaches when seeing patients with problems outside of their expertise or patients with difficult problems with atypical features within their expertise and apply intuitive pattern recognition for cases within their expertise or less challenging cases. However, diagnostic accuracy appears to depend more on mastery of domain knowledge than on the particular problem-solving method.80

Diagnostic Reasoning

There is no correlation between physicians’ ability to collect data thoroughly and their ability to interpret the data accurately.81 As a decision scientist describes,

You can’t think unless you’ve got distinctions, and we talk about powerful distinctions. I say an expert in anything is an expert because that expert has powerful distinctions. And we don’t mean he or she has memorized the glossary. It’s that you really have a working knowledge of why they’re important and why that distinction was a very valuable one in the history of the subject and in your thinking.82

76. Kassirer et al., supra note 70, at 100–04, 255–62; Elstein & Schwartz, supra note 70, at 730–31.

77. Trisha Greenhalgh, Intuition and Evidence—Uneasy Bedfellows? 52 Brit. J. Gen. Prac. 395, 395 (2002).

78. Id. at 395–96.

79. Kassirer et al., supra note 70, at 11–12; Elstein & Schwartz, supra note 70, at 730–31.

80. Elstein & Schwartz, supra note 70, at 730.

81. Arthur S. Elstein & Alan Schwartz, Clinical Reasoning in Medicine, in Clinical Reasoning in the Health Professions 223, 224 (Joy Higgs et al. eds., 3d ed. 2008).

82. Interview with Ronald Howard on March 9, 2005, The Whitman Institute (document on file with author).

Symptoms and Signs

Although a test is commonly thought of as a sample from a bodily fluid, tissue, or image, a test also could be the presence or absence of a symptom or physical sign. For example, both influenza and the ancestral Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) can present with similar symptoms. However, a distinguishing symptom is the absence of a runny nose, which is nine times more likely with SARS-CoV-2 (95% vs. 11%). Thus, a runny nose occurs in only 5% of patients with SARS-CoV-2 but 89% of patients with influenza, making influenza 18 times more likely in those with a runny nose.83

Bayes’ Rule

Over 200 years ago, Reverend Thomas Bayes first wrote a paper, published posthumously, that now forms a critical concept in modern medicine: how to estimate the likelihood of disease following a test result using the likelihood of disease prior to testing and the specific test result obtained. Bayesian analysis refers to a method of combining existing evidence or a pretest probability with additional evidence, for example, from the presence or absence of a positive symptom, sign, test, or research-study result to calculate a posterior probability.84

The pretest suspicion of disease or, equivalently, the likelihood or prior probability of disease may be objective, that is, related to incidence (new cases over a specified period of time) or prevalence (existing cases at a particular point in time); based on clinical-prediction rules (e.g., mathematical predictive models to estimate the likelihood of having a stroke or heart attack over the next ten years); or subjective, that is, based on a clinician’s subjective estimated likelihood of disease prior to any testing.85 Bayes’ rule then combines that pretest suspicion with the observed test result. Tests, however, are almost never perfectly accurate. Not everyone with disease has a positive test (i.e., false negatives), and not

83. Nathaniel Hupert et al., Accuracy of Screening for Inhalational Anthrax After a Bioterrorist Attack, 139 Annals Internal Med. 337, 342 (2003), https://doi.org/10.7326/0003-4819-139-5_part_1-200309020-00009; W. Guan et al., Clinical Characteristics of Coronavirus Disease 2019 in China, 382 NEJM 1708, 1713 (2020), https://doi.org/10.1056/NEJMoa2002032.

84. See David H. Kaye & Hal S. Stern, Reference Guide on Statistics and Research Methods, “Bayesian Statistical Methods and Posterior Probabilities,” in this manual.

85. See Gonzalez v. Metro. Transp. Auth., 174 F.3d 1016, 1023 (9th Cir. 1999) (describing the implications of Bayes’ rule for drug testing and noting that a test with the same false-positive rate will generate a higher proportion of false positives to true positives in a population with fewer drug users); see generally Michael O. Finkelstein & William B. Fairley, A Bayesian Approach to Identification Evidence, 83 Harv. L. Rev. 489 (1970), https://doi.org/10.2307/1339656. For a discussion of Bayesian statistics, see David H. Kaye & Hal S. Stern, Reference Guide on Statistics and Research Methods, “Appendix: Conditional Probability and Bayes’ Rule,” in this manual.


everyone who does not have disease has a negative test (i.e., false positives), which can lead to misestimation of the likelihood of disease.

For example, consider the interpretation of a positive mammogram in a group of women, most of whom do not have breast cancer.86 Suppose 1% or 10 of 1,000 40-year-old women have breast cancer. The mammogram will be truly positive in 80% or 8 of the 10 women with breast cancer but falsely positive in 10% or 99 of the 990 women without breast cancer. Among the 107 (8 plus 99) women with a positive mammogram, 8 have breast cancer, so the positive mammogram increases the likelihood of breast cancer from 1% (10 per 1,000) to 7% (8 per 107). The 7% is the probability of breast cancer among 40-year-old women after a positive mammogram, that is, the post-test probability of disease after a positive test or predictive-value positive. The above is described as a natural-frequency format for Bayes’ rule. Presented as probabilities, as in Table 1, there is a 1% probability of breast cancer for these 1,000 women, with an 80% probability of a positive mammogram for those with breast cancer and a 10% probability of a positive mammogram for those without breast cancer. Table 1 shows the calculation of Bayes’ rule using probabilities, with columns B and C showing the natural frequencies as probabilities and column D multiplying the two probability columns. Column E then divides the column D entries by the total probability of a positive mammogram (the sum of column D).

Table 1. A Tabular Form of Bayes’ Rule for Test Interpretation

| A | B | C | D | E |
|---|---|---|---|---|
| | Prior Probability | Probability of a Positive Mammogram Given the Condition in Column A | Multiply Columns B and C | Post-test Probability (Column D Divided by Sum) |
| Breast Cancer | 1% = 10/1,000 | 80% = 8/10 | 0.8% = 8/1,000 | 7% = 8/107 |
| No Breast Cancer | 99% = 990/1,000 | 10% = 1/10 | 9.9% = 99/1,000 | 93% = 99/107 |
| | | | Sum = 10.7% = 107/1,000 | |
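
The columns of Table 1 can be recomputed with exact fractions, which makes the link between the probability format and the natural-frequency format explicit. This is a minimal sketch; the dictionary names are illustrative and not from the source.

```python
from fractions import Fraction

# Column B: prior probability; column C: probability of a positive mammogram.
prior = {"Breast cancer": Fraction(10, 1000), "No breast cancer": Fraction(990, 1000)}
p_positive = {"Breast cancer": Fraction(8, 10), "No breast cancer": Fraction(1, 10)}

joint = {k: prior[k] * p_positive[k] for k in prior}   # column D
total = sum(joint.values())                            # sum of column D
posterior = {k: joint[k] / total for k in joint}       # column E

for k in prior:
    print(f"{k}: column D = {float(joint[k]):.1%}, "
          f"column E = {posterior[k]} ({float(posterior[k]):.0%})")
```

Keeping the values as fractions shows that the 7% in column E is the same 8 per 107 obtained by counting women directly.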
Probabilistic Reasoning

There are two types of medical reasoning to infer the likely diagnosis (Figure 2). Traditional medical education teaches disease manifestations by understanding

86. Among 48 physicians, 10% receiving probability data achieved the correct answer versus 46% when given the natural frequency format. Ulrich Hoffrage & Gerd Gigerenzer, Using Natural Frequencies to Improve Diagnostic Inferences, 73 Acad. Med. 538 (1998), https://doi.org/10.1097/00001888-199805000-00024.


the mechanistic pathways by which the disease causes specific symptoms and signs. For deductive medical inference (deduction), one hypothesizes a specific disease and determines if the manifestations of that disease (e.g., from the literature) account for all of the patient’s observed symptoms and signs.87 Deduction, however, assumes the diseases hypothesized contain the most likely set of diagnoses in a specific patient, yet a set of different diseases may have overlapping manifestations and have varying underlying likelihoods in different contexts. In contrast to deduction, inductive medical inference (induction) goes from observing symptoms and signs (what is known) to quantifying the likelihood of each of the possible diagnoses, allowing for a possibly broader consideration of diseases but with uncertainty attached to the likelihood of each disease.

For example, both influenza and SARS-CoV-2 can cause fever, cough, sore throat, nasal congestion, headache, fatigue, muscle and joint aches, chills, nausea, vomiting, and diarrhea. How might Bayes’ rule help distinguish the cause of the symptoms? To illustrate probabilistic inference, Table 2 presents another tabular form of Bayes’ rule that starts with two diseases (column A) and then the pretest probability of each disease (column B) (e.g., the observed ratio of the incidence of SARS-CoV-2 versus influenza based on prevailing epidemiologic data), followed by the published likelihood of observing nasal congestion for each disease (column C).88 Column D multiplies the numbers in columns B and C to calculate the joint likelihood of those results and the prior probability of each disease. Column E shows that the presence of nasal congestion reverses the higher epidemiologic likelihood of SARS-CoV-2, making influenza much more likely. Bayes’ rule provides the only quantitative approach to incorporate findings to calculate revised likelihoods of diseases given a symptom, sign, or test result through induction (in contrast to deduction).

Table 2. A Tabular Form of Bayes’ Rule for Diagnosis

| A | B | C | D | E |
|---|---|---|---|---|
| | Prior Probability | Probability of Nasal Congestion Given the Condition in Column A | Multiply Columns B and C | Post-test Probability (Column D Divided by Sum) |
| SARS-CoV-2 | 0.75 | 0.05 | 0.037 | 0.14 |
| Influenza | 0.25 | 0.89 | 0.222 | 0.86 |
| | | | Sum = 0.259 | |
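
The same tabular procedure extends to any number of competing diagnoses. The sketch below assumes a hypothetical helper, `bayes_table`, that takes one row per diagnosis and returns column E; the inputs are the Table 2 values.

```python
def bayes_table(rows):
    """rows: (diagnosis, prior probability, probability of the finding given the diagnosis)."""
    joint = [(name, prior * likelihood) for name, prior, likelihood in rows]  # column D
    total = sum(value for _, value in joint)                                  # sum of column D
    return {name: value / total for name, value in joint}                     # column E


posterior = bayes_table([
    ("SARS-CoV-2", 0.75, 0.05),
    ("Influenza", 0.25, 0.89),
])
for name, p in posterior.items():
    print(f"{name}: {p:.2f}")
```

Adding a third row (for example, another respiratory virus with its own prior and symptom frequency) would require no change to the function, provided the priors still sum to one.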

87. This is analogous to assuming the null hypothesis between the intervention and control arm in a randomized trial. One can then deduce the likelihood of the observed results based on that assumption.

88. Guan et al., supra note 83, at 1713; Hupert et al., supra note 83, at 340.


How well does a symptom distinguish SARS-CoV-2 from influenza? The informative or discriminating ability of a test can be succinctly summarized as a likelihood ratio. The likelihood-ratio positive is the ratio of true-positive rate (sensitivity) to the false-positive rate (one minus the specificity) and expresses how much more likely disease is to be present following a positive test result. Considering influenza as the disease, the likelihood-ratio positive for nasal congestion is 18 (0.89 ÷ 0.05). Likelihood-ratio positives should always exceed one because they increase the likelihood of disease, and likelihood-ratio negatives should always be less than one. Likelihood ratios exceeding 10 or falling below 0.1 are strong discriminators causing “large” changes in the likelihood of disease; those between 5 and 10 or 0.1 and 0.2 cause “moderate” changes; and those between 2 and 5 or 0.2 and 0.5 cause “small” changes.89 The likelihood ratios succinctly capture the combined effects of sensitivity and specificity to express the impact of a finding on the likelihood of a disease.
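
A likelihood ratio can be applied directly by converting the pretest probability to odds, multiplying by the ratio, and converting back. The sketch below is illustrative: the function names are not from the source, and it treats SARS-CoV-2 patients as the "without disease" group, so the specificity is taken as 1 minus the 5% frequency of nasal congestion in that group. The 25% pretest probability of influenza comes from Table 2.

```python
def likelihood_ratio_positive(sensitivity, specificity):
    """True-positive rate divided by false-positive rate."""
    return sensitivity / (1 - specificity)


def update_with_lr(pretest_probability, likelihood_ratio):
    """Odds form of Bayes' rule: post-test odds = pretest odds x likelihood ratio."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


lr_pos = likelihood_ratio_positive(sensitivity=0.89, specificity=0.95)
post = update_with_lr(0.25, lr_pos)
print(f"LR+ = {lr_pos:.1f}, post-test probability of influenza = {post:.2f}")
```

The result matches column E of Table 2, which is expected because the odds form and the tabular form are the same calculation arranged differently.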

Probabilistic inductive reasoning using Bayes’ rule not only applies to interpreting symptoms and signs for differential diagnosis but also to selection and interpretation of diagnostic tests to exclude or confirm the presence of a disease, thereby avoiding faulty information gathering and processing, for example, faulty interpretation of a test. It also explicitly accounts for varying disease prevalence in different contexts due to patient or population characteristics. For example, a patient with chest pain will have a different likelihood of atherosclerotic disease in the heart depending on whether the patient is seen in primary care or has been referred by primary care to a cardiology outpatient setting. This is because the latter often requires a referral from another doctor who judges the risk of heart disease to be higher, so those patients will have a higher likelihood of worrisome chest pain. This influences the interpretation of a test result as well as the likelihood of disease and the benefit of treatment perceived by the consulting cardiologist.90 Beyond differential diagnosis, the prevalence (or pretest probability) of disease in different patient populations affects how well mathematical predictive models perform in other populations (i.e., their generalizability). For example, a logistic regression model that captures the four physical findings that predict strep throat does not perform as well in other populations because the prevalence of strep throat varies by setting.91
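
The effect of setting on test interpretation can be illustrated by holding the test fixed and varying only the pretest probability. All numbers in this sketch are hypothetical and chosen only for illustration: a test with 70% sensitivity and 80% specificity, a 5% pretest probability in primary care, and a 40% pretest probability in a cardiology clinic. None of these values comes from the source or from the cited studies.

```python
def post_test_probability(pretest, sensitivity, specificity):
    """Probability of disease after a positive result."""
    true_pos = pretest * sensitivity
    false_pos = (1 - pretest) * (1 - specificity)
    return true_pos / (true_pos + false_pos)


SENSITIVITY, SPECIFICITY = 0.70, 0.80   # hypothetical test characteristics
settings = {"primary care": 0.05, "cardiology clinic": 0.40}   # hypothetical pretest values

results = {
    name: post_test_probability(pretest, SENSITIVITY, SPECIFICITY)
    for name, pretest in settings.items()
}
for name, p in results.items():
    print(f"{name}: pretest {settings[name]:.0%} -> post-test {p:.0%}")
```

Under these assumed values, the same positive result leaves disease unlikely in primary care but makes it probable in the referral clinic.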

89. David A. Grimes & Kenneth F. Schulz, Refining Clinical Diagnosis with Likelihood Ratios, 365 Lancet 1500, 1502 (2005), https://doi.org/10.1016/S0140-6736(05)66422-7.

90. Harold C. Sox Jr., The Baseline Electrocardiogram, 91 Am. J. Med. 573 (1991), https://doi.org/10.1016/0002-9343(91)90208-f.

91. Roy M. Poses et al., The Importance of Disease Prevalence in Transporting Clinical Prediction Rules: The Case of Streptococcal Pharyngitis, 105 Annals Internal Med. 586, 586 (1986), https://doi.org/10.7326/0003-4819-105-4-586 (prediction of streptococcal pharyngitis improved when adjusted for the disease prevalence in the specific clinical setting, e.g., by changing the constant in the logistic regression).

Prosecutor’s Fallacy

The prosecutor’s fallacy involves the misinterpretation of probabilistic information, such as when a prosecutor argues that if the defendant is guilty, then the probability is high that the compelling evidence will be found, and given presentation of that evidence during the trial, the defendant must be guilty. For example, in a highly publicized British case, solicitor Sally Clark was convicted of double homicide of her first two children in part based on testimony from then-eminent pediatrician Professor Sir Roy Meadow, known for Meadow’s law that “one cot death [the British term for sudden infant death syndrome or SIDS] is a tragedy, two cot deaths is suspicious and, until the contrary is proved, three cot deaths is murder.”92 The Confidential Enquiry for Stillbirths and Deaths in Infancy (CESDI) in Britain estimated a SIDS risk at the time of the testimony of 1 in 1,300 live births overall. Sir Meadow applied a lower risk of a SIDS death of 1 in 8,543 births to account for two characteristics of the Clark family that made SIDS death less likely (nonsmoking and affluent parents), but in doing so he ignored ecologic bias and also assumed the same likelihood for both deaths. By squaring 1 in 8,543 births, he testified that the likelihood of two natural SIDS deaths was “one in 73 million,” with the potential implication that all other such double deaths were homicides. With 700,000 births annually in the United Kingdom, Sir Meadow testified that a double SIDS death would be expected to occur “once in a hundred years.”93
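
The figures in the testimony can be reproduced in a few lines, which also makes the embedded assumptions visible. This sketch only replays the arithmetic described above; the variable names are illustrative, and dividing by annual births is a simplification that treats each birth as one opportunity for the double event.

```python
RISK_SINGLE = 1 / 8543        # per-birth SIDS risk applied in the testimony
BIRTHS_PER_YEAR = 700_000     # annual UK births cited in the testimony

# Squaring assumes the two deaths are independent and carry identical risk,
# which is the assumption criticized in the text.
risk_double = RISK_SINGLE ** 2
one_in = 1 / risk_double
years_between = one_in / BIRTHS_PER_YEAR

print(f"two SIDS deaths: 1 in {one_in:,.0f}")
print(f"expected interval: about {years_between:.0f} years")
```

The calculation says nothing about how often double homicides occur, so by itself it cannot indicate which explanation is more probable.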

In a commentary, Ray Hill, a professor of mathematics, wrote that some infants die from SIDS and some from homicides. Double SIDS and double infant homicide deaths in the same family are both rare events, so what matters are the “relative chances of these.” Using the 700,000 births annually and the incidence data, 538 deaths would be expected from SIDS and 32 from homicide, so on average, SIDS mortality was 17-fold more likely than homicide death. Based on data (albeit sparser data), Hill then estimated the likelihood of a second SIDS death to be 1/228 and a second homicide death to be 1/123, resulting in 2.36 double SIDS deaths and 0.26 double homicide deaths or a 9:1 ratio of double SIDS to double homicide (Table 3).94 Hill wondered “whether the Clark jury would have

92. Ray Hill, Multiple Sudden Infant Deaths: Coincidence or Beyond Coincidence?, 18 Paediatric & Perinatal Epidemiology 320, 320 (2004), https://doi.org/10.1111/j.1365-3016.2004.00560.x.

93. Id. at 325.

94. Bayes’ rule demonstrates how motive in a trial is critical. If the prior probability of an individual committing a crime is zero, the posterior probability of crime by that individual remains zero regardless of the evidence presented. A motive, even if weak, moves the prior probability to be more than zero so that convincing evidence can then raise the likelihood of crime to be more likely than not (i.e., greater than 50%) or, alternatively, beyond a reasonable doubt, e.g., for homicide. For example, if the prior probability of homicide is zero due to the absence of a motive for murder in Table 3, then that zero is multiplied throughout the Homicide row, so the posterior probability for SIDS will always be 1.00 or 100%.


Table 3. Probability of Two Deaths from SIDS vs. Homicides

| A | B | C | D | E |
|---|---|---|---|---|
| | Infant Deaths in 1 Year from Column A | Probability of Second Death | Multiply Columns B–C | Posterior Probability |
| SIDS | 538 | 1/228 | 2.36 | 0.90 |
| Homicide | 32 | 1/123 | 0.26 | 0.10 |
| | | | Sum = 2.62 | |

convicted if, instead of being given the ‘once in a hundred years figure,’ they had been told that second cot deaths occur about four or five times a year and indeed happen more frequently than second infant murders in the same family.”95
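
Hill’s comparison in Table 3 can be recomputed as a sketch of the relative-chances argument. The inputs are the figures reported in the text and table; the variable names are illustrative.

```python
BIRTHS_PER_YEAR = 700_000
sids_first = round(BIRTHS_PER_YEAR / 1300)   # 538 expected SIDS deaths (column B)
homicide_first = 32                          # expected infant homicides (column B)

double_sids = sids_first * (1 / 228)         # column D, SIDS row
double_homicide = homicide_first * (1 / 123) # column D, Homicide row
total = double_sids + double_homicide

posterior_sids = double_sids / total         # column E, SIDS row
ratio = double_sids / double_homicide

print(f"double SIDS: {double_sids:.2f}, double homicide: {double_homicide:.2f}")
print(f"posterior for SIDS: {posterior_sids:.2f}, ratio about {ratio:.0f}:1")
```

The comparison sets two rare explanations against each other instead of evaluating the rarity of one in isolation, which is the step the “one in 73 million” figure omitted.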

Sally Clark was convicted of murder and, after losing her first appeal, won on a second appeal in 2003; the court would also have allowed the appeal based on “flawed statistical evidence” (she died in 2007).96 The president of the Royal Statistical Society urged that “statistical evidence be presented only by appropriately qualified statistical experts.”97 Nobles and Schiff argued for a simpler approach: “[O]ne does not compensate for the misleading evidence; one simply excludes it.”98

The Clark case led a Queen’s Counsel to raise the issue of susceptibility to systematic distortion in forensic science, evident in two forms: “inherent bias towards producing a positive outcome or result” and “the tendency for an adversarial system of investigation and trial to lead to partisan behavior,” particularly “when the experts used in the course of investigation are also the experts used in the trial.”99 In the 2003 second appeal of the Clark case, the pathologist who changed the cause of death for the first brother after the second death and who also withheld evidence that bacteria were found in the cerebrospinal fluid of the second child stated, “‘It is not my practice to refer to additional results in my post mortem unless they are relevant to the cause of death, as the specimens were referred to another consultant.’ The Court did not consider this approach acceptable. . . . It is in effect the expert trying the case and deciding what is relevant evidence, not the court.”100

95. Hill, supra note 92, at 325.

96. Clare Dyer, Falsely Convicted Sally Clark Dies Suddenly, 334 B.M.J. 602, 604 (Mar. 24, 2007), https://doi.org/10.1136/bmj.39160.770637.DB.

97. Richard Nobles & David Schiff, Misleading Statistics Within Criminal Trials, 47 Med. Sci. Law 7, 10 (2007), https://doi.org/10.1258/rsmmsl.47.1.7.

98. Id.

99. Clare Montgomery, Forensic Science in the Trial of Sally Clark, 44 Med. Sci. L. 185, 185 (2004), https://doi.org/10.1258/rsmmsl.44.3.185.

100. A.R.W. Forrest, Sally Clark: A Lesson for Us All, 43 Sci. Just. 63, 64 (2003), https://doi.org/10.1016/S1355-0306(03)71744-4.

Multiple Diseases and Multiple Tests

The terms sensitivity and specificity apply to a simple situation in which disease is present or absent, and a test can be positive or negative, but terminology and interpretation become more complicated when multiple diseases are under consideration and when multiple test results may occur.101 For example, consider blood in the urine (hematuria), which could be caused by a urinary tract infection, a kidney stone, or a bladder cancer, among many other diseases. The terms sensitivity and specificity are no longer appropriate because disease is not simply present or absent. Instead, they are replaced by the term conditional probabilities; that is, sensitivity is replaced by the likelihood of blood in the urine with a urinary tract infection, or with a kidney stone, or with a bladder cancer. Similarly, a very positive test has a different interpretation than a weakly positive test, and Bayes’ rule can quantify the difference. Results from multiple tests can be combined with Bayes’ rule by applying Bayes’ rule to the first test result and then reapplying Bayes’ rule to subsequent test results with the post-test probability after the first test becoming the pretest for the second test. This approach assumes that the result of the first test does not affect the test characteristics (sensitivity or specificity) of the second test (i.e., that there is conditional independence of each test). When two tests are available, the screening will usually occur first with the high-sensitivity test to detect a high proportion of those with disease (true positives, or “ruling in” disease). Those with a positive first test will then undergo a high-specificity test to reduce the number of individuals who do not have disease (“ruling out” disease) in those with a false-positive first test.
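
Sequential application of Bayes’ rule can be sketched as follows. The test characteristics and the 1% pretest probability are hypothetical values chosen for illustration, and the calculation relies on the conditional-independence assumption described above.

```python
def update(pretest, sensitivity, specificity, positive=True):
    """Post-test probability after one test result."""
    if positive:
        p_given_disease, p_given_no_disease = sensitivity, 1 - specificity
    else:
        p_given_disease, p_given_no_disease = 1 - sensitivity, specificity
    numerator = pretest * p_given_disease
    return numerator / (numerator + (1 - pretest) * p_given_no_disease)


pretest = 0.01                                   # hypothetical pretest probability
# Hypothetical high-sensitivity screening test, positive result.
after_first = update(pretest, sensitivity=0.99, specificity=0.90)
# Hypothetical high-specificity second test, positive result.
# The post-test probability from the first test becomes the new pretest probability.
after_second = update(after_first, sensitivity=0.80, specificity=0.99)

print(f"after first test: {after_first:.1%}, after second test: {after_second:.1%}")
```

Under conditional independence the order of the two updates does not change the final probability; the ordering matters in practice because it determines how many people undergo the second test.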

The choice of whether to do a test will depend on the pretest probability. The basic rule of thumb is to do a test only if it could change the action you would take. With a high pretest probability, you would treat unless a test result would drive the probability to such a low level that you wouldn’t treat, which would require a very sensitive test (few false negatives). If the pretest probability is low, you wouldn’t treat unless the post-test probability was high, which requires a very specific test (few false positives).
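
The rule of thumb above can be expressed as a check on whether either test result would move the probability across a treatment threshold. The threshold of 20% and the test characteristics below are hypothetical, and `could_change_action` is an illustrative name; the source does not specify numerical thresholds.

```python
def post_test(pretest, sensitivity, specificity, positive):
    """Post-test probability after a positive or negative result."""
    if positive:
        with_disease, without_disease = sensitivity, 1 - specificity
    else:
        with_disease, without_disease = 1 - sensitivity, specificity
    numerator = pretest * with_disease
    return numerator / (numerator + (1 - pretest) * without_disease)


def could_change_action(pretest, sensitivity, specificity, treat_threshold):
    """True if a positive or negative result would reverse the treat/no-treat choice."""
    treat_without_testing = pretest >= treat_threshold
    treat_if_positive = post_test(pretest, sensitivity, specificity, True) >= treat_threshold
    treat_if_negative = post_test(pretest, sensitivity, specificity, False) >= treat_threshold
    return (treat_if_positive != treat_without_testing
            or treat_if_negative != treat_without_testing)


THRESHOLD = 0.20  # hypothetical probability above which treatment is chosen
print(could_change_action(0.80, 0.99, 0.80, THRESHOLD))  # high pretest, very sensitive test
print(could_change_action(0.80, 0.70, 0.99, THRESHOLD))  # high pretest, less sensitive test
print(could_change_action(0.05, 0.70, 0.99, THRESHOLD))  # low pretest, very specific test
```

With these assumed values, only the very sensitive test can lower a high pretest probability below the threshold, and a very specific test can raise a low pretest probability above it.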

Genomic Analyses

Bayes’ rule becomes even more relevant in the genomic-medicine era.102 Suppose a genetic test has a sensitivity and specificity of 99.9%, and suppose the probability of disease is 1 in 1,000 if a positive family history is present and 1 in 100,000 if no

101. Kassirer et al., supra note 70, at 21–22. See also Steve C. Gold et al., Reference Guide on Epidemiology, “Specificity of the Association,” in this manual.

102. Isaac S. Kohane et al., The Incidentalome: A Threat to Genomic Medicine, 296 JAMA 212, 213 (2006), https://doi.org/10.1001/jama.296.2.212.


family history is present. Screening 1,000 individuals with a positive family history for the gene results in two positive tests: one individual truly has a disease, and in the other, the test is a false positive. Screening 10 million individuals without a family history results in 10,100 positive tests, of which 100 individuals have a disease and 10,000 do not. Even with a specificity of 99.99%, if a test screens for 10,000 genes simultaneously, then 63% of individuals will have at least one false-positive test result. Based simply on the genetic test results alone, neither individuals nor physicians would be able to distinguish those with true-positive results from those with false-positive results, thereby potentially leading to inappropriate monitoring or treatment for all with positive test results.
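
The screening counts in this example can be checked with a short sketch. The helper name `screen` is illustrative, and the final calculation assumes that the 10,000 gene tests produce false positives independently of one another, an assumption not stated in the source.

```python
def screen(population, prevalence, sensitivity, specificity):
    """Return expected (true positives, false positives) from screening."""
    diseased = population * prevalence
    true_pos = diseased * sensitivity
    false_pos = (population - diseased) * (1 - specificity)
    return true_pos, false_pos


# Sensitivity and specificity of 99.9%, as in the text.
tp_fh, fp_fh = screen(1_000, 1 / 1_000, 0.999, 0.999)            # positive family history
tp_no, fp_no = screen(10_000_000, 1 / 100_000, 0.999, 0.999)     # no family history

# Probability of at least one false positive across 10,000 independent
# tests, each with a specificity of 99.99%.
p_any_false_positive = 1 - 0.9999 ** 10_000

print(f"family history: {tp_fh:.0f} true, {fp_fh:.0f} false positive")
print(f"no family history: {tp_no:.0f} true, {fp_no:,.0f} false positives")
print(f"at least one false positive: {p_any_false_positive:.0%}")
```

Without a family history, roughly 1 positive result in 100 reflects true disease, compared with 1 in 2 when a family history is present.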

Positivity Criterion

The sensitivity and specificity of a quantitative test rely on setting a positivity criterion, a level for determining what is negative or positive for disease. When the criterion is made stricter (a higher level) for a test where higher is more positive, then sensitivity falls and specificity increases, and if the criterion is made laxer (a lower level), then sensitivity rises, and specificity falls. Depending on the context of the testing, it may be more appropriate to choose a laxer criterion (e.g., screening donated blood for HIV infection where the benefit is reducing transfusion-associated HIV transmission, and the harm is discarding some uninfected units of donated blood) or a stricter one (e.g., screening a low-prevalence population for HIV infection where the benefit is reducing false-positive diagnoses from higher specificity, and the harm is missing some truly HIV-infected individuals from lower sensitivity).103 Thus the benefits of finding and treating a person with disease versus the risk of harm from mislabeling or treating a person without disease should help establish what should be considered negative or positive.
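
The trade-off between sensitivity and specificity as the positivity criterion moves can be sketched with two overlapping distributions of test values. The normal distributions and their parameters below are purely hypothetical and are not drawn from the source or from any HIV assay.

```python
from statistics import NormalDist

# Hypothetical distributions of a quantitative test value (higher = more positive).
without_disease = NormalDist(mu=1.0, sigma=1.0)
with_disease = NormalDist(mu=3.0, sigma=1.0)


def accuracy_at(cutoff):
    """Sensitivity and specificity when values at or above the cutoff count as positive."""
    sensitivity = 1 - with_disease.cdf(cutoff)
    specificity = without_disease.cdf(cutoff)
    return sensitivity, specificity


for cutoff in (1.5, 2.0, 2.5):   # laxer to stricter criterion
    sens, spec = accuracy_at(cutoff)
    print(f"cutoff {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

Raising the cutoff lowers sensitivity and raises specificity, so the choice of criterion reflects the relative costs of missed disease and false alarms in the setting at hand.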

Bayesian vs. Frequentist Statistics

In a randomized trial, frequentist statistics involve assuming a null hypothesis (no difference between an intervention and control) and then calculating the likelihood of the observed outcomes under that assumption. For example, using a fair coin (null hypothesis), the likelihood of two heads in a row occurring would be 25% (one of the four equally likely outcomes: HT, HH, TH, TT), of three heads in a row 12.5%, of four heads in a row 6.25%, and of five heads in a row 3.125%. The alpha, or p-value threshold for statistical significance, is typically 5%, and some argue that four or five heads in a row would raise suspicion that the coin is unfair. The American Statistical Association has, for

103. Klemens M. Meyer & Stephen G. Pauker, Screening for HIV: Can We Afford the False Positive Rate?, 317 NEJM 238, 240 (1987), https://doi.org/10.1056/NEJM198707233170410.

the first time, made a statement about statistical practice methods to move away from p < 0.05:

Don’t believe that an association or effect exists just because it was statistically significant. Don’t believe that an association or effect is absent just because it was not statistically significant. Don’t believe that your p-value gives the probability that chance alone produced the observed association or effect or the probability that your test hypothesis is true.104

One of the most highly cited papers in medicine, titled Why Most Published Research Findings Are False, states, “It can be proven that most claimed research findings are false. . . . a consequence of the . . . ill-founded strategy of claiming conclusive research solely on the basis of a single study assessed by formal statistical significance, typically for a p-value less than 0.05.”105 When a study finds a surprising result with p = 0.05, the overwhelming majority of physicians “will confidently state there is a 95% or greater chance that the null hypothesis is incorrect . . . an understandable but categorically wrong interpretation because the P value” assumes “the null hypothesis is true. It cannot, therefore, be a direct measure of the probability that the null hypothesis is false.”106

Bayesian inference is often described as showing “how belief is altered by data” or as a “belief calculus,” and it is often regarded as subjective, especially when no prior evidence exists. Others term it an “evidential calculus” that combines the prior odds of the null hypothesis with a Bayes factor as the “weight of the evidence,”107 for example, from a randomized trial.

The Bayes factor differs in many ways from a P value. First, the Bayes factor is not a probability itself but a likelihood ratio of probabilities, i.e., Probability of the Data, given the null hypothesis divided by the Probability of the Data, given the alternative hypothesis, so it can vary from zero to infinity. It requires two hypotheses, making it clear that for evidence to be against the null hypothesis, it must be for some alternative. Second, the Bayes factor depends on the probability of the observed data alone . . . so we begin to understand what represents strong evidence, and weak evidence . . . 108
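
To make the distinction concrete, the following Python sketch applies a Bayes factor to the coin example used earlier in this section. It is a minimal illustration under stated assumptions: the alternative hypothesis (a biased coin that lands heads 90% of the time) and the two prior probabilities are hypothetical values chosen for illustration, and the function names are not drawn from the cited sources.

```python
def bayes_factor(p_data_given_null, p_data_given_alt):
    """Likelihood ratio: P(data | null) / P(data | alternative)."""
    return p_data_given_null / p_data_given_alt


def posterior_prob_null(prior_prob_null, bf):
    """Combine the prior odds of the null hypothesis with the Bayes factor."""
    prior_odds = prior_prob_null / (1 - prior_prob_null)
    posterior_odds = prior_odds * bf
    return posterior_odds / (1 + posterior_odds)


n_heads = 5
p_null = 0.5 ** n_heads  # fair coin: 3.125%, as in the coin example
p_alt = 0.9 ** n_heads   # hypothetical biased coin with P(heads) = 0.9

bf = bayes_factor(p_null, p_alt)
print(f"Bayes factor (null vs. alternative): {bf:.3f}")

for prior in (0.5, 0.99):  # hypothetical prior probabilities that the coin is fair
    post = posterior_prob_null(prior, bf)
    print(f"prior {prior:.0%} -> posterior probability that the coin is fair: {post:.0%}")
```

With a strong prior belief that the coin is fair, the same five heads leave the fair-coin hypothesis more likely than not, which illustrates why a p-value of about 3% is not the probability that the null hypothesis is true.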

Standard frequentist statistical and Bayesian methods have tradeoffs, with standard frequentist methods weakest when drawing conclusions from a single

104. Ronald L. Wasserstein et al., Moving to a World Beyond “p<0.05,” 73 Am. Statistician 1, 1 (2019), https://doi.org/10.1080/00031305.2019.1583913.

105. John P. A. Ioannidis, Why Most Published Research Findings Are False, 2 PLoS Med. e124, 0696 (2005), https://doi.org/10.1371/journal.pmed.0020124.

106. Goodman, supra note 71, at 998.

107. Steven N. Goodman, Toward Evidence-Based Medical Statistics. 2: The Bayes Factor, 130 Annals Internal Med. 1005, 1005 (1999), https://doi.org/10.7326/0003-4819-130-12-199906150-00019; see generally Liesa L. Richter & Daniel J. Capra, The Admissibility of Expert Testimony, “Weight of the Evidence Analysis: Closing the Analytical Gap,” in this manual.

108. Goodman, supra note 107, at 1006.

experiment, whereas Bayesian methods make inductive inference from that experiment. Bayesian methods, however, rely on prior probability distributions, which are imperfect, subjective estimates of what is known or unknown, so that over multiple experiments, Bayesians cannot guarantee that their 95% credible intervals (the Bayesian analog of the 95% confidence interval) will hold up. Thus, frequentist criteria can be, and have been, used to evaluate Bayesian and likelihood methods. “Putting P values aside, Bayesian and frequentist approaches each provide an essential perspective that the other lacks. The way in which we balance their sometimes conflicting demands is what makes the process of learning from nature creative, exciting, uncertain, and, most of all, human.”109

Causal Reasoning

While considering risk factors (e.g., age, gender, exposures such as smoking, family history, or medical conditions), physicians will often use any type of evidence110 that might support causation—for example, biological plausibility,111 physiological drug effects, case reports, or temporal proximity112 to an exposure.113 Although physicians use epidemiological studies in their decision-making, “they are accustomed to using any reliable data to assess causality, no matter what their source” because they must make care decisions (e.g., treat or not) even in the face of uncertainty.114 This is in contrast to the courts, which require a higher standard than clinicians or regulators; causation cannot just be “possible,” rather “a ‘preponderance of evidence’ establishes that an injury was caused by an alleged exposure.”115 For physicians, causal reasoning typically involves understanding how abnormalities in physiology, anatomy, genetics, or biochemistry lead to the clinical manifestations of disease. Through such reasoning, physicians develop a causal cascade or “chain or web of causation” linking a sequence of plausible cause-and-effect mechanisms to arrive at the pathogenesis or pathophysiology of a disease. For example, kidney failure leads to poor drug

109. Id. at 1012.

110. Jerome P. Kassirer & Joe S. Cecil, Inconsistency in Evidentiary Standards for Medical Testimony: Disorder in the Courts, 288 JAMA 1382, 1384 (2002), https://doi.org/10.1001/jama.288.11.1382; see also section titled “Evidence-Based Medicine” below for levels of evidence.

111. See Keenan v. Sec’y of Health & Hum. Servs., No. 99–561V, 2007 WL 1231592 (Fed. Cl. Apr. 5, 2007).

112. But see Wilson v. Taser Int’l, Inc., 303 F. App’x 708, 714 (11th Cir. 2008) (“[A]lthough a doctor usually may primarily base his opinion as to the cause of a plaintiff’s injuries on this history where the patient ‘has sustained a common injury in a way that it commonly occurs,’ . . . Dr. Meier could not rely upon the temporal connection between the two events to support his causation opinion in this case.”).

113. Kassirer & Cecil, supra note 110, at 1384.

114. Id.

115. Id.

excretion, resulting in symptoms or signs of drug toxicity.116 These causal-reasoning pathways center on disease manifestations and help treating clinicians with diagnosis and treatment; such clinicians are commonly less concerned with formally distinguishing causation from association (e.g., by applying Sir Austin Bradford Hill’s considerations for inferring causation).117

Pattern recognition of concomitant symptoms and signs can trigger a diagnosis. For example, cough, lung lesions, and enlarged breasts (gynecomastia) in a thirty-seven-year-old man could spark the diagnosis of metastatic germ cell cancer.118 More typically, physicians use causal reasoning in diagnostic refinement and verification to examine a diagnosis for its coherence, namely, asking whether its physiological mechanism would be expected to lead to the observed manifestations and whether it adequately accounts for all normal and abnormal findings and the disease time course. Once treatment has been implemented, clinicians must make causal judgments to determine whether an alteration in patient status is the result of progression of disease or an adverse consequence of treatment, or whether the absence of improvement should prompt a change in therapy or even reconsideration of the diagnosis.

Pathophysiological reasoning, however, also can lead to incorrect conclusions. In patients with heart failure with a weakened heart, a class of medications called beta blockers had been thought to be contraindicated because beta blockers would decrease the strength of the heart muscle contraction and slow the increase in heart rate that was then thought to be compensatory. Subsequent studies found that beta blockers in patients with heart failure actually increased survival without ill effect. Similarly, physicians once thought that atherosclerotic blockages in heart arteries slowly progressed to cause a heart attack, so that revascularizing those blocked arteries through heart bypass surgery would prevent heart attacks.119 Over the past twenty-five years, however, scientific evidence has emerged that small vulnerable atherosclerotic plaques (not amenable to revascularization because of their small size) can suddenly rupture and cause heart attacks. Not surprisingly, trials of revascularization by either bypass surgery or percutaneous interventions such as stenting or angioplasty show that these procedures relieve symptoms but do not diminish the risk of having a heart attack or improve survival for most patients.120

116. Id.

117. See generally Reference Guide on Exposure Science and Exposure Assessment, Reference Guide on Epidemiology, Reference Guide on Toxicology, and the prologue to those three guides, in this manual.

118. Kassirer et al., supra note 70, at 29.

119. David S. Jones, Visions of a Cure: Visualization, Clinical Trials, and Controversies in Cardiac Therapeutics, 1968–1998, 91 Isis 504, 532 (2000), https://doi.org/10.1086/384853.

120. Thomas A. Trikalinos et al., Percutaneous Coronary Interventions for Non-Acute Coronary Artery Disease: A Quantitative 20-Year Synopsis and a Network Meta-Analysis, 373 Lancet 911, 915 (2009), https://doi.org/10.1016/S0140-6736(09)60319-6; Mark A. Hlatky et al., Coronary Artery Bypass Surgery Compared with Percutaneous Coronary Interventions for Multivessel Disease: A Collaborative

Although treating physicians121 may testify on both general and specific causation, their standards for evidence, like their use of evidence for causation, vary.122 For example, some physicians may stop using a drug after the first reports of adverse effects, and others may continue to use a drug despite evidence of harm from randomized controlled trials. Determining whether an effect is a class effect or is drug-specific can be difficult. In a randomized trial limited to patients with documented weakened hearts, one particular beta blocker was found not to confer a survival benefit, and as a result, the heart failure guidelines limited their recommendation to three beta-blocker drugs with documented mortality benefit in trials.123

Treating physicians may be aware of patient-specific risk factors such as smoking or family history but may not routinely review specialized aspects of such data, for example, toxicology, industrial hygiene, environment, and some aspects of epidemiology. Additional experts may assist in distinguishing general from specific causation by using specialized knowledge to weigh the relative contribution of each putative causative factor to determine “reasonable medical certainty” or “reasonable medical probability.” The determination of general causation involves medical and scientific literature review and the evaluation of epidemiological data, toxicological data, and dose–response relationships.

Causal Inference Methods

It is important to recognize that

[S]cientists report that an evaluation of data and scientific evidence to determine whether an inference of causation is appropriate requires judgment and interpretation. Scientists are subject to their own value judgments and preexisting biases . . . although one scientist or group of scientists comes to one conclusion about factual causation, they recognize that another group that comes to a contrary conclusion might still be “reasonable.” Courts, thus, should be cautious about adopting specific “scientific” principles, taken out of context, to

Analysis of Individual Patient Data from Ten Randomised Trials, 373 Lancet 1190, 1192 (2009), https://doi.org/10.1016/S0140-6736(09)60552-3.

121. See generally Bland v. Verizon Wireless, LLC, 538 F.3d 893 (8th Cir. 2008) (upholding district court’s decision to reject a treating physician’s evidence of causation under Daubert); Cripe v. Henkel Corp., 318 F.R.D. 356 (N.D. Ind.), aff’d, 858 F.3d 1110 (7th Cir. 2017) (indicating that a treating doctor is providing expert testimony if doctor offers opinions based on scientific or technical knowledge).

122. Kassirer & Cecil, supra note 110, at 1384.

123. Paul A. Heidenreich et al., 2022 AHA/ACC/HFSA Guideline for the Management of Heart Failure: A Report of the American College of Cardiology/American Heart Association Joint Committee on Clinical Practice Guidelines, 145 Circulation e895, e931 (2022), https://doi.org/10.1161/CIR.0000000000001062.

formulate bright-line legal rules or conclude that reasonable minds cannot differ about factual causation.124

In an emerging area, “[b]ecause identifying individual causal effects is generally not possible, we now turn our attention to an aggregated causal effect: the average causal effect in a population of individuals.”125 Consider, for example, hormone replacement therapy for postmenopausal women. Multiple observational studies using methods such as case-control, cross-sectional, and cohort designs126 suggested an association between hormone therapy and reduction in heart attack, but such designs are subject to confounding and bias and are particularly weak for causation because in case-control and cross-sectional studies, the sequence of the exposure and outcome is unknown.

To resolve the question, the Women’s Health Initiative (WHI) study randomized women to hormone replacement therapy or placebo and found a statistically significant increase in harmful clot-related disorders (heart attack, stroke, and heart-related mortality) over five years, an increase already evident in the first year after initiation of hormone therapy.127 Heart attacks are caused by blood clots and plaque rupture, and so the results were consistent with the known biological mechanism of estrogens in the clotting cascade. However, patients in the WHI study were, on average, sixty-three years old and therefore, unlike the participants analyzed in the observational studies, not perimenopausal. In a novel approach, researchers emulated the design and intention-to-treat (ITT) analysis aspect of the WHI randomized trial in the prospective observational Nurses’ Health Study and found that hormone-replacement treatment effects were similar to those from the randomized trial, suggesting that “the discrepancies between WHI and the Nurses’ Health Study ITT estimates could be largely explained by differences in the distribution of time since menopause and length of follow-up.”128 These emerging methods for analyzing non-randomized study designs address “[t]he fundamental problem of causal inference . . . that at most one of the potential outcomes can be realized and thus observed” as a consequence of an action (e.g., taking a pill).129

124. See Restatement (Third) of Torts, § 28 cmt. c, at 403 (2010).

125. Miguel A. Hernán & James M. Robins, Causal Inference: What If 4 (2020).

126. See Steve C. Gold et al., Reference Guide on Epidemiology, “Types of Observational Study Design,” in this manual.

127. Jacques E. Rossouw et al., Risks and Benefits of Estrogen Plus Progestin in Healthy Postmenopausal Women: Principal Results from the Women’s Health Initiative Randomized Controlled Trial, 288 JAMA 321, 327–29 (2002), https://doi.org/10.1001/jama.288.3.321; JoAnn E. Manson et al., Estrogen Plus Progestin and the Risk of Coronary Heart Disease, 349 NEJM 523, 527–28 (2003), https://doi.org/10.1056/NEJMoa030808.

128. Miguel A. Hernán et al., Observational Studies Analyzed Like Randomized Experiments: An Application to Postmenopausal Hormone Therapy and Coronary Heart Disease, 19 Epidemiology 766, 766 (2008), https://doi.org/10.1097/EDE.0b013e3181875e61.

129. Guido W. Imbens & Donald B. Rubin, Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction 6 (2015), https://doi.org/10.1017/CBO9781139025751.

Treatment Decision-Making

Treatment Thresholds

Sir William Osler described medicine as “a science of uncertainty and an art of probability.”130 Through diagnostic reasoning, clinicians seek to identify the cause of a patient’s complaints and findings to determine the most appropriate therapy to alleviate symptoms or prolong life, which may include no therapy, as ineffective therapy carries risk for harm. In 1975, Pauker and Kassirer described the threshold approach to treatment decisions.131 For a likelihood of a disease ranging from zero (absent) to one (certainty), they determined that the threshold probability of disease above which patients should be treated depended on the benefits and harms of treating patients with and without disease (Figure 4). The benefit (B) equaled the outcomes of patients with disease who are treated minus their outcomes without treatment; the harm (H) equaled the outcomes of patients without disease who are not treated minus their outcomes if treated. The threshold probability equals H divided by (H + B). If the benefit is high, then the threshold for treatment is low (e.g., antibiotics for pneumonia); if the harm is high, greater diagnostic certainty must exist before treatment (e.g., chemotherapy for cancer usually requires tissue diagnosis). For the diagnostic dilemma of a sarcoidosis or tuberculosis case, tuberculosis (TB) treatment would confer a 30% survival benefit with a 0.15% risk of mortality from TB treatment, so the threshold for TB treatment equals 0.15%/(0.15% + 30%) = 0.15%/30.15% = 0.5%, far below the 70% likelihood of TB estimated by the discussant prior to testing.132
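
The threshold formula can be checked against the TB example with a few lines of Python. This is a minimal sketch; the function and variable names are illustrative assumptions, and the inputs are the figures quoted in the paragraph above.

```python
def treatment_threshold(harm, benefit):
    """Probability of disease above which treatment is preferred: H / (H + B)."""
    return harm / (harm + benefit)


# TB example from the text: 30% survival benefit, 0.15% treatment mortality.
tb_threshold = treatment_threshold(harm=0.0015, benefit=0.30)
pretest_probability = 0.70  # the discussant's estimate

print(f"Treatment threshold: {tb_threshold:.1%}")
print("Treat" if pretest_probability > tb_threshold else "Do not treat")
```

Because the harm is small relative to the benefit, the threshold is low, and an estimated probability of 70% lies well within the treatment range.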

Figure 4. Treatment threshold.

130. Mark E. Silverman et al., The Quotable Osler 46 (2003).

131. Stephen G. Pauker & Jerome P. Kassirer, Therapeutic Decision Making: A Cost-Benefit Analysis, 293 NEJM 229, 230–31 (1975), https://doi.org/10.1056/NEJM197507312930505.

132. Richard I. Kopelman et al., Clinical Problem-Solving: A Little Math Helps the Medicine Go Down, 341 NEJM 435, 437–38 (1999), https://doi.org/10.1056/NEJM199908053410608.

Testing Decision-Making

Testing Thresholds

When should a test be done? A guiding principle is that a test should be performed only if the test result could change the care plan. Thus, if a positive test moves the post-test probability of disease across the treatment threshold from a pretest probability falling in the no-treat range, it should change the plan from no treat to treat (or conversely with a negative test when the pretest probability falls in the treat range). The no-treat-test threshold is the lowest pretest probability at which the post-test probability after a positive test equals the treatment threshold. When no test result could move the post-test probability across an action threshold, the test generally should not be done.

In 1980, Pauker and Kassirer extended their threshold approach to treatment decisions by defining conceptual thresholds for testing (Figure 5).133 They determined that the threshold probabilities at which patients should be tested or treated (or neither) depended not only on treatment benefit (B) and harm (H) in patients with and without disease but also on the sensitivity and specificity of the test, as summarized by the likelihood-ratio positive (LR+ = sensitivity divided by one minus specificity) and likelihood-ratio negative (LR− = one minus sensitivity divided by specificity). The test threshold above which testing should be done equals H divided by (H + (LR+)×B). Because LR+ increases the likelihood of disease, it always exceeds one, so it magnifies the benefit of treatment and makes the test threshold lower than the treatment threshold. Similarly, the LR− is always less than one, so it reduces benefit, thereby raising the test–treatment threshold above the treatment threshold.134 In the diagnostic dilemma of TB or sarcoidosis, the LR− for a negative TB skin test (for TB) is 0.26 (0.25/0.95), yielding a test–treatment threshold of 1.9%, and the LR− for an abnormal angiotensin-converting enzyme (ACE) blood test is 0.06 (0.05/0.8), yielding a test–treatment threshold of 7.4%. Even with both LR− tests, the test–treatment threshold rises only to 23%. The discussant’s 70% estimate exceeds all three of these thresholds, supporting empiric TB treatment. In summary, the treatment-threshold probability and the LR− and LR+ of the test divide the probability scale into three parts: no treat, test, and treatment. If the pretest probability is known, the preferred next step can be identified.
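
The thresholds quoted for the TB example can be reproduced numerically. The sketch below rests on several labeled assumptions: the test–treatment threshold is taken to be H divided by (H + (LR−)×B), by analogy with the test-threshold formula stated in the text (an inference that matches the quoted 1.9%, 7.4%, and 23% figures); the sensitivities and specificities are those implied by the quoted fractions 0.25/0.95 and 0.05/0.8; and the two LR− values are multiplied together on the assumption that the tests are conditionally independent. The function names are illustrative.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a test."""
    return sensitivity / (1 - specificity), (1 - sensitivity) / specificity


def threshold(harm, benefit, lr=1.0):
    """H / (H + LR x B); lr=1 gives the treatment threshold."""
    return harm / (harm + lr * benefit)


H, B = 0.0015, 0.30  # TB example: 0.15% harm, 30% benefit

# Sensitivity and specificity implied by the quoted LR- fractions.
lr_pos_skin, lr_neg_skin = likelihood_ratios(0.75, 0.95)  # LR- = 0.25/0.95
_, lr_neg_ace = likelihood_ratios(0.95, 0.80)             # LR- = 0.05/0.8

treat = threshold(H, B)
test_skin = threshold(H, B, lr_pos_skin)  # test threshold (uses LR+)
tt_skin = threshold(H, B, lr_neg_skin)    # test-treatment threshold (uses LR-)
tt_ace = threshold(H, B, lr_neg_ace)
# Multiplying LR- values assumes the two tests are conditionally independent.
tt_both = threshold(H, B, lr_neg_skin * lr_neg_ace)

for label, value in [("treatment", treat), ("skin test", tt_skin),
                     ("ACE test", tt_ace), ("both tests", tt_both)]:
    print(f"{label:>10}: {value:.1%}")
```

A pretest probability of 70% exceeds every test–treatment threshold computed here, so under these assumptions no negative result would move the probability of TB below the point at which treatment is preferred.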

133. Stephen G. Pauker & Jerome P. Kassirer, The Threshold Approach to Clinical Decision Making, 302 NEJM 1109, 1110–11 (1980), https://doi.org/10.1056/NEJM198005153022003.

134. Kopelman et al., supra note 132, at 437.

Figure 5. Test and test–treatment thresholds.

Screening

As articulated by Wilson and Jungner for the World Health Organization in 1968,

[t]he object of screening for disease is to discover those among the apparently well who are in fact suffering from disease. They can then be placed under treatment and, if the disease is communicable, steps can be taken to prevent them from being a danger to their neighbours. In theory, therefore, screening is an admirable method of combating disease, since it should help detect it in its early stages and enable it to be treated adequately before it obtains a firm hold on the community.135

But as they also note, “[i]n practice, there are snags.”136

Decades ago, routine screening tests included ordering anywhere from six to twelve blood tests as part of an annual visit in asymptomatic patients. Normal ranges for biochemical tests are often based on the 95% confidence intervals in a healthy population. By convention, values outside the 2.5% lower and upper extremes are considered to be abnormal, so ordering six blood tests in a healthy individual yields a 74% chance that all six tests will be normal (0.95^6), that is, a 26% chance that one or more tests may be abnormal, assuming each test is independent of the others. Similarly, when ordering twelve tests in a normal person, there is a 54% chance that all twelve will be normal, so a 46% chance that one or more will be abnormal. Simply ordering tests in healthy individuals in the absence of clinical suspicion of a disease may result in many false-positive test

135. J.M.G. Wilson & G. Jungner, Principles and Practice of Screening for Disease No. 34, at 7, https://iris.who.int/bitstream/handle/10665/37650/WHO_PHP_34.pdf.

136. Id.

results that can lead to false alarms, anxiety, additional testing, and possible morbidity or mortality from subsequent testing or interventions.137
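
The chance of at least one abnormal result in a healthy person can be computed directly from the number of tests ordered. The following sketch uses the same independence assumption stated in the text; the function name and default argument are illustrative.

```python
def prob_any_abnormal(n_tests, normal_fraction=0.95):
    """Chance that a healthy person has at least one abnormal result among
    n independent tests, each with a 95% reference range."""
    return 1 - normal_fraction ** n_tests


for n in (1, 6, 12):
    print(f"{n:>2} tests: {prob_any_abnormal(n):.0%} chance of at least one abnormal result")
```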

Even a valueless screening test may appear to be beneficial because of lead-time bias. That is, if screened and unscreened patients have the same prognosis from the time of onset of symptoms to death, screened patients only appear to live longer because the time elapsed from diagnosis by screening to death exceeds the time elapsed from time of symptom onset to death.138 A second bias, length bias, also leads to overestimating the benefit from screening.139 A randomized trial of screening or no screening is conducted over a limited duration from study initiation to termination. Among the unscreened patients, disease only becomes evident through the development of symptoms, which would be more likely with the aggressive form of the disease, conferring a poorer prognosis for these patients. Among the screened patients, the screening test detects patients with both aggressive and indolent forms of the disease. Some of the patients with indolent cancer may not develop symptoms until after the trial ends. These slower growing cases can make the patients detected by screening appear to live longer, enriching the prognosis of the screening group with those patients with indolent cancers.
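
Lead-time bias can be illustrated with a single hypothetical patient whose age at death is the same whether or not screening occurs. The ages below are invented for illustration, and the class and method names are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    age_detectable_by_screening: float
    age_at_symptoms: float
    age_at_death: float  # identical with or without screening

    def survival_from_diagnosis(self, screened):
        """Years from diagnosis to death, which depends on how diagnosis occurs."""
        diagnosis_age = (self.age_detectable_by_screening if screened
                         else self.age_at_symptoms)
        return self.age_at_death - diagnosis_age


# Hypothetical patient; screening is valueless because death occurs at the same age.
patient = Patient(age_detectable_by_screening=62, age_at_symptoms=65, age_at_death=70)

print("Survival after diagnosis, unscreened:", patient.survival_from_diagnosis(False), "years")
print("Survival after diagnosis, screened:  ", patient.survival_from_diagnosis(True), "years")
```

The apparent three-year gain in survival for the screened patient reflects only the earlier date of diagnosis, not any change in the age at death.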

Screening can also result in pseudo-disease or overdiagnosis, such as the identification of slow-growing cancers that, even if untreated, would never cause symptoms or reduce survival.140 Although lung cancer is commonly thought to be one of the more aggressive cancers, an autopsy study found that one-third of lung cancers were unsuspected prior to autopsy, and nearly all of these patients with unsuspected lung cancer prior to autopsy died from other causes.141 The lifetime risk of dying from prostate cancer is about 3%, yet 60% of men in their sixties have prostate cancer, and so, screening and detecting all men with prostate cancer in their sixties would lead to the treatment of many men who would not have died from prostate cancer.142

137. A radiologist described his own experience to illustrate the clinical aphorism that “the only ‘normal’ patient is one who has not yet undergone a complete work-up.” He had a negative CT scan of the colon, but the CT scan also provided images outside the colon with radiologists identifying lesions in the kidneys, liver, and lungs. This resulted in additional CT scans, a liver biopsy, PET scan, video-aided thoracoscopy (a flexible scope inserted into the chest), and three wedge resections of the lung leading to multiple tubes, medications, and “excruciating pain” that required five weeks for recovery. William J. Casarella, A Patient’s Viewpoint on a Current Controversy, 224 Radiology 927 (2002), https://doi.org/10.1148/radiol.2243020024.

138. Black, supra note 66, at 1280.

139. Id.

140. Id. at 1280–81.

141. Charles K. Chan et al., More Lung Cancer but Better Survival: Implications of Secular Trends in “Necropsy Surprise” Rates, 96 Chest 291, 293 (1989), https://doi.org/10.1378/chest.96.2.291.

142. Karsten J. Jørgensen & Peter C. Gøtzsche, Overdiagnosis in Publicly Organised Mammography Screening Programmes: Systematic Review of Incidence Trends, 339 B.M.J. b2587 at 1 (2009), https://doi.org/10.1136/bmj.b2587; Michael J. Barry, Prostate-Specific–Antigen Testing for Early Diagnosis of Prostate Cancer, 344 NEJM 1373, 1373 (2001), https://doi.org/10.1056/NEJM200105033441806.

Diagnostic Testing

Based on patients’ history and physical examination, clinicians establish diagnostic possibilities. They may then request additional tests to reduce uncertainty and to confirm the diagnosis as part of diagnostic verification. Although theoretically all tests could be ordered, tests should be chosen based on a clinical suspicion because of possible morbidity or even mortality from inappropriate testing. Normative, prescriptive decision models for reasoning in the presence of uncertainty suggest that whether and which tests get ordered should depend on test sensitivity and specificity and the benefit and harm of treatment as discussed in the section titled “Diagnostic Reasoning” above; when testing carries risks for morbidity or mortality harms, the range of disease probabilities for which testing should be done becomes narrower, and so physicians should be more likely to treat empirically or neither test nor treat.143 As sensitivity and specificity increase, the range of probabilities in which testing should be done expands.

Although an abnormal test result may be found, that abnormality may not be causing symptoms. For example, herniated lower back (lumbar spine) discs are found in approximately 25% of healthy individuals without back pain as an incidental finding. If signs such as a foot drop develop, additional muscle and nerve conduction studies could confirm evidence of nerve compromise from the herniated disc, but such tests are painful. Over time, sequential images show that herniated discs have a partial or complete resolution after six months without surgery as the disc shrinks. Because a herniated disc may be seen with CT or MRI scanning in patients with or without symptoms, just having symptoms and evidence of a herniated disc would not be a sufficient indication for back surgery.144 In the absence of severe or progressive neurological deficits, elective disc surgery could be considered for patients with probable herniated discs who have persistent symptoms and findings consistent with sciatica (not just low back pain) for four to six weeks, but such “patients should be involved in decision making” (see section titled “Patient Care” below).145

Tests initially felt to be useful may be found to be less valuable over time.146 Among other potential biases,147 this may occur because of the choice of the

143. Stephen G. Pauker & Jerome P. Kassirer, The Threshold Approach to Clinical Decision Making, 302 NEJM 1109, 1110, 1115–16 (1980), https://doi.org/10.1056/NEJM198005153022003.

144. Richard A. Deyo & James N. Weinstein, Low Back Pain, 344 NEJM 363, 366–67 (2001), https://doi.org/10.1056/NEJM200102013440508.

145. Id. at 368.

146. David F. Ransohoff & Alvan R. Feinstein, Problems of Spectrum and Bias in Evaluating the Efficacy of Diagnostic Tests, 299 NEJM 926, 926 (1978), https://doi.org/10.1056/nejm197810262991705.

147. Penny Whiting et al., Sources of Variation and Bias in Studies of Diagnostic Accuracy: A Systematic Review, 140 Annals Internal Med. 189, 192–94 (2004), https://doi.org/10.7326/0003-4819-140-3-200402030-00010.

study population used to determine the test’s sensitivity and specificity. For example, an FDA-approved rapid test for HIV infection has a reported specificity of 100%, implying that any positive tests must indicate truly infected individuals, yet one of the populations in which testing is recommended is women who have had prior children and are in labor but have not yet had an HIV test during the pregnancy.148 In fifteen multiparous women, this rapid HIV test resulted in one false-positive test result, yielding a specificity of 93%,149 and so not all pregnant women with positive rapid HIV tests can be assumed to be truly infected.
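The arithmetic behind this example can be sketched in Python. The specificity calculation assumes that all fifteen women were uninfected, so that fourteen results were true negatives. The predictive-value calculation goes beyond the source: the prevalence of 1% and sensitivity of 99% are hypothetical values used only to show how a specificity below 100% affects the meaning of a positive result.

```python
def specificity(true_negatives, false_positives):
    """Fraction of uninfected people who correctly test negative."""
    return true_negatives / (true_negatives + false_positives)


def positive_predictive_value(prevalence, sens, spec):
    """Probability of infection given a positive test (Bayes' rule)."""
    true_pos = prevalence * sens
    false_pos = (1 - prevalence) * (1 - spec)
    return true_pos / (true_pos + false_pos)


# From the text: one false positive among fifteen women (assumed uninfected).
spec_observed = specificity(true_negatives=14, false_positives=1)
print(f"Observed specificity: {spec_observed:.0%}")

# Hypothetical prevalence and sensitivity, for illustration only.
for spec in (1.0, spec_observed):
    ppv = positive_predictive_value(prevalence=0.01, sens=0.99, spec=spec)
    print(f"specificity {spec:.0%}: chance a positive result is a true infection = {ppv:.0%}")
```

With a specificity of exactly 100%, every positive result is a true positive regardless of prevalence. Once specificity falls below 100%, the share of positives that are true depends heavily on how common the infection is in the tested population.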

Biomarker or Image Testing

Once a diagnosis has been established, additional biomarker testing may be performed to establish the extent of disease (e.g., staging of cancer), monitor response to therapy, or direct treatment. In women with breast cancer, for example, a genetic marker called the human epidermal growth factor receptor type 2 (HER2, also called HER2/neu) gene identified patients who responded poorly to standard chemotherapeutic agents and therefore had a poor prognosis. Illustrative of the era of pharmacogenomics, adjuvant chemotherapy combined with a monoclonal antibody has been found to delay progression and prolong survival in HER2-positive breast cancer patients.150 For the treatment of lymphomas, disease response can be monitored through imaging; a lack of response has been used to indicate treatment failure and to recommend a change in chemotherapy.

Variation and Standards in Medicine

Variation in Medical Care

Studies over the past several decades show substantial geographic variation in the utilization rates for medical care within small areas or local regions (e.g., a twelve-fold variation in the use of tonsillectomy in rural Vermont) and between large areas or widespread regions (e.g., a four- to five-fold variation in the performance of other discretionary surgical procedures such as hip replacement, coronary

148. Food and Drug Administration, OraQuick Rapid HIV-1 Antibody Test, https://www.fda.gov/media/73699/download.

149. Id. at 14.

150. Dennis J. Slamon et al., Use of Chemotherapy Plus a Monoclonal Antibody Against HER2 for Metastatic Breast Cancer That Overexpresses HER2, 344 NEJM 783, 787–89 (2001), https://doi.org/10.1056/NEJM200103153441101.

bypass surgery, and prostatectomy, among others).151 Even when limiting the analysis to seventy-seven U.S. hospitals with reputations for high-quality care in managing chronic illness, the care that patients received in their last six months of life varied extensively, ranging from hospital stays of nine to twenty-seven days (three-fold variation), to intensive-care unit stays of two to ten days (five-fold variation), to physician visits of eighteen to seventy-six (four-fold variation), depending on the hospital at which patients received their care.152

Four categories of variation are recognized: (1) underuse of effective care, (2) issues of patient safety, (3) concern for preference-sensitive care, and (4) notions of supply-sensitive services.153 Effective care refers to treatments that are known to be beneficial and that nearly all patients should receive with little influence of patient preferences, for example, the use of beta blockers following myocardial infarction. The underuse of effective care was illustrated by one prominent study that identified 439 high-quality process measures for 30 conditions and preventive care. In assessing the use of measures that were clearly recommended (i.e., clearly beneficial), the investigators found that only about 50% of patients received these highly recommended care processes.154 Issues of patient safety refer to the execution of care and the occurrence of iatrogenic complications (i.e., complications resulting from healthcare interventions). The IOM emphasized that “[g]etting the right diagnosis is a key aspect of healthcare: it provides an explanation of a patient’s health problem and informs healthcare decisions,” yet “[i]t is likely that most of us will experience at least one diagnostic error in our lifetime, sometimes with devastating consequences.”155 Concern for preference-sensitive care refers to treatment choices that should depend on patient health goals or preferences.156 For example, prostate surgery helps relieve symptoms of an enlarged prostate (such as frequent urination and waking up at night to urinate) but carries a risk of losing sexual function. Separate from the probability of losing sexual function, in preference-sensitive care, the decision to have prostate surgery depends on how much the enlarged prostate symptoms bother the patient and on how important sexual function is to the patient; that is, it depends on the patient’s preferences and values.157 Finally,

151. John D. Birkmeyer et al., Understanding Regional Variation in the Use of Surgery, 382 Lancet 1121, 1122 (2013), https://doi.org/10.1016/S0140-6736(13)61215-5.

152. John E. Wennberg et al., Use of Hospitals, Physician Visits, and Hospice Care During Last Six Months of Life Among Cohorts Loyal to Highly Respected Hospitals in the United States, 328 B.M.J. 607, 607 (2004), https://doi.org/10.1136/bmj.328.7440.607.

153. John E. Wennberg, Unwarranted Variations in Healthcare Delivery: Implications for Academic Medical Centres, 325 B.M.J. 961, 962–63 (2002), https://doi.org/10.1136/bmj.325.7370.961.

154. Elizabeth A. McGlynn et al., The Quality of Health Care Delivered to Adults in the United States, 348 NEJM 2635, 2642–43 (2003), https://doi.org/10.1056/NEJMsa022615.

155. 2015 CDEHC Report, supra note 56, at 19.

156. Wennberg, supra note 153, at 962.

157. Michael J. Barry et al., Patient Reactions to a Program Designed to Facilitate Patient Participation in Treatment Decisions for Benign Prostatic Hyperplasia, 33 Med. Care 771, 778 (1995), https://doi.org/10.1097/00005650-199508000-00003.

supply-sensitive services refer to care that depends not on evidence of effectiveness or patient preferences but rather on the availability of services. Specifically, patients living in areas with more doctors or more hospitals experience more office visits, tests, and hospitalizations.158

Evidence-Based Medicine

The variation in the delivery of medical care led to a careful reexamination of diagnostic strategies, therapeutic decision-making, and the use of medical evidence, but it was not the only factor. Other circumstances that set the stage for an intense focus on medical evidence included (1) the development of medical research, including randomized controlled trials and observational study designs; (2) the growth of diagnostic and therapeutic interventions;159 (3) interest in understanding medical decision-making and how physicians reason;160 and (4) the acceptance of meta-analysis as a method to combine data from multiple randomized trials.161 In response to these conditions, evidence-based medicine gained prominence in 1992,162 defined by Sackett et al. as “the conscientious, explicit and judicious use of current best evidence in making decisions about the care of the individual patient. It means integrating individual clinical expertise with the best available external clinical evidence from systematic research.”163

Evidence-based medicine contrasts with the traditional informal method of practicing based on anecdotes, applying the most recently read articles, doing what a group of eminent experts recommend, or minimizing costs.164 Rather,

158. Wennberg, supra note 153, at 962–63.

159. Cynthia D. Mulrow & K.N. Lohr, Proof and Policy from Medical Research Evidence, 26 J. Health Pol., Pol’y & L. 249, 250–56 (2001), https://doi.org/10.1215/03616878-26-2-249.

160. Robert S. Ledley & Lee B. Lusted, Reasoning Foundations of Medical Diagnosis: Symbolic Logic, Probability, and Value Theory Aid Our Understanding of How Physicians Reason, 130 Science 9, 9–10 (1959), https://doi.org/10.1126/science.130.3366.9.

161. See Steve C. Gold et al., Reference Guide on Epidemiology, “Methods for Synthesizing or Combining the Results of Multiple Studies,” in this manual; Video Software Dealers Ass’n v. Schwarzenegger, 556 F.3d 950, 963 (9th Cir. 2009) (analyzing a meta-analysis of studies on video games and adolescent behavior); Kennecott Greens Creek Min. Co. v. Mine Safety & Health Admin., 476 F.3d 946, 953 (D.C. Cir. 2007) (reviewing the Mine Safety and Health Administration’s reliance on epidemiological studies and two meta-analyses).

162. Evidence-Based Medicine Working Group, Evidence-Based Medicine: A New Approach to Teaching the Practice of Medicine, 268 JAMA 2420, 2420–21 (1992), https://doi.org/10.1001/jama.1992.03490170092032.

163. David L. Sackett et al., Evidence Based Medicine: What It Is and What It Isn’t, 312 B.M.J. 71, 71 (1996), https://doi.org/10.1136/bmj.312.7023.71.

164. Trisha Greenhalgh, How to Read a Paper: The Basics of Evidence-Based Medicine 5–9 (3d ed. 2006).

per Greenhalgh, it is “the use of mathematical estimates of the risks of benefit and harm, derived from high-quality research on population samples, to inform clinical decision making in the diagnosis, investigation or management of individual patients.”165 In a paper from a joint workshop held by the IOM and the Agency for Healthcare Research and Quality that addressed what clinicians consider to be sufficient evidence to justify their clinical practice and treatment decisions, Mulrow and Lohr wrote, “evidence-based medicine stresses a structured critical examination of medical research literature; relatively speaking, it deemphasizes average practice as an adequate standard and personal heuristics.”166

Hierarchy of Medical Evidence

With the explosion of available medical evidence, increased emphasis has been placed on assembling, evaluating, and interpreting medical research evidence. A fundamental principle of evidence-based medicine is that the strength of medical evidence supporting a therapy or strategy is hierarchical (see also section titled “Uncertainty and Tradeoffs in Therapeutic Decision-Making Evidence” below). When ordered from strongest to weakest, a systematic review of randomized trials (meta-analysis) is at the top, followed by single randomized trials, systematic reviews of observational studies, single observational studies, physiological studies, and unsystematic clinical observations.167 An analysis of the frequency with which various study designs are cited by others provides empirical evidence supporting the influence of meta-analysis followed by randomized controlled trials in the medical evidence hierarchy.168 Although they are at the bottom of the evidence hierarchy, unsystematic clinical observations or case reports may be the first signals of adverse events or associations that are later confirmed with larger or controlled epidemiological studies (e.g., aplastic anemia caused by chloramphenicol169 or lung cancer caused by asbestos170). Nonetheless, subsequent studies may not confirm initial reports (e.g., the putative

165. Id. at 1.

166. Mulrow & Lohr, supra note 159, at 253.

167. Gordon H. Guyatt et al., Users’ Guides to the Medical Literature: A Manual for Evidence-Based Clinical Practice 11 (2d ed. 2008); see also Steve C. Gold et al., Reference Guide on Epidemiology, “Methods for Synthesizing or Combining the Results of Multiple Studies,” in this manual.

168. Nikolaos A. Patsopoulos et al., Relative Citation Impact of Various Study Designs in the Health Sciences, 293 JAMA 2362, 2364–65 (2005), https://doi.org/10.1001/jama.293.19.2362.

169. W.T. Clarke, Fatal Aplastic Anemia and Chloramphenicol, 97 Can. Med. Ass’n J. 815 (1967), https://perma.cc/9BR4-34H5.

170. Michael Gochfeld, Asbestos Exposure in Buildings, in Environmental Medicine 438, 440 (Stuart M. Brooks et al. eds., 1995).

novel association between coffee consumption and pancreatic cancer published in a prominent medical journal).171

Just as basic science findings require repeated confirming experiments, evidence about the benefits and harms of medical interventions arises through repeated observations. A single randomized controlled trial relies on hypothesis testing, specifically assuming the null hypothesis that a new drug is equivalent to the comparator (e.g., placebo). As conceived about 100 years ago, interpreting a trial involves calculating the likelihood of the alpha error (p-value) wherein the study suggests that the drug or device is beneficial, but the “truth” is that it is not, that is, a false-positive study result. Similarly, a beta error is the likelihood of a study finding that the drug or device is not beneficial when the “truth” is that it is, that is, a false-negative study result (1 minus power, where power is the likelihood of a study finding that the drug or device is beneficial when the “truth” is that it is).
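A small simulation can make the two error rates concrete. The Python sketch below is illustrative only: the event risks (20% with placebo, 10% with a truly effective drug), the 200 patients per arm, the pooled z-test, and the function names are all assumptions rather than anything drawn from the studies cited here. It repeatedly simulates a two-arm trial and counts how often the result is statistically significant at an alpha of 0.05.

```python
import random
from statistics import NormalDist


def two_sided_p_value(events_a, events_b, n):
    """Two-sided p-value from a pooled z-test for two equal-sized arms."""
    pooled = (events_a + events_b) / (2 * n)
    se = (2 * pooled * (1 - pooled) / n) ** 0.5
    if se == 0:
        return 1.0  # no events or all events in both arms: no evidence of a difference
    z = (events_a - events_b) / n / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def fraction_significant(risk_control, risk_drug, n, trials=2000, alpha=0.05, seed=1):
    """Share of simulated trials whose p-value falls below alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        control_events = sum(rng.random() < risk_control for _ in range(n))
        drug_events = sum(rng.random() < risk_drug for _ in range(n))
        if two_sided_p_value(control_events, drug_events, n) < alpha:
            hits += 1
    return hits / trials


# Null hypothesis true (drug equals placebo): significant results are false positives.
false_positive_rate = fraction_significant(0.20, 0.20, n=200)
# Drug truly halves the risk: significant results are correct detections (power).
power = fraction_significant(0.20, 0.10, n=200)
beta = 1 - power  # false-negative rate

print(f"false-positive rate ~ {false_positive_rate:.3f}")
print(f"power ~ {power:.3f}, beta ~ {beta:.3f}")
```

Under these assumptions, the false-positive rate should land near the chosen alpha, and power is the complement of the beta error, as the paragraph above describes.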

The choice of which specific error rates to use (e.g., false-positive or p-value or alpha of 0.05) was supposed to depend on a judgment of the relative consequences of the two errors: missing an effective drug (Type II beta error) or considering an ineffective drug to be effective (Type I alpha error), with a stricter (lower) error rate for the latter.172 The null hypothesis, however, assumes equivalence, and so it does not provide any measure of evidence outside of the particular study (e.g., prior studies or biological mechanism or plausibility). Thus, the null hypothesis assumption necessitates abandoning the ability to measure evidence or determine “truth” from a single experiment so that hypothesis testing is thereby “equivalent to a system of justice that is not concerned with which individual defendant is found guilty or innocent (that is, ‘whether each separate hypothesis is true or false’) but tries instead to control the overall number of incorrect verdicts” in the long run.173

Cumulative meta-analysis of treatments enables the accumulation of randomized trial evidence to examine trends in efficacy or risks, overcoming the problem of underpowered trials that enroll too few patients to reliably detect a benefit.174 In this approach, each trial is combined, as it is published, with all previously published trials. For example, between 1959 and 1988, 33 randomized trials of streptokinase for acute heart attack (myocardial infarction) involving over 35,000 patients were published. A cumulative meta-analysis found “a consistent, statistically significant reduction in total mortality” (p < 0.001) with streptokinase use by 1977, yet an additional 32,600 individuals

171. Brian MacMahon et al., Coffee and Cancer of the Pancreas, 304 NEJM 630, 631–32 (1981), https://doi.org/10.1056/NEJM198103123041102; Dominique S. Michaud et al., Coffee and Alcohol Consumption and the Risk of Pancreatic Cancer in Two Prospective United States Cohorts, 10 Cancer Epidemiology, Biomarkers & Prevention 429, 429 (2001).

172. Goodman, supra note 71, at 998.

173. Id.

174. Joseph Lau et al., Cumulative Meta-Analysis of Therapeutic Trials for Myocardial Infarction, 327 NEJM 248, 248 (1992), https://doi.org/10.1056/NEJM199207233270406.

underwent randomization through 1988, with half receiving the inferior placebo. In contrast, for many years, physicians used lidocaine prophylactically to prevent life-threatening heart-rhythm disturbances from acute heart attacks, yet none of the randomized trials of lidocaine demonstrated any benefit, and the cumulative meta-analysis ultimately found a trend toward harm. When the results of the meta-analysis were compared with comments in textbooks and review articles,

discrepancies were detected between the meta-analytic patterns of effectiveness in the randomized trials and the recommendations of reviewers [the review article authors]. Review articles often failed to mention important advances or exhibited delays in recommending effective preventive measures. In some cases, treatments that have no effect on mortality or are potentially harmful continued to be recommended by several clinical experts.175
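The mechanics of a cumulative meta-analysis can be sketched as follows. This Python example is not a reconstruction of the streptokinase analysis: the five trials are invented, and fixed-effect inverse-variance pooling of log odds ratios is an assumed method, one common choice among several. It shows how the pooled estimate is recomputed each time a new trial is added in order of publication.

```python
import math

# Hypothetical trials: (year, deaths_treated, n_treated, deaths_control, n_control)
TRIALS = [
    (1960, 8, 50, 12, 50),
    (1965, 20, 150, 28, 150),
    (1970, 30, 300, 45, 300),
    (1975, 55, 600, 75, 600),
    (1980, 90, 1200, 120, 1200),
]


def log_odds_ratio(d_t, n_t, d_c, n_c):
    """Log odds ratio of death (treated vs. control) and its variance."""
    a, b, c, d = d_t, n_t - d_t, d_c, n_c - d_c
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def cumulative_meta_analysis(trials):
    """Pool trials in publication order; return one row per added trial:
    (year, pooled odds ratio, lower 95% limit, upper 95% limit)."""
    rows, sum_w, sum_wy = [], 0.0, 0.0
    for year, *counts in sorted(trials):
        y, var = log_odds_ratio(*counts)
        w = 1 / var  # inverse-variance weight: larger trials count more
        sum_w += w
        sum_wy += w * y
        pooled, se = sum_wy / sum_w, math.sqrt(1 / sum_w)
        rows.append((year, math.exp(pooled),
                     math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)))
    return rows


results = cumulative_meta_analysis(TRIALS)
for year, odds_ratio, low, high in results:
    flag = "excludes 1" if high < 1 or low > 1 else "includes 1"
    print(f"through {year}: OR {odds_ratio:.2f} (95% CI {low:.2f} to {high:.2f}), {flag}")
```

In this invented data set, the confidence interval narrows with every added trial, and the point at which it first excludes an odds ratio of 1 marks when the accumulated evidence became statistically significant, even though no single small trial was conclusive.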

Guidelines

Clinical-practice guidelines are “systematically developed statements to assist practitioner and patient decisions about appropriate healthcare for specific clinical circumstances.”176 Such guidelines have been widely developed and issued by medical-specialty associations, professional societies, government agencies, and healthcare organizations.177 To avoid biases inherent in review articles (particularly single-authored ones) and to encourage transparency and acceptance, the IOM recommended best practices for the development of clinical-practice guidelines, including process transparency, management of conflict of interest, guideline group composition (multidisciplinary and balanced), systematic reviews that meet IOM standards, evidence foundation for rating the strength of the recommendations, recommendation articulation, external review, and updating.178

175. Elliott M. Antman et al., A Comparison of Results of Meta-Analyses of Randomized Control Trials and Recommendations of Clinical Experts: Treatments for Myocardial Infarction, 268 JAMA 240, 240 (1992).

176. 1990 CAPHSCPG Report, supra note 19, at 8.

177. See generally Sofamor Danek Grp., Inc. v. Gaus, 61 F.3d 929 (D.C. Cir. 1995) (reviewing guidelines issued by the Agency for Health Care Policy and Research in light of the Federal Advisory Committee Act); Levine v. Rosen, 616 A.2d 623 (Pa. 1992) (finding that differing guidance from two groups was evidence that reasonable physicians could follow either school of thought); Michelle M. Mello, Of Swords and Shields: The Role of Clinical Practice Guidelines in Medical Malpractice Litigation, 149 U. Pa. L. Rev. 645 (2001).

178. 2011 CSDTCPG Report, supra note 20, at 5–9; Committee on Standards for Systematic Reviews of Comparative Effectiveness Research, Inst. Med., Finding What Works in Health Care: Standards for Systematic Reviews 6–11, 14–15 (Jill Eden et al. eds., 2011), https://doi.org/10.17226/13059.

The number, length, and diversity of guidelines developed by various professional organizations challenge practicing physicians.179 Different professional organizations may issue guidelines on the same topic with conflicting or similar recommendations. The composition of the panel and the processes for developing guideline recommendations may differ. For example, the United States Preventive Services Task Force (USPSTF) is “an independent panel of non-Federal experts in prevention and evidence-based medicine and is composed of primary care providers (such as internists, pediatricians, family physicians, gynecologists/obstetricians, nurses, and health behavior specialists).”180 In its 2016 evaluation of mammography, the USPSTF “recommends biennial screening mammography for women aged 50 to 74 years” but “[t]he decision to start screening mammography in women prior to age 50 years should be an individual one.”181 In contrast, based on a writing group composed of radiologists, the American College of Radiology and Society of Breast Imaging recommend “annual mammography screening beginning at age 40” for all women at average risk.182

For prostate cancer screening, however, the USPSTF’s 2018 update stated, “For men aged 55 to 69 years, the decision to undergo periodic PSA-based screening for prostate cancer should be an individual one and should include discussion of the potential benefits and harms of screening with their clinician.”183 In the 2013 American Urological Association update, a statement panel composed of urologists, primary care physicians, radiation and medical oncologists, and epidemiologists “recommended shared decision-making for men age 55 to 69 years considering PSA-based screening, a target age group for whom benefits may outweigh harms”—essentially identical recommendations.

Practice guidelines provide recommendations on how to evaluate and treat patients, but because they apply to the general case, their recommendations may not apply to a particular individual patient—or some extrapolation may be required, particularly when multiple diseases exist, as in the elderly,184 or when

179. Arthur Hibble et al., Guidelines in General Practice: The New Tower of Babel?, 317 B.M.J. 862, 862 (1998), https://doi.org/10.1136/bmj.317.7162.862.

180. U.S. Preventive Servs. Task Force (USPSTF), Agency for Healthcare Rsch. and Quality, https://perma.cc/ZDT5-X752.

181. Albert L. Siu et al., Screening for Breast Cancer: U.S. Preventive Services Task Force Recommendation Statement, 164 Annals Internal Med. 279, 279, 282–83 (2016), https://doi.org/10.7326/M15-2886; see also section titled “When to Start Screening” in the report.

182. Debra L. Monticciolo et al., Breast Cancer Screening Recommendations Inclusive of All Women at Average Risk: Update from the ACR and Society of Breast Imaging, 18 J. Am. Coll. Radiology 1280, 1280 (2021), https://doi.org/10.1016/j.jacr.2021.04.021.

183. U.S. Preventive Servs. Task Force et al., Screening for Prostate Cancer: U.S. Preventive Services Task Force Recommendation Statement, 319 JAMA 1901, 1901 (2018), https://doi.org/10.1001/jama.2018.3710.

treatment entails competing risks of harm. For example, anticoagulation is generally recommended for patients with atrial fibrillation (an abnormal heart rhythm) to prevent blood clots in the heart that could flow to the brain and cause a stroke, yet anticoagulation can also lead to life-threatening bleeding; therefore, for individual patients, physicians must weigh the risk of developing clots versus the risk of bleeding. Consequently, guidelines typically include statements such as “clinical or policy decisions involve more considerations than this body of evidence alone. Clinicians and policymakers should understand the evidence but individualize decision-making to the specific patient or situation.”185

Some physicians who rely on personal style, review articles, and colleagues to guide their clinical practice have been concerned with how guidelines affect clinical autonomy and healthcare costs.186 But just as clinicians have been reluctant to apply guidelines in practice, courts have generally been slow to apply them in deciding cases.187 Political and legal issues have arisen with the development of guidelines.188 In 2006, the Connecticut attorney general launched an antitrust suit against the Infectious Diseases Society of America (IDSA) after IDSA promulgated a guideline recommending against the use of long-term antibiotics for the treatment of chronic Lyme disease.189 Although the findings of the Centers for Disease Control and Prevention and the Food and Drug Administration (FDA) seemed to concur with IDSA’s guidelines, an advocacy group representing patients afflicted with chronic Lyme disease and the physicians who treated them protested; the attorney general’s decision to investigate followed shortly afterwards.190 Organizations can violate antitrust laws if their guideline-setting process is an unreasonable attempt to advance their members’ economic interests by suppressing competition. After legal expenses that exceeded $250,000, IDSA settled without admitting any fault, providing an example of

184. Cynthia M. Boyd et al., Clinical Practice Guidelines and Quality of Care for Older Patients with Multiple Comorbid Diseases: Implications for Pay for Performance, 294 JAMA 716, 718–22 (2005), https://doi.org/10.1001/jama.294.6.716.

185. U.S. Preventive Services Task Force, Screening for Carotid Artery Stenosis: U.S. Preventive Services Task Force Recommendation Statement, 147 Annals Internal Med. 854, 854 (2007), https://doi.org/10.7326/0003-4819-147-12-200712180-00005.

186. Sean R. Tunis et al., Internists’ Attitudes About Clinical Practice Guidelines, 120 Annals Internal Med. 956, 956 (1994), https://doi.org/10.7326/0003-4819-120-11-199406010-00008.

187. Arnold J. Rosoff, Evidence-Based Medicine and the Law: The Courts Confront Clinical Practice Guidelines, 26 J. Health Pol., Pol’y & L. 327, 330–53 (2001), https://doi.org/10.1215/03616878-26-2-327.

188. One element in the near demise of the Agency for Health Care Policy and Research (AHCPR) was a political audience receptive to complaints from an association of back surgeons who disagreed with the AHCPR practice guideline conclusions regarding low back pain. B.H. Gray et al., AHCPR and the Changing Politics of Health Services Research, 22 Health Affs. (Supp. 1) W3-283-307 (2003), https://doi.org/10.1377/hlthaff.w3.283.

189. John D. Kraemer & Lawrence O. Gostin, Science, Politics, and Values: The Politicization of Professional Practice Guidelines, 301 JAMA 665, 665 (2009), https://doi.org/10.1001/jama.301.6.665.

190. Id. at 666.

“the politicization of health policy, with elected officials advocating for health policies against the weight of scientific evidence.”191

Besides clinical practice guidelines, the IOM defined other types of statements: (1) medical review criteria are systematically developed statements that can be used to assess the appropriateness of specific healthcare decisions, services, and outcomes; (2) standards of quality are authoritative statements of minimum levels of acceptable performance or results, excellent levels of performance or results, or the range of acceptable performance or results; and (3) performance measures are methods or instruments to estimate or monitor the extent to which the actions of a healthcare practitioner or provider conform to practice guidelines, medical review criteria, or standards of quality.192

Uncertainty and Tradeoffs in Therapeutic Decision-Making Evidence

Medical decision-making often involves complexity, uncertainty, and tradeoffs193 because of unique genetic factors, lifestyle habits, known conditions, medication histories, and ambiguity about possible diagnoses, test results, treatment benefits, and therapeutic harms. Clinicians therefore must often make treatment decisions despite inherent diagnostic and therapeutic uncertainty.

Well-performed randomized trials provide the least biased estimates of treatment benefit and harm by comparing groups with equivalent prognoses. Sticking strictly to the scientific evidence, some physicians may limit their use of medications to the specific drug at the specific doses found to be beneficial in such trials. Others may assume class effects (drugs with similar structure, mechanism of action, or pharmacologic effects, e.g., lowering blood pressure) until proven otherwise. Others may consider additional factors such as out-of-pocket costs for patients or patient preferences. When physicians evaluate patients who might benefit from a treatment but who would have been excluded from the study in which the benefit was demonstrated, they must weigh the risks of harm and benefits through extrapolation in the absence of definitive evidence of benefit or of harm. Indeed, because few medical recommendations are based on randomized trials (the least biased level of evidence), physicians frequently and necessarily face uncertainty in making testing and treatment decisions. For example, in an academic pediatric cardiology hospital, only 3% of clinically

191. Id. at 665.

192. 1990 CAPHSCPG Report, supra note 19, at 8.

193. John P. A. Ioannidis & Joseph Lau, Systematic Review of Medical Evidence, 12 J. L. & Pol’y 509, 511, 524 (2004).

significant decisions are related to a specific study.194 In most disciplines, clear evidence of the efficacy and risks of treatment is lacking. In cardiology (one of the best-studied areas of medical care), nearly one-half of guideline recommendations are based on expert opinion, case studies, or standards of care.195

Applying well-designed studies to populations of patients represents another problem. The Randomized Aldactone Evaluation Study demonstrated that spironolactone reduced mortality and hospitalizations for heart failure and improved quality of life with minimal risk of seriously high levels of potassium (hyperkalemia).196 After the study was published in a prominent medical journal, prescriptions for spironolactone rose quickly because of familiarity with the medication and the poor prognosis of patients with heart failure. In Ontario, however, hospitalizations for high potassium per 1,000 patients rose from 2.4 in 1994 to 11.0 in 2001, resulting in an estimated 560 additional hospitalizations for high potassium and 73 additional hospital deaths in older patients with heart failure.197 Compared with the study population, patients in the community were older, were more frequently women, and more often had absolute or relative contraindications to treatment, with only 25% characterized as ideal candidates for spironolactone.198 These factors increased the risk that spironolactone therapy would lead to high potassium levels that could be life-threatening. Criteria for entry into randomized trials of drugs typically exclude individuals who use concomitant medications, have medical comorbidities, are female, or have disadvantaged socioeconomic status, thereby limiting the ability to generalize the results of a trial to the clinical population being treated.199 Clinicians refer to randomized controlled studies as assessments of drug efficacy in restricted patient populations, whereas studies of treatment in general clinical populations are often referred to as effectiveness studies.

To be sufficiently powered (i.e., to have enough patients enrolled to avoid Type II error) to demonstrate statistical significance,200 randomized controlled trials

194. Jeffrey R. Darst et al., Deciding Without Data, 5 Congenital Heart Disease 339, 340 (2010), https://doi.org/10.1111/j.1747-0803-2010.00433.x.

195. Pierluigi Tricoci et al., Scientific Evidence Underlying the ACC/AHA Clinical Practice Guidelines, 301 JAMA 831, 835 (2009), https://doi.org/10.1001/jama.2009.205.

196. Bertram Pitt et al., The Effect of Spironolactone on Morbidity and Mortality in Patients with Severe Heart Failure: Randomized Aldactone Evaluation Study Investigators, 341 NEJM 709, 709 (1999), https://doi.org/10.1056/NEJM199909023411001.

197. David N. Juurlink et al., Rates of Hyperkalemia After Publication of the Randomized Aldactone Evaluation Study, 351 NEJM 543, 543 (2004), https://doi.org/10.1056/NEJMoa040135.

198. Dennis T. Ko et al., Appropriateness of Spironolactone Prescribing in Heart Failure Patients: A Population-Based Study, 12 J. Cardiac Failure 205, 206 (2006), https://doi.org/10.1016/j.cardfail.2006.01.003.

199. Harriette G. C. Van Spall et al., Eligibility Criteria of Randomized Controlled Trials Published in High-Impact General Medical Journals: A Systematic Sampling Review, 297 JAMA 1233, 1237 (2007), https://doi.org/10.1001/jama.297.11.1233.

200. See Steve C. Gold et al., Reference Guide on Epidemiology, “Methods for Synthesizing or Combining the Results of Multiple Studies,” in this manual.


usually require high event rates, prolonged follow-up, or large numbers of patients. Because of the impracticality, expense, and time needed to obtain long-term outcomes, these trials often use a surrogate marker that is associated with a clinically important event or with survival. For example, statins were approved on the basis of their safety and efficacy in lowering cholesterol but were only demonstrated to improve survival in patients with known coronary heart disease years later.201 Fast-track approval of new drugs for HIV infection was based on safety and efficacy in reducing viral levels (as a surrogate or substitute outcome measure believed to be related to survival) as opposed to a demonstration of improved survival.
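The link between event rates and the number of patients needed can be illustrated with a standard normal-approximation formula for comparing two proportions. This is a minimal sketch and is not drawn from any trial discussed here. The event rates, the 5% two-sided significance level, and the 80% power are assumptions chosen only for illustration, and `patients_per_arm` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def patients_per_arm(control_rate, treated_rate, alpha=0.05, power=0.80):
    """Approximate patients needed in each arm to compare two event rates.

    Uses the common normal-approximation formula for two proportions.
    All inputs are illustrative assumptions, not data from a real trial.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided Type I error
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - Type II error
    variance = control_rate * (1 - control_rate) + treated_rate * (1 - treated_rate)
    difference = control_rate - treated_rate
    return math.ceil((z_alpha + z_beta) ** 2 * variance / difference ** 2)


# Same 20% relative risk reduction, but different baseline event rates.
common_events = patients_per_arm(0.40, 0.32)  # high event rate
rare_events = patients_per_arm(0.10, 0.08)    # low event rate
print(common_events, rare_events)
```

Under these assumptions, the trial with the lower event rate needs several times as many patients per arm to detect the same relative effect, which is one reason investigators turn to surrogate markers that occur more often or sooner.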

On the other hand, in the late 1970s, patients with frequent extra heartbeats (ventricular premature contractions) following a heart attack were recognized to have an increased risk of sudden death. On that basis, those in the then-emerging field of cardiac electrophysiology believed that reducing ventricular premature beats (as a surrogate outcome measure) would decrease subsequent sudden cardiac death. Oral antiarrhythmic drugs such as encainide and flecainide were approved by the FDA on the basis of early randomized controlled trials showing their ability to suppress these extra heartbeats in patients who had had a myocardial infarction. Years after approval of these drugs, however, a randomized controlled trial designed to demonstrate their survival benefit was discontinued after only ten months because of a statistically significant higher rate of mortality in patients receiving the drugs. Although these drugs effectively suppressed the extra heartbeats, the study found that they actually increased the likelihood of fatal heart rhythm disturbances.202

Prior to approval by the FDA, drugs and devices must undergo Phase 1, 2, and 3 clinical trials to demonstrate safety and efficacy. Following preliminary chemical discovery, toxicology, and animal studies, Phase 1 studies examine the safety of new drugs in healthy individuals. Phase 2 studies involve varying drug doses in individuals with the disease to explore efficacy and responses and adverse effects. Based on the dose or doses identified in Phase 2, a Phase 3 study examines drug response in a larger number of patients to again determine safety and efficacy in the hope of getting a new drug approved for sale by regulatory authorities. However, because fewer than 10,000 individuals have usually received the drug during most trials, uncommon adverse outcomes may not become apparent until usage is broadened and extended. For example, about 1 in 24,200, or

201. Randomised Trial of Cholesterol Lowering in 4444 Patients with Coronary Heart Disease: The Scandinavian Simvastatin Survival Study (4S), 344 Lancet 1383, 1385 (1994), https://doi.org/10.1016/S0140-6736(94)90566-5.

202. Preliminary Report: Effect of Encainide and Flecainide on Mortality in a Randomized Trial of Arrhythmia Suppression After Myocardial Infarction: The Cardiac Arrhythmia Suppression Trial (CAST) Investigators, 321 NEJM 406, 409 (1989), https://doi.org/10.1056/NEJM198908103210629.


40,500, patients who received the antibiotic chloramphenicol203 developed fatal aplastic anemia (in which the bone marrow no longer produces any blood cells). This adverse effect was discovered only in the 1960s after chloramphenicol was initially considered safe and had been widely used during the 1950s.204
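A short calculation shows why adverse effects this rare can escape detection before approval. The sketch below uses the risks quoted above and the 10,000-patient upper bound for premarket exposure. The helper name `chance_of_seeing_event` is hypothetical, and the calculation assumes, as a simplification, that every patient carries the same independent risk.

```python
def chance_of_seeing_event(patients_exposed, risk_per_patient):
    """Probability that at least one adverse event occurs among exposed patients,
    assuming each patient's risk is independent and identical (a simplification)."""
    return 1 - (1 - risk_per_patient) ** patients_exposed


# Risks quoted in the text for fatal aplastic anemia with chloramphenicol.
for one_in in (24_200, 40_500):
    # 10,000 is the upper bound the text gives for typical premarket exposure.
    p = chance_of_seeing_event(10_000, 1 / one_in)
    print(f"risk 1 in {one_in:,}: P(at least one case in 10,000 patients) = {p:.2f}")
```

Under these assumptions, even a program that exposed a full 10,000 patients would more likely than not observe no case at all, and a single observed case would be hard to attribute to the drug.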

For all approved drug and therapeutic biological products, the FDA managed postmarketing safety surveillance beginning in 1969 through the Adverse Event Reporting System, a voluntary reporting system with many limitations. For example, in 1999, rofecoxib (Vioxx), a COX-2 selective nonsteroidal anti-inflammatory drug (NSAID), was approved for pain relief in part on the basis of studies that suggested that it induced less gastrointestinal bleeding than other NSAIDs. In 2004, the manufacturer announced a voluntary worldwide withdrawal of rofecoxib when a prospective study confirmed that the drug increased the risk of myocardial infarctions (heart attacks) and stroke with chronic use.205 In 2014, and officially in 2016, the FDA began Active Postmarket Risk Identification and Analysis (ARIA) with the development of the Sentinel Initiative, which analyzes real-world safety data through “the largest multisite distributed database in the world dedicated to medical product safety” and draws on big data from twenty-five academic, health-system, and health-insurer collaborators with experience and expertise in epidemiology, clinical medicine, pharmacy, statistics, health informatics, and data science (natural language processing and machine learning).206

Even in a randomized trial in which a drug is found to be beneficial, some patients who received the drug may have been harmed, emphasizing the need to individualize the balancing of risks of harm and benefits and explaining in part why some clinicians may not adhere to guideline recommendations. The fundamental dilemma was articulated by Bernard in 1865: The response of the “average” patient to therapy is not necessarily the response of the patient being treated.207 Indeed, the average results of clinical trials do not apply to all patients in the trial; this is called heterogeneity of treatment effects208 wherein even with well-defined inclusion and exclusion criteria, variation in outcome risk, and therefore treatment benefit, exists so that even “typical” patients included in the trial may not get the average observed trial benefits.

The Global Utilization of Streptokinase and tPA for Occluded Coronary Arteries Trial is a case in point. The trial suggested that accelerated tissue

203. Clarke, supra note 169, at 815.

204. Id.

205. See generally In re Vioxx Prods. Liab. Litig., 360 F. Supp. 2d 1352 (J.P.M.L. 2005).

206. U.S. Food & Drug Administration, FDA’s Sentinel Initiative—Background, https://www.fda.gov/safety/fdas-sentinel-initiative.

207. Salim Yusuf et al., Analysis and Interpretation of Treatment Effects in Subgroups of Patients in Randomized Clinical Trials, 266 JAMA 93, 93 (1991), https://doi.org/10.1001/jama.1991.03470010097038.

208. David M. Kent et al., The Predictive Approaches to Treatment Effect Heterogeneity (PATH) Statement, 172 Annals Internal Med. 35, 41 (2020), https://doi.org/10.7326/M18-3667.


plasminogen activator (tPA), a potent thrombolytic, reduced mortality from acute heart attacks (myocardial infarction) by dissolving blood clots, with the tradeoff being an increased risk of bleeding in the brain from tPA.209 In a reanalysis of this study, most (85%) of the survival benefit of tPA accrued to half of the patients (those at highest risk of dying from their heart attack). Some patients with very low risk of dying from their heart attack who received tPA were likely harmed because their risk of bleeding in their brain exceeded the benefit.210 Therefore, to optimize treatment decisions in practice, clinicians may attempt to individualize treatment decisions based on their assessment of the patient’s risk of harm versus benefit. Even then, clinicians may be reluctant to administer a medication such as tPA that can cause severe harm, such as bleeding into the brain (intracranial hemorrhage). A single clinical experience with a patient who bled when given tPA might well color a clinician’s judgment about the benefits of the treatment going forward (the availability heuristic).
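The arithmetic behind this pattern can be sketched with a toy model in which the drug lowers every patient's risk of death by the same proportion and adds the same fixed risk of brain bleeding. All numbers below are hypothetical and are not the trial's results. The sketch also assumes, for simplicity, that a brain bleed counts the same as a death.

```python
# Hypothetical numbers for illustration only; they are NOT the GUSTO results.
RELATIVE_MORTALITY_REDUCTION = 0.15   # assumed identical in every patient
BRAIN_BLEED_RISK = 0.005              # assumed identical in every patient

# Assumed baseline risk of dying from the heart attack without the drug.
baseline_mortality = {"low risk": 0.02, "average risk": 0.07, "high risk": 0.20}


def net_benefit(baseline_risk):
    """Deaths prevented minus serious bleeds caused, per patient treated."""
    return baseline_risk * RELATIVE_MORTALITY_REDUCTION - BRAIN_BLEED_RISK


for group, risk in baseline_mortality.items():
    print(f"{group}: net benefit per 1,000 treated = {1000 * net_benefit(risk):+.1f}")
```

With these assumed values, the net effect is negative for the low-risk group and largest for the high-risk group, even though the relative benefit of the drug is identical in all three.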

A fundamental principle of evidence-based medicine is that “[e]vidence alone is never sufficient to make a clinical decision.”211 Nearly all medical decisions involve some tradeoff between a benefit and a harm. Besides the options and the likelihood of the outcomes, patient preferences about the resulting outcomes should affect care choices, especially when there are tradeoffs, such as a risk of complications or dying from a procedure or treatment versus some benefit such as living longer (provided the patient survives the short-term risk of the procedure) or improving their quality of life (relieving symptoms). Besides individualizing risk of harm and benefit assessments, clinicians may also deviate from guideline recommendations (warranted variation) because of a particular patient’s higher risk of adverse events or lower likelihood of benefit or because of patient preferences for the alternative outcomes, such as when risks occur at different times. For example, given a hypothetical choice between living 12.5 years for certain or a 50:50 chance of living 25 years or dying immediately, most individuals would choose 12.5 years. Although both options yield, on average, 12.5 years, individuals tend to be risk averse and prefer to avoid the near-term risk of dying. When interviewed, some patients with operable lung cancer were quite averse to possible immediate death from surgery, and so, based on their

209. The GUSTO Investigators, An International Randomized Trial Comparing Four Thrombolytic Strategies for Acute Myocardial Infarction, 329 NEJM 673, 673 (1993), https://doi.org/10.1056/NEJM199309023291001.

210. David M. Kent et al., An Independently Derived and Validated Predictive Model for Selecting Patients with Myocardial Infarction Who Are Likely to Benefit from Tissue Plasminogen Activator Compared with Streptokinase, 113 Am. J. Med. 104, 108 (2002), https://doi.org/10.1016/s0002-9343(02)01160-9.

211. Guyatt et al., supra note 167, at 8; see also section titled “Hierarchy of Medical Evidence” above.


preferences, these patients probably should opt for radiation therapy despite its poorer long-term survival.212
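The hypothetical choice between 12.5 certain years and a 50:50 gamble can be worked through numerically. In this sketch the two options come from the example above, while the square-root utility function is an arbitrary stand-in for risk aversion chosen only for illustration. It is not a measured patient preference.

```python
import math

# Each option is a list of (probability, years_of_life) pairs from the text's example.
certain_option = [(1.0, 12.5)]
gamble_option = [(0.5, 25.0), (0.5, 0.0)]


def expected_years(option):
    return sum(p * years for p, years in option)


def expected_utility(option, utility=math.sqrt):
    """Expected utility under a concave utility function.

    The square root is an arbitrary stand-in for risk aversion, not a
    measured patient preference.
    """
    return sum(p * utility(years) for p, years in option)


print(expected_years(certain_option), expected_years(gamble_option))       # 12.5 12.5
print(expected_utility(certain_option) > expected_utility(gamble_option))  # True
```

The two options are identical in expected years of life, yet any concave utility function of this kind ranks the certain option higher, consistent with the risk-averse choice most individuals make.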

Besides risk aversion, some treatments may improve quality of life but place patients at risk for shortened life expectancy, and some patients may be willing to accept a shorter life expectancy in exchange for better quality of life. When presented with laryngeal cancer scenarios, some volunteer research subjects chose radiation therapy over surgery to preserve their voices despite a reduced likelihood of future survival. “These results suggest that treatment choices should be made on the basis of patients’ attitudes toward the quality as well as the quantity of survival.”213

To illustrate this principle, a National Institutes of Health Consensus Conference recommended breast-conserving surgery when possible for women with Stage I and II breast cancer, because well-designed studies with long-term follow-up on thousands of women demonstrated that lumpectomy with radiation therapy was equivalent to mastectomy for survival and disease-free survival (being alive without breast cancer recurrence).214 In one study, lumpectomy and radiation appeared to have a lower risk of breast cancer recurrence, with five women reported to have had breast cancer recurrences following lumpectomy and radiation versus ten women after mastectomy.215 However, breast cancer that recurred in the breast that had been operated on was censored (i.e., deliberately not considered in the statistical analysis, likely because the breast would have been removed with the alternative intervention mastectomy).216 When including these censored cancer recurrences, twenty breast cancer recurrences occurred after lumpectomy versus ten after mastectomy, and so lumpectomy actually had a higher overall risk of recurrence.217 As expressed by one woman, “The decision about treatment for breast cancer remains an intensely personal one. The mastectomy I chose . . . felt a lot less invasive than the prospect of six weeks of daily radiation, not to mention the 14% risk of local recurrence.”218 In such a case,

212. Barbara J. McNeil et al., Speech and Survival: Tradeoffs Between Quality and Quantity of Life in Laryngeal Cancer, 305 NEJM 982, 986 (1981), https://doi.org/10.1056/NEJM198110223051704.

213. Id. at 982.

214. NIH Consensus Conference: Treatment of Early-Stage Breast Cancer, 265 JAMA 391, 392 (1991), https://doi.org/10.1001/jama.1991.03460030097037.

215. Joan A. Jacobson et al., Ten-Year Results of a Comparison of Conservation with Mastectomy in the Treatment of Stage I and II Breast Cancer, 332 NEJM 907, 909 (1995), https://doi.org/10.1056/NEJM199504063321402.

216. Bernard Fisher et al., Eight-Year Results of a Randomized Clinical Trial Comparing Total Mastectomy and Lumpectomy With or Without Irradiation in the Treatment of Breast Cancer, 320 NEJM 822, 824 (1989), https://doi.org/10.1056/NEJM198903003201302; Jacobson et al., supra note 215, at 909.

217. Jacobson et al., supra note 215, at 909.

218. Karen Sepucha et al., Policy Support for Patient-Centered Care: The Need for Measurable Improvements in Decision Quality, 23 Health Affs. (Supp. 2) VAR54, VAR62 (2004), https://doi.org/10.1377/hlthaff.var.54.


patient preferences219 about tradeoffs involving breast preservation and increased risk of breast cancer recurrence or the need for radiation therapy associated with lumpectomy may play an important role in determining the optimal decision for any particular patient.220

Biases
Cognitive psychology

In Predictably Irrational: The Hidden Forces That Shape Our Decisions, Dan Ariely refers to

the simple and compelling idea that we are capable of making the right decision for ourselves. . . . In fact, this book is about human irrationality—about our distance from perfection. . . . Understanding irrationality is important for our everyday actions and decisions, and for understanding how we design our environment and the choices it presents to us.221

Beyond the cognitive biases associated with heuristics (representativeness, availability, and anchoring, mentioned earlier) and patient- and hospital-related factors leading to medical errors, medical decision-making is susceptible to pitfalls related to other cognitive biases, including framing effects (being influenced by wording and context, e.g., choosing riskier options when described in negative terms such as mortality), blind obedience (deference to authority or technology), number of alternatives (choosing one treatment more often when presented with additional options), premature closure (focusing narrowly on a single opinion), and personality traits such as tolerance of ambiguity or overconfidence.222 A systematic review of cognitive biases in medical decisions made by physicians found that at least one cognitive factor or bias was present in every study, with five of seven studies showing an association with medical errors.223 Lastly, an

219. Procter & Gamble Pharm., Inc. v. Hoffmann-LaRoche, Inc., No. 06 CIV. 0034 (PAC), 2006 WL 2588002, at *10 (S.D.N.Y. Sept. 6, 2006) (detailing the testimony of a physician stating that, in addition to efficacy, he considers patient preferences when determining treatment for osteoporosis).

220. Jerome P. Kassirer, Adding Insult to Injury: Usurping Patients’ Prerogatives, 308 NEJM 898, 900–01 (1983), https://doi.org/10.1056/NEJM198304143081511.

221. Dan Ariely, Predictably Irrational: The Hidden Forces That Shape Our Decisions xix (2010).

222. Gustavo Saposnik et al., Cognitive Biases Associated with Medical Decisions: A Systematic Review, 16 BMC Med. Informatics & Decision Making 138 (2016), https://doi.org/10.1186/s12911-016-0377-1; Donald A. Redelmeier, Improving Patient Care: The Cognitive Psychology of Missed Diagnoses, 142 Annals Internal Med. 115, 119 (2005), https://doi.org/10.7326/0003-4819-142-2-200501180-00010.

223. Saposnik et al., supra note 222, at 5.


emerging area concerns noise, that is, variability in judgments that should be identical. Specifically, different physicians do not make identical decisions for the same patient; this noise was especially high in psychiatry but was found even in radiology, as well as in other fields such as forensic science and bail decisions: “Whenever you look at human judgments, you are likely to find noise [random scatter]. To improve the quality of our judgments, we need to overcome noise as well as bias [systematic deviation].”224
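The distinction between bias (systematic deviation) and noise (random scatter) can be made concrete with a small calculation. The figures below are hypothetical estimates invented for illustration and do not come from any study. The sketch treats bias as the average error and noise as the standard deviation of the errors.

```python
from statistics import mean, pstdev

# Hypothetical risk estimates (%) from five physicians for the same patient,
# with an assumed "true" value; none of these numbers come from a study.
true_value = 20.0
estimates = [28.0, 35.0, 22.0, 31.0, 24.0]

errors = [e - true_value for e in estimates]
bias = mean(errors)      # systematic deviation: the average error
noise = pstdev(errors)   # random scatter: spread of errors around their average

print(f"bias = {bias:.1f}, noise = {noise:.1f}")
```

In this made-up example the physicians overestimate on average (bias) and also disagree with one another (noise). Correcting the average would remove the first problem but leave the second untouched.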

Race and ethnicity

Beyond the biases associated with the types and severity of the diseases that physicians may see, the 2003 IOM report Unequal Treatment: Confronting Racial and Ethnic Disparities in Health Care found that “[r]acial and ethnic minorities tend to receive a lower quality of healthcare than non-minorities, even when access-related factors, such as patients’ insurance status and income, are controlled. The sources of these disparities are complex, are rooted in historic and contemporary inequities, and involve many participants at several levels, including health systems, their administrative and bureaucratic processes, utilization managers, healthcare professionals, and patients.”225 Evidence of barriers leading to unequal treatment included access (even when equally insured), language, geography, cultural familiarity, health-system financial and institutional arrangements, and legal, regulatory, and policy environments.226 These health inequities arose from a “historic context in which healthcare has been differentially allocated on the basis of social class, race, and ethnicity.”227 Although this report is more than 20 years old, a 2021 study examining screening colonoscopy found consistent evidence of inequities including access to screening, the quality of screening, delay from diagnosis to initiation of treatment, and quality of treatment.228 Thus, criticism of the inclusion of race in prediction models (referred to as clinical algorithms) arose in major journals because “medicine is not a stand-alone institution immune to racial inequities, but rather is an institution of structural racism. A pervasive example of this participation is race-based medicine, the system by which research characterizing race as an

224. Daniel Kahneman et al., Noise: A Flaw in Human Judgment 4, 7, 273–86 (2021).

225. Comm. on Understanding and Eliminating Racial and Ethnic Disparities in Health Care, Inst. Med., Unequal Treatment: Confronting Racial and Ethnic Disparities in Health Care 1, 1 (2003), https://doi.org/10.17226/12875.

226. Id. at 140–59.

227. Id. at 123.

228. U.S. Preventive Services Task Force, Screening for Colorectal Cancer: US Preventive Services Task Force Recommendation Statement, 325 JAMA 1965, 1970 (2021), https://doi.org/10.1001/jama.2021.6238; Carolyn M. Rutter, Black and White Differences in Colorectal Cancer Screening and Screening Outcomes: A Narrative Review, 30 Cancer Epidemiology, Biomarkers & Prevention 3, 9–10 (2021), https://doi.org/10.1158/1055-9965.EPI-19-1537.


essential biological variable, translates into clinical practice, leading to inequitable care”;229 “[t]hese algorithms propagate race-based medicine”;230 and “[r]ace is a historically derived social construct that has no place as a biological proxy.”231

Estimating kidney function with race and without

One area of focus became the estimation of kidney function, specifically whether the equation for estimating the glomerular filtration rate (GFR) should include race. The inclusion of race historically led to higher estimated values of kidney function for people who are black. Those higher values could avoid overdiagnosis and overtreatment but could also delay referral to specialty care and transplantation and result in poorer health outcomes, particularly in a population at higher risk for end-stage kidney disease and related mortality than the general population.232 Others have raised performance and data issues including validation, overall and within-group accuracy, availability of assays, and equation parameters in patients who are black.233

In response, the National Kidney Foundation and American Society of Nephrology created the NKF-ASN Task Force on Reassessing the Inclusion of Race in Diagnosing Kidney Disease. The task force, including ninety-seven experts with input from the community at large (students, trainees, clinicians, scientists, other health professionals, patients, family members and other public stakeholders), clarified the problem and evidence, narrowed the list of twenty-six approaches that did or did not consider race for estimating GFR to five, and evaluated those on “assay availability and standardization; implementation; population diversity in equation development; performance compared to measured GFR; consequences to clinical care, population tracking and research; and patient centeredness.”234 They ultimately recommended immediate implementation of a CKD-EPI (Chronic Kidney Disease Epidemiology Collaboration) creatinine equation that was revised by excluding the race variable in calculation and reporting based on “acceptable performance characteristics

229. Jessica P. Cerdeña et al., From Race-Based to Race-Conscious Medicine: How Anti-Racist Uprisings Call Us to Act, 396 Lancet 1125, 1125 (2020), https://doi.org/10.1016/S0140-6736(20)32076-6.

230. Darshali A. Vyas et al., Hidden in Plain Sight: Reconsidering the Use of Race Correction in Clinical Algorithms, 383 NEJM 874, 874 (2020), https://doi.org/10.1056/NEJMms2004740.

231. Joseph L. Wright et al., Eliminating Race-Based Medicine, 150 Pediatrics e2022057998, 1 (2022), https://doi.org/10.1542/peds.2022-057998.

232. Vyas et al., supra note 230, at 875.

233. James A. Diao et al., In Search of a Better Equation: Performance and Equity in Estimates of Kidney Function, 384 NEJM 396, 398 (2021), https://doi.org/10.1056/NEJMp2028243.

234. Cynthia Delgado et al., A Unifying Approach for GFR Estimation: Recommendations of the NKF-ASN Task Force on Reassessing the Inclusion of Race in Diagnosing Kidney Disease, 32 J. Am. Soc’y Nephrology 2994, 2994 (2021), https://doi.org/10.1681/ASN.2021070988.


and potential consequences that do not disproportionately affect any one group of individuals.”235

Algorithms as standards

Beyond estimating kidney function, prediction algorithms and risk scores have become embedded in some guidelines to help clinicians individualize treatment decisions, for example, by recommending a statin based on an estimated 10-year risk of stroke or heart attack calculated with the American College of Cardiology/American Heart Association (ACC/AHA) Pooled Cohort Equations.236 However, studies suggest that these equations overpredict (higher estimates than those observed) in broad populations and underpredict (lower estimates than those observed) in disadvantaged communities. Moreover, the ACC/AHA, the U.S. Preventive Services Task Force, and the U.S. Department of Veterans Affairs each use a different threshold of 10-year heart attack or stroke risk for initiating a statin, ranging from 7.5% to 12% or greater.237
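The practical effect of differing thresholds can be shown with a brief sketch. It uses only the two endpoints of the range quoted above and a hypothetical patient. It does not attribute either threshold to a particular organization, and `statin_suggested` is an illustrative helper rather than any guideline's actual decision rule.

```python
# The endpoints of the range quoted in the text; which organization uses
# which threshold is deliberately not modeled here.
thresholds = {"lowest threshold": 0.075, "highest threshold": 0.12}


def statin_suggested(ten_year_risk, threshold):
    """True if the estimated 10-year risk meets or exceeds the threshold."""
    return ten_year_risk >= threshold


# Hypothetical patient whose estimated 10-year risk is 9%.
patient_risk = 0.09
decisions = {name: statin_suggested(patient_risk, t) for name, t in thresholds.items()}
print(decisions)  # {'lowest threshold': True, 'highest threshold': False}
```

The same patient with the same estimated risk would be a candidate for treatment under one threshold and not under the other, before any error in the risk estimate itself is considered.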

A 2023 publication promoted using an algorithm to provide individualized risk for lung cancer. The editorial accompanying that publication, however, points out that the use of that algorithm would have implications for safety (improved effectiveness of patient selection but worse safety profile, i.e., unnecessary invasive procedures and treatment of overdiagnosed cancers), patient-centeredness (the unmet need for clarity about patient values, preferences, and goals), timeliness (delays due to incorporating the algorithm into the electronic health record and an altered workflow), and equity (if minority-serving institutions have fewer resources to implement). 238 On this topic the U.S. Preventive Services Task Force identifies the need for research “on the benefits and harms of using risk prediction models to select patients for lung cancer screening, including whether use of risk prediction models represents a barrier to wider implementation of lung cancer screening in primary care.” The task force recommended annual lung cancer screening for adults aged 50 to 80 years who have a sufficient smoking exposure (at least 20 pack-years, e.g., one pack of cigarettes smoked daily for 20 years) and who currently smoke or stopped smoking less than 15 years ago.239
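In contrast to an individualized risk model, the task force criteria summarized above amount to a simple rule. The sketch below encodes only the three criteria stated in the text. The function names are hypothetical, and treating `years_since_quit=None` as a person who still smokes is an assumption made for the example.

```python
from typing import Optional


def pack_years(packs_per_day: float, years_smoked: float) -> float:
    return packs_per_day * years_smoked


def meets_screening_criteria(age: int, packs_per_day: float, years_smoked: float,
                             years_since_quit: Optional[float]) -> bool:
    """Check the three criteria summarized in the text.

    Assumption: years_since_quit=None represents a person who still smokes.
    """
    age_ok = 50 <= age <= 80
    exposure_ok = pack_years(packs_per_day, years_smoked) >= 20
    recency_ok = years_since_quit is None or years_since_quit < 15
    return age_ok and exposure_ok and recency_ok


print(meets_screening_criteria(62, 1.0, 20, years_since_quit=5))     # True
print(meets_screening_criteria(62, 1.0, 20, years_since_quit=16))    # False
print(meets_screening_criteria(45, 2.0, 20, years_since_quit=None))  # False
```

A rule of this kind is easy to apply and audit, but it treats everyone who meets the cutoffs alike, which is the limitation that individualized risk prediction aims to address.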

235. Id.

236. U.S. Preventive Servs. Task Force, Statin Use for the Primary Prevention of Cardiovascular Disease in Adults: U.S. Preventive Services Task Force Recommendation Statement, 328 JAMA 746, 747 (2022), https://doi.org/10.1001/jama.2022.13044.

237. Id. at 751–52.

238. Renda Soylemez Wiener & Michael K. Gould, Selecting Candidates for Lung Cancer Screening: Implications for Effectiveness, Efficiency, Equity, and Implementation, 176 Annals Internal Med. 413, 413–14 (2023), https://doi.org/10.7326/M23-0230.

239. U.S. Preventive Servs. Task Force, Screening for Lung Cancer: U.S. Preventive Services Task Force Recommendation Statement, 325 JAMA 962, 968 (2021), https://doi.org/10.1001/jama.2021.1117.


Thus, although a few clinical prediction or risk scores are recommended or mentioned in some guidelines and are in use to assist with risk prediction and treatment decisions, issues of implementation, external validity, and the other concerns described above remain.

Algorithmic fairness and clinical prediction

In a perspective titled Predictably Unequal: Understanding and Addressing Concerns That Algorithmic Clinical Prediction May Increase Health Disparities, Paulus and Kent discuss concepts of algorithmic fairness versus algorithmic bias in healthcare in the context of increasing use of predictive algorithms (e.g., machine learning, artificial intelligence, or prediction models) to support decision-making.240 They acknowledge that fairness concerns apply when these predictive model algorithms lead to polar decisions, that is, where the predictions lead to decisions that benefit some groups but not others, such as in the allocation of healthcare resources, particularly scarce ones.241 They examine different fairness criteria and demonstrate that “[e]ven when models are used to balance benefits–harms to make optimal decisions for individuals (i.e., for non-polar decisions)—and fairness concerns are not germane—model, data or sampling issues can lead to biased predictions that support decisions that are differentially harmful/beneficial across groups.”242 Although they review potential sources of bias and methods for diagnosing and remedying bias, the core problematic issue may be that we lack agreed-upon definitions of fairness.

For example, they cite commercial software used to predict reoffense after release based on data from 7,000 arrests.243 The software assigned a higher risk of reoffense to defendants who were black than to those who were white, leading to potentially longer sentences, even though the algorithm was “race-unaware.” This was the case even among those who did not recidivate, that is, truly zero risk, resulting in unequal error rates consistent with “unfairness” and perhaps also “legal definitions of discrimination through ‘disparate impact.’” The software developers, however, argued that the model had good calibration, that is, good agreement between the observed and the predicted reoffense. Yet when examining two fairness criteria that equalized either (1) type I/II error rates (sensitivity and specificity) or (2) test fairness with consistent calibration (strata-specific outcome

240. Jessica K. Paulus & David M. Kent, Predictably Unequal: Understanding and Addressing Concerns That Algorithmic Clinical Prediction May Increase Health Disparities, 3 N.P.J. Digit. Med. 99, at 1 (2020), https://perma.cc/W2NM-KH3A. See also James E. Baker & Laurie N. Hobart, Reference Guide on Artificial Intelligence, “Bias,” in this manual.

241. Paulus & Kent, supra note 240, at 1.

242. Id.

243. Id. at 2. See also Valena E. Beety et al., Reference Guide on Forensic Feature Comparison Evidence, “Special Issues with Machine-Generated Feature Comparison Evidence,” in this manual.


rates), Paulus and Kent found both criteria cannot simultaneously be satisfied when the outcome rates differ across two groups (except in the case of perfect prediction), “leading to the conclusion that unfairness is inevitable” (Figure 6).244

Figure 6. Mutual incompatibility of fairness criteria.
Source: From Jessica K. Paulus & David M. Kent, Predictably Unequal: Understanding and Addressing Concerns That Algorithmic Clinical Prediction May Increase Health Disparities, 3(99) NPJ Digit. Med. 2 (2020), https://doi.org/10.1038/s41746-020-0304-9. Reproduced with permission.

Many fairness criteria exist, and they are likely to be mutually incompatible. Satisfying all of them at once is therefore impossible: any decision will be unfair by at least one criterion, because at least one outcome will be unequal across groups, which suggests that no decision can be fair and unbiased by every measure.
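The incompatibility can be illustrated with a short calculation. The sketch below is not taken from Paulus and Kent; it assumes a hypothetical binary test with identical sensitivity and specificity in two groups (so error rates are equal) and hypothetical outcome rates of 50% and 20%. Applying Bayes' rule shows that the predictive values, used here as a simple stand-in for strata-specific outcome rates, then differ between the groups.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Return (PPV, NPV) for a binary test using Bayes' rule."""
    true_pos = prevalence * sensitivity
    false_neg = prevalence * (1 - sensitivity)
    false_pos = (1 - prevalence) * (1 - specificity)
    true_neg = (1 - prevalence) * specificity
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical values chosen for illustration only.
SENSITIVITY = 0.80
SPECIFICITY = 0.80

# Same error rates in both groups, different outcome rates.
ppv_a, npv_a = predictive_values(0.50, SENSITIVITY, SPECIFICITY)
ppv_b, npv_b = predictive_values(0.20, SENSITIVITY, SPECIFICITY)

print(f"Group A: PPV={ppv_a:.2f}, NPV={npv_a:.2f}")
print(f"Group B: PPV={ppv_b:.2f}, NPV={npv_b:.2f}")
```

Under these assumptions a positive result corresponds to an 80% outcome rate in Group A but only 50% in Group B, so equalizing error rates leaves the meaning of a positive prediction unequal across groups.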

To satisfy [a] more stringent, narrow, and rigorous definition of unfairness, it is not enough to observe differences in outcomes—one must understand the causes for these outcome differences. Such a causal concept of fairness is closely aligned to the legal concept of disparate treatment. According to causal definitions of fairness, similar individuals should not be treated differently due to having certain protected attributes that qualify for special

244. Paulus & Kent, supra note 240, at 2. D+ = disease present, D- = disease absent, T+ = test positive, T- = test negative.

protection from discrimination, such as a certain race/ethnicity or gender. However, causality is fundamentally unidentifiable in observational data, except with unverifiable assumptions. Thus, we are more typically stuck with deeply imperfect but ascertainable criteria serving as (often poor) proxies for causal fairness.245

Healthcare applications of machine-learning and artificial-intelligence research are emerging despite a lack of consensus on application and evaluation. The current literature nonetheless provides some frameworks to guide the evaluation of artificial intelligence in medicine across the stages of development, reporting, evaluation, implementation, and surveillance.246

Informed Consent

Principles

Medical informed consent is an ethical, moral, and legal responsibility of physicians.247 It is guided by four ethical principles: autonomy, beneficence, nonmaleficence, and justice.248 Autonomy refers to informed, rational decision-making after “unbiased and thoughtful deliberation.”249 Beneficence represents the “moral obligation of physicians to act for the benefit of patients.”250 These two principles can place physicians in conflict: they wish to provide the care they believe is best for the patient, but because that care usually involves some risk or cost, they also recognize that patient autonomy and preferences may affect their recommendation.251 In a study examining the incidence of erectile dysfunction with use of a beta-blocker medication known to be beneficial, heart disease patients were (1) blinded to the drug, (2) informed of the drug name only, or (3) informed about its erectile dysfunction adverse effect. Among those blinded, 3.1% developed erectile dysfunction compared with 15.6% of those

245. Id. at 3.

246. Norah L. Crossnohere et al., Guidelines for Artificial Intelligence in Medicine: Literature Review and Content Analysis of Frameworks, 24 J. Med. Internet Rsch. e36823 (2022), https://doi.org/10.2196/36823; Sebastian Vollmer et al., Machine Learning and Artificial Intelligence Research for Patient Benefit: 20 Critical Questions on Transparency, Replicability, Ethics, and Effectiveness, 368 B.M.J. I6927 (2020), https://doi.org/10.1136/bmj.I6927; Lazaros Belbasis & Orestis A. Panagiotou, Reproducibility of Prediction Models in Health Services Research, 15 BMC Rsch. Notes 204 (2022), https://doi.org/10.1186/s13104-022-06082-4.

247. Timothy J. Paterick et al., Medical Informed Consent: General Considerations for Physicians, 83 Mayo Clinic Proc. 313, 313 (2008), https://doi.org/10.4065/83.3.313.

248. Jaime S. King & Benjamin W. Moulton, Rethinking Informed Consent: The Case for Shared Decision-Making, 32 Am. J. L. & Med. 429, 435 (2006), https://doi.org/10.1177/009885880603200401.

249. Id. at 435.

250. Id.

251. Id. at 436.

given the drug name and 31.2% of those informed about adverse effects, showing that being informed increased the risk for adverse effects and might deprive patients of benefit from a drug because they stop taking it.252 Physicians must balance the desire to provide beneficial care with the obligation to promote autonomous decisions by informing patients of potential adverse effects or tradeoffs.

Standards

State jurisdictions differ in their standards for disclosure, with half adopting the physician or professional standard (the information that other local physicians with similar skill levels would provide) and the other half adopting the patient or materiality standard (the information that a reasonable patient would deem important in decision-making).253 The informed-consent process involves the disclosure of alternative treatment options, including no treatment, and the risks and benefits associated with each alternative. Discussion should include severe risks and frequent risks, but the courts have not provided explicit guidance about what constitutes sufficient severity or frequency. Patients should be considered by the court to be competent and should have the capacity to make decisions (understanding choices, risks, and benefits). The decision should be voluntary—of free mind and free will, without coercion or manipulation. The language used should be understandable to the patient, and treatment should not proceed unless the physician believes the patient understands the options and their risks and benefits.

Patients may withdraw consent or refuse treatment. Such an action should engender additional discussion, and documentation may include the completion of a withdrawal-of-consent form. In certain situations, exceptions to medical consent may arise in emergencies, when the treatment is recognized by prudent physicians to involve no material risk to patients and when the procedure is unanticipated and not known to be necessary at the time of consent.254

In 2024, the American Law Institute recognized that patients have choices among alternative treatments, so plaintiffs no longer need to prove that patients in similar situations would not have chosen treatment if they had received complete disclosure of risks, and now only need to prove that they would have chosen an alternative treatment that would have been reasonable for them.255

252. Antonello Silvestri et al., Report of Erectile Dysfunction After Therapy with Beta-Blockers Is Related to Patient Knowledge of Side Effects and Is Reversed by Placebo, 24 Eur. Heart J. 1928, 1928 (2003), https://doi.org/10.1016/j.ehj.2003.08.016.

253. King & Moulton, supra note 248, at 430.

254. Paterick et al., supra note 247, at 318.

255. Daniel G. Aaron et al., A New Legal Standard for Medical Malpractice, 333 JAMA 1161–65 (Feb. 26, 2025), https://doi.org/10.1001/jama.2025.0097.

Patient care

The Merenstein case described an unpublished trial in which, during his residency, Dr. Merenstein examined a highly educated man.256 The examination included a discussion of the relevant risks and benefits of prostate cancer screening using the prostate-specific antigen (PSA) test, based on recommendations from the U.S. Preventive Services Task Force, the American College of Physicians–American Society of Internal Medicine, the American Medical Association, the American Urological Association, the American Cancer Society, and the American Academy of Family Physicians. Dr. Merenstein testified that the patient declined the test because of the high false-positive rate, the risk of treatment-related adverse effects, and the low risk of dying from prostate cancer. Another physician seeing the same patient subsequently ordered a PSA test without any patient discussion. The PSA level was high, and the patient was diagnosed with incurable advanced prostate cancer. The plaintiff’s attorney argued that, despite the guidelines above, the standard of care in Virginia was to order the blood test without discussion. Four physician witnesses supported the position “that the standard of care in Virginia was to order the test without discussing it with the patient,” and the jury found in favor of the plaintiff.257 It is important to note that the standard of care varies from state to state, with about half of the states having physician-based standards requiring “physicians to inform a patient of the risks, benefits and alternatives to a treatment in the same manner that a ‘reasonably prudent practitioner’ in the field would,” and the other half having patient-based standards requiring physicians to provide “all information on the risks, benefits and alternatives to a treatment that a ‘reasonable patient’ would attach significance to in making a treatment decision.”258

In 2024, the American Law Institute shifted its historic focus on local habitual practice to more patient-centered reasonable care, defined as that from competent and qualified clinicians that could override customary community practices, e.g., through evidence-based medicine.259

Patient preferences

To illustrate the importance of patient preferences, a woman with breast cancer described her experience:

256. Daniel Merenstein, A Piece of My Mind: Winners and Losers, 291 JAMA 15, 15–16 (2004), https://doi.org/10.1001/jama.291.1.15.

257. King & Moulton, supra note 248, at 434.

258. Id. at 432–34.

259. Aaron et al., supra note 255.

[A]s the surgeon diagramed the incision points on my chest with a felt-tip pen, my husband asked a question: Is it really necessary to transfer this back muscle? The doctor’s answer shocked us. No, he said, he could simply operate on my chest. That would cut surgery and recovery time in half. He had planned the more complicated procedure because he thought it would have the best cosmetic result. “I assumed that’s what you wanted.”260

Instead, the woman preferred the less invasive approach, which shortened her recovery time.

Randomized trials

In the research setting, a randomized trial with and without informed consent demonstrated that the process of getting informed consent altered the effect of a placebo when given to patients with insomnia. The first patient of each pair was randomized to no informed consent and the second to informed consent. Out of 56 patients randomized to informed consent, 26 declined to participate in the study (the patients without informed consent had no choice and were unaware of their participation in a study). The informed consent process created a “biased” group because the age and gender for those who declined participation differed significantly from those who did agree to be included in the study. The hypnotic activity of placebo was significantly higher without informed consent, and adverse events were found more commonly in the group receiving informed consent. The study suggests that the process of getting informed consent introduced biases in the patient population and affected the efficacy and adverse effects observed in this clinical trial, thereby potentially affecting the general applicability of any findings involving informed consent.261

Health information sources

Besides physicians, patients may get health information from the internet, family, friends, and the media (newspapers, magazines, television). Of internet users, 80% had searched for information on at least 1 of 15 major health topics, but use varied from 62% to 89% by age, sex, education, and race/ethnicity groups.262 The

260. Julie Halpert, What Do Patients Want? 141 Newsweek, no. 17, Apr. 27, 2003, at 63–64, https://perma.cc/8AZE-Q3HE.

261. R. Dahan et al., Does Informed Consent Influence Therapeutic Outcome? A Clinical Trial of the Hypnotic Activity of Placebo in Patients Admitted to Hospital, 293 Brit. Med. J. (Clinical Rsch. Ed.) 363, 363 (1986), https://doi.org/10.1136/bmj.293.6543.363.

262. Susannah Fox, Pew Rsch. Ctr., Health Topics 13–14 (2011), https://perma.cc/F4QU-45FB.

percent of all adults who look online for health information varied from 72% for those 18–29 to 30% for those 65 and older.263 Over the past decade, smartphone use has grown dramatically across all age groups, but especially among those 65 and older (from 13% in 2012 to 61% in 2021).264 A cross-sectional national survey of U.S. adults who had made a medical decision, conducted between November 2006 and May 2007, found that internet use averaged 28% but varied from 17% for breast cancer screening to 48% for hip/knee replacement among those 40 years of age and older.265 However, even among internet users, healthcare providers were rated the most influential source of information for medical decisions, followed by the internet, family and friends, and then media. In this digital age, clinicians remain the most influential information source for now.

Risk Communication

Multiple health outcomes may result from alternative treatment choices, and how patients feel about the relative importance of those outcomes varies.266 When patients with recently diagnosed, curable prostate cancer were presented with 93 possible questions that might be important to patients like themselves, 91 of the questions were cited as relevant to at least one patient, demonstrating that personal health goals and concerns vary substantially.267 Communication skills should include patient problem assessment (appropriate questioning techniques, seeking patient’s beliefs, checking patient’s understanding of the problem); patient education and counseling (eliciting patient’s perspective, providing clear instructions and explanations, assessing understanding); negotiation and shared decision-making (surveying problems and delineating options, arriving at mutually acceptable solutions); and relationship development and maintenance (encouraging patient expression, communicating a supportive attitude, explaining any jargon, and using nonverbal behavior to enhance communication).268

263. Mary Madden, Pew Rsch. Ctr., Older Adults and Internet Use: (Some of) What We Know 20 (2013), https://perma.cc/5L49-4E7T.

264. Michelle Faverio, Pew Rsch. Ctr., Share of Those 65 and Older Who Are Tech Users Has Grown in the Past Decade 1 (2022), https://perma.cc/TT4J-AFLE.

265. Mick P. Couper et al., Use of the Internet and Ratings of Information Sources for Medical Decisions: Results from the DECISIONS Survey, 30 Med. Decision Making 106S, 106S (2010), https://doi.org/10.1177/0272989X10377661.

266. Kassirer, supra note 220, at 899.

267. Deb Feldman-Stewart et al., What Questions Do Patients with Curable Prostate Cancer Want Answered?, 20 Med. Decision Making 7, 7 (2000), https://doi.org/10.1177/0272989X0002000102.

268. Michael J. Yedidia et al., Effect of Communications Training on Medical Student Performance, 290 JAMA 1157, 1159 (2003), https://doi.org/10.1001/jama.290.9.1157.

Types of risk-communication estimates

Certain forms of risk communication, however, may be confusing and should be avoided: “single event probabilities, conditional probabilities (such as sensitivity and specificity), and relative risks.”269 Single-event probabilities are a “source of miscommunication” because the events remain unspecified. For example, a statement that a medication results in a 30% to 50% chance of developing erectile dysfunction could be misconstrued by patients to mean an erectile dysfunction problem in 30% to 50% of their sexual encounters. A more specific statement would be that, out of 10 people like you taking this medication, three to five of them experience erectile dysfunction.270 This natural frequency statement specifies a reference class (e.g., out of 10 people and not each encounter), thereby reducing misunderstanding.271

For communicating benefit by citing relative risk (ratio of the risk of dying in a group taking a medication divided by the risk of dying in a group not taking a medication), consider an advertising statement that taking a cholesterol-lowering medication reduces the risk of dying by 22%.272 This may be misinterpreted as saying that out of 1,000 patients with high cholesterol, 220 of them can avoid dying by taking cholesterol-lowering medications. The actual data show that 32 deaths occur among 1,000 patients taking the medication, and 41 deaths occur among 1,000 patients taking the placebo, so the relative risk is 0.78 (32/41). The relative risk reduction equals one minus the relative risk or 0.22 (22%). A preferred way to express the benefit would be the absolute risk reduction (the difference between 41 and 32 deaths in 1,000 patients), or to say that in a group of 1,000 people like you with high cholesterol, taking a cholesterol medication for five years helps nine of them avoid dying.273
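The arithmetic in this example can be reproduced in a few lines. The sketch below uses only the figures quoted above (32 and 41 deaths per 1,000 patients); the variable names are illustrative and do not come from the cited source.

```python
# Figures quoted in the text: deaths per 1,000 patients over five years.
PATIENTS_PER_GROUP = 1000
deaths_with_medication = 32
deaths_with_placebo = 41

risk_treated = deaths_with_medication / PATIENTS_PER_GROUP
risk_placebo = deaths_with_placebo / PATIENTS_PER_GROUP

# Relative risk: risk in the treated group divided by risk in the placebo group.
relative_risk = risk_treated / risk_placebo

# Relative risk reduction: one minus the relative risk.
relative_risk_reduction = 1 - relative_risk

# Absolute risk reduction: the difference between the two risks.
absolute_risk_reduction = risk_placebo - risk_treated
deaths_avoided_per_1000 = absolute_risk_reduction * PATIENTS_PER_GROUP

print(f"Relative risk:           {relative_risk:.2f}")            # 0.78
print(f"Relative risk reduction: {relative_risk_reduction:.0%}")   # 22%
print(f"Deaths avoided per 1,000: {deaths_avoided_per_1000:.0f}")  # 9
```

The same trial data therefore supports both the "22%" headline and the "nine in 1,000" statement; only the second makes the size of the benefit visible to the patient.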

To illustrate potential misinterpretation of risk further, a relative risk reduction of 22% has very different absolute risk reductions depending on the event rates without treatment. If the mortality rate is 20% without treatment, then the absolute risk reduction is 4.4% or saving 44 of 1,000 treated patients (22% times 20%), but for an event rate of 2% without treatment, then the same 22% risk reduction results in an absolute reduction of 0.44% or saving 4.4 of 1,000 treated patients. The number needed to treat is an alternative form of risk communication to account for the risk without treatment. It is the reciprocal of the absolute risk difference or one divided

269. Gerd Gigerenzer & Adrian Edwards, Simple Tools for Understanding Risks: From Innumeracy to Insight, 327 B.M.J. 741, 741 (2003), https://doi.org/10.1136/bmj.327.7417.741.

270. Id.

271. Id. at 743; see also section titled “Multiple Diseases and Multiple Tests” above and David H. Kaye & Hal S. Stern, Reference Guide on Statistics and Research Methods, “Appendix: Conditional Probability and Bayes’ Rule,” in this manual.

272. Gerd Gigerenzer, Calculated Risks: How to Know When Numbers Deceive You 34 (2002).

273. Id. at 34–35.

by the quantity nine lives saved per 1,000 (1÷ (9/1000)) treated with cholesterol medications in the prior paragraph. Therefore 111 patients need to be treated with a cholesterol medication for five years to save one of them, or in the illustrative example above with a relative risk reduction of 22% and a mortality rate of 20% or 2%, either 23 or 227, respectively, would need to be treated to save one patient.
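The number needed to treat follows directly from the baseline risk and the relative risk reduction. The sketch below is a minimal illustration using the figures from the text; the function name is hypothetical, and results are rounded to the nearest whole patient to match the text (some authors round up instead).

```python
def number_needed_to_treat(absolute_risk_reduction):
    """Reciprocal of the absolute risk reduction (expressed as a proportion)."""
    if absolute_risk_reduction <= 0:
        raise ValueError("absolute risk reduction must be positive")
    return 1 / absolute_risk_reduction


# Cholesterol example: nine deaths avoided per 1,000 treated patients.
nnt_cholesterol = number_needed_to_treat(9 / 1000)

# Same 22% relative risk reduction applied to two baseline mortality rates.
RELATIVE_RISK_REDUCTION = 0.22
nnt_high_risk = number_needed_to_treat(0.20 * RELATIVE_RISK_REDUCTION)
nnt_low_risk = number_needed_to_treat(0.02 * RELATIVE_RISK_REDUCTION)

print(round(nnt_cholesterol))  # 111
print(round(nnt_high_risk))    # 23
print(round(nnt_low_risk))     # 227
```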

Breast-cancer screening communication

In the analysis of randomized trials of mammography for the U.S. Preventive Services Task Force, the number needed to screen (NNS) to avoid one breast-cancer death over 10 years was 3,448 (10,000/2.9) for 39- to 49-year-olds, 1,299 (10,000/7.7) for 50- to 59-year-olds, and 469 (10,000/21.3) for 60- to 69-year-olds.274 The numbers needed to harm (NNH) with annual or biennial mammography over 10 years were: screening every 1.6 (1/0.61) to 2.4 (1/0.42) 40-year-old women yields one false positive result and screening 14 (1/0.07) to 20 (1/0.05) 40-year-old women yields one breast biopsy.275 Screening for breast cancer and treating those found to have breast cancer saves lives. From epidemiologic population-based incidence and mortality data and from screening trials with long-term follow-up, it becomes clear that some patients experience over-diagnosis, i.e., some women have screen-detected cancer that would never have developed clinical signs or symptoms or caused their mortality in their lifetime. Estimates of overdiagnosis from trials ranged from 11% to 22%, and so, out of 100 women diagnosed with breast cancer from screening, five to nine of them undergo treatment for a cancer that may have never caused any mortality.276 Importantly, no one can tell if any particular woman has been overdiagnosed because this is unobservable.277
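The screening figures above are reciprocals of per-person event rates. The sketch below reproduces them from the numbers given in the parentheses; the dictionary keys and variable names are illustrative only.

```python
# Breast-cancer deaths avoided per 10,000 women screened over 10 years,
# as quoted in the text, by age group.
deaths_avoided_per_10000 = {
    "39-49": 2.9,
    "50-59": 7.7,
    "60-69": 21.3,
}

# Number needed to screen (NNS) to avoid one breast-cancer death.
nns = {age: 10_000 / avoided for age, avoided in deaths_avoided_per_10000.items()}

# Ten-year probabilities of harm for 40-year-old women, as quoted in the text.
false_positive_probabilities = (0.61, 0.42)
biopsy_probabilities = (0.07, 0.05)

# Number needed to harm (NNH) is the reciprocal of the probability of harm.
nnh_false_positive = [1 / p for p in false_positive_probabilities]
nnh_biopsy = [1 / p for p in biopsy_probabilities]

for age, value in nns.items():
    print(f"NNS, ages {age}: {value:,.0f}")
print("NNH, false positive:", [round(v, 1) for v in nnh_false_positive])
print("NNH, biopsy:", [round(v) for v in nnh_biopsy])
```

Placing the two sets of numbers side by side shows the tradeoff the text describes: thousands of younger women must be screened to avoid one death, while roughly one in two experiences a false positive result.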

To summarize the evidence,

Mammography does save lives, more effectively among older women, but does cause some harm. Do the benefits justify the risks? The misplaced propaganda battle seems to now rest on the ratio of the risks of saving a life compared with the risk of overdiagnosis, two very low percentages that are imprecisely estimated and depend on age and length of follow-up.278

274. Heidi D. Nelson et al., Effectiveness of Breast Cancer Screening: Systematic Review and Meta-Analysis to Update the 2009 U.S. Preventive Services Task Force Recommendation, 164 Annals Internal Med. 244, 246 (2016), https://doi.org/10.7326/M15-0969.

275. Heidi D. Nelson et al., Harms of Breast Cancer Screening: Systematic Review to Update the 2009 U.S. Preventive Services Task Force Recommendation, 164 Annals Internal Med. 256, 257–59 (2016), https://doi.org/10.7326/M15-0970.

276. Id. at 256.

277. Klim McPherson, Screening for Breast Cancer: Balancing the Debate, 340 B.M.J. c3106 at 234 (2010), https://doi.org/10.1136/bmj.c3106.

278. Id.

Technologies

Stephen Hawking said, “Our future is a race between the growing power of technology and the wisdom with which we use it.”279 Rapidly growing sources of data fuel technology growth, particularly artificial intelligence (AI) with the potential to improve diagnosis, clinical care, health research, drug development, and public health (surveillance, outbreak response, and health system monitoring).280 However, the World Health Organization (WHO) identifies six key ethical principles for the use of AI in health:

  • Protecting human autonomy: Providers must have the necessary information to make safe and effective use of AI systems. Patients must understand what part those systems play in their care. The AI systems must protect privacy and confidentiality with informed consent and appropriate legal protection of data;281
  • Promoting human well-being and safety and the public interest: The AI systems should meet regulatory requirements and be safe, accurate and efficient. The systems should practice quality control and quality improvement and, importantly, do no harm;282
  • Ensuring transparency, explainability, and intelligibility: The design or deployment of an AI technology should be published or documented with sufficient information to facilitate public deliberation and consultation;283
  • Fostering responsibility and accountability: Establishing predefined points of human supervision through regulatory principles upstream and downstream of any AI algorithm would provide a “human warranty”;284
  • Ensuring inclusiveness and equity: Design features should promote the “widest possible appropriate, equitable use and access, irrespective of age, sex, gender, income, race, ethnicity, sexual orientation, ability or other characteristics protected under human rights codes” and should seek to “minimize inevitable disparities in power that arise between providers and patients, between policy-makers and people, and between companies and governments”;285 and
  • Promoting AI that is responsive and sustainable: Designers, developers, and users should “continuously, systematically and transparently assess AI

279. Stephen Hawking, Brief Answers to the Big Questions 196 (2018).

280. Health Ethics & Governance (HEG), Ethics and Governance of Artificial Intelligence for Health: WHO Guidance, Executive Summary iii (World Health Organization ed., 2021), https://www.who.int/publications/i/item/9789240037403.

281. Id. at iv.

282. Id. at v.

283. Id.

284. Id.

285. Id. at v–vi.

    applications during actual use” and governments and companies should address anticipated disruptions in the workplace to ensure sustainability.286

Data collection and use

Personal health data has grown massively to include “genomic data, radiological images, medical records[287] and nonhealth data converted into health data . . . from standard sources (e.g. health services, public health, research) and further sources (environmental, lifestyle, socioeconomic, behavioural and social).” Concerns include (1) the quality of the data and inherent biases in the training data (e.g., underrepresentation and its effect on algorithms); (2) the safeguarding of individual privacy (e.g., discrimination due to health status, cyber theft, and accidental or intentional disclosure); (3) the collection of health data beyond what is required, particularly a “behavioral data surplus” that is “repurposed for uses that raise serious ethical, legal and human rights” questions;288 and (4) data colonialism, which “may foster a divide between those who accumulate, acquire, analyze and control such data and those who provide the data but have little control over their use.” As noted by the WHO, “true informed consent is increasingly infeasible in an era of biomedical big data, especially in an environment driven mainly by companies seeking to generate profits from the use of data” because of the “scale and complexity” of such data and because “all of the potential uses may not be known,” such as for “population-level data analytics or predictive-risk modeling.”289

286. Id. at vi.

287. Malpractice litigation includes the prominent use of electronic medical records with metadata (often referred to as “audit trails”) that allow someone to determine exactly when an entry was made into a record, what work station it was entered from, and whether any edits were made and when. An area of uncertainty remains as to the scope of a patient’s official or “legal” medical record under HIPAA and whether it includes text messages between clinicians, e.g., does the hospital have a duty to preserve such data and disclose it to the patient?

288. Ethics and Governance of Artificial Intelligence for Health: WHO Guidance, supra note 280, at 37. See, e.g., Dinerstein v. Google, LLC, 484 F. Supp. 3d 561 (2020), in which the University of Chicago was accused of sharing identifiable patient data with Google for the purposes of developing medical diagnostic tools. The health records shared with Google “were de-identified, except that dates of service were maintained” in the dataset, and the dataset also included “de-identified, free-text medical notes.” Marcelo Corrales Compagnucci et al., Box 3: Dinerstein v. Google, in Ethics and Governance of Artificial Intelligence for Health: WHO Guidance, supra note 280, at 38 (citing Alvin Rajkomar et al., Scalable and Accurate Deep Learning with Electronic Health Records, 1(18) NPJ Digit. Med. 6 (2018), https://doi.org/10.1038/s41746-018-0029-1). Another example is hospitals that gain consent to retain discarded biological samples from patients for research purposes when consenting to treatment at a facility.

289. Ethics and Governance of Artificial Intelligence for Health: WHO Guidance, supra note 280, at 40.

Shared Decision-Making

The “professional values of competence, expertise, empathy, honesty, and commitment are all relevant to communicating risk: Getting the facts right and conveying them in an understandable way are not enough.”290 Shared and informed decision-making has emerged as one part of patient care. It distinguishes problem solving that identifies one “right” course that leaves little room for patient involvement from decision-making in which several courses of action may be reasonable and in which patient involvement should determine the optimal choice. In such cases, healthcare choices depend not only on the likelihood of alternative outcomes resulting from each strategy but also on the patient preferences for possible outcomes and their attitudes about risk taking to improve future survival or quality of life and the timing of that risk (whether the risk occurs now or in the future).291

Informed decision-making is

when an individual understands the nature of the disease or condition being addressed; understands the clinical service and its likely consequences, including risks, limitations, benefits, alternatives, and uncertainties; has considered his or her preferences as appropriate; has participated in decision making at a personally desirable level; and either makes a decision consistent with his or her preferences and values or elects to defer a decision to a later time.292

Shared decision-making is “when a patient and his or her healthcare provider(s), in the clinical setting, both express preferences and participate in making treatment decisions.”293 To assist with shared decision-making, health decision aids have been developed to help patients and their physicians choose among reasonable clinical options together by describing the “benefits, harms, probabilities, and scientific uncertainties.”294 In 2007, the legislature in the state of Washington became the first to establish and recognize in law a role for shared decision-making in informed consent.295 The bill goes on to encourage the development, certification, use, and evaluation of decision aids. The consent form

290. Adrian Edwards, Communicating Risks, 327 B.M.J. 691, 691 (2003), https://doi.org/10.1136/bmj.327.7417.691.

291. Michael J. Barry, Health Decision Aids to Facilitate Shared Decision Making in Office Practice, 136 Annals Internal Med. 127, 127 (2002), https://doi.org/10.7326/0003-4819-136-2-200201150-00010.

292. Peter Briss et al., Promoting Informed Decisions About Cancer Screening in Communities and Healthcare Systems, 26 Am. J. Preventive Med. 67, 68 (2004), https://doi.org/10.1016/j.amepre.2003.09.012.

293. Id.

294. Annette M. O’Connor et al., Risk Communication in Practice: The Contribution of Decision Aids, 327 B.M.J. 736, 736 (2003), https://doi.org/10.1136/bmj.327.7417.736.

295. Bridget M. Kuehn, States Explore Shared Decision Making, 301 JAMA 2539, 2539 (2009), https://doi.org/10.1001/jama.2009.867.

provides written documentation that the consent process occurred, but the crux of the medical consent process is the discussion that occurs between a physician and a patient. Physicians share their medical knowledge and expertise, and patients share their values (health goals) and preferences. It is an opportunity to strengthen the patient–physician relationship through shared decision-making, respect, and trust. Incorporating patient health goals into shared decision-making is supported by a core principle of medical decision analysis, the utility-maximizing decision: for each treatment choice (including no treatment), the probability of each possible health outcome is multiplied by the patient’s utility (a quantitative measure of preference) for that outcome, and the products are summed over all possible outcomes to give the expected utility of that choice. The intervention with the highest expected utility should be the preferred option.
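The expected-utility calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the strategy names, probabilities, and utilities below are hypothetical numbers chosen for the example, not data from any study, and the function names are invented here.

```python
from typing import Dict, List, Tuple

# One strategy = a list of (probability, utility) pairs, one pair per outcome.
# Utilities run from 0 (worst outcome) to 1 (best outcome).
Strategy = List[Tuple[float, float]]


def expected_utility(outcomes: Strategy) -> float:
    """Sum probability * utility over every outcome of one strategy."""
    total_probability = sum(p for p, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * u for p, u in outcomes)


def preferred_strategy(strategies: Dict[str, Strategy]) -> str:
    """Return the name of the strategy with the highest expected utility."""
    return max(strategies, key=lambda name: expected_utility(strategies[name]))


# Hypothetical patient A (all numbers are illustrative assumptions).
patient_a = {
    "surgery": [(0.90, 0.95), (0.08, 0.60), (0.02, 0.00)],
    "medication": [(0.70, 0.85), (0.30, 0.70)],
    "no treatment": [(1.00, 0.75)],
}

# Hypothetical patient B faces the same probabilities but places lower
# utility on the surgical outcomes (for example, greater aversion to
# living with a surgical complication).
patient_b = dict(patient_a)
patient_b["surgery"] = [(0.90, 0.85), (0.08, 0.20), (0.02, 0.00)]

for label, strategies in (("A", patient_a), ("B", patient_b)):
    for name, outcomes in strategies.items():
        print(f"patient {label}: {name}: {expected_utility(outcomes):.3f}")
    print(f"patient {label} preferred: {preferred_strategy(strategies)}")
```

With these illustrative numbers, the same outcome probabilities lead to surgery for patient A and medication for patient B, which is the sense in which patient preferences, and not probabilities alone, determine the optimal choice.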

In six mock malpractice trials alleging failure to perform a prostate-specific antigen test, conducted in response to the Merenstein case, a study found that 47 jurors were not willing to accept verbal testimony that a discussion occurred, but when the discussion was documented, most (72%) felt that the defendant fulfilled the standard of care, although 28% still felt that screening should have been done even with documentation. When a decision aid was included, 94% felt that the standard of care had been met.296 With regard to variation in care, a decision aid for surgery for a benign enlargement of the prostate decreased the likelihood of surgery versus control in one study but increased it in another, the difference being the extent of surgery in the control group. In the United States, surgery decreased from 13% in the control group to 8% with a decision aid, and in the United Kingdom, surgery increased from 2% in the control group to 11% with a decision aid.297

Summary and Future Directions

Now that the human genome has been sequenced, medical research is poised for exponential growth as the code for human biology (genomics) is translated into proteins (proteomics) and chemicals (metabolomics) to identify molecular pathways that lead to disease or that promote health. With advances in medical technologies in diagnosis and preventive and symptomatic treatment, the practice of medicine will be profoundly altered and redefined. For example, consider lymphoma, a blood cancer that used to be classified simply by appearance under the microscope as either

296. Michael J. Barry et al., Reactions of Potential Jurors to a Hypothetical Malpractice Suit: Alleging Failure to Perform a Prostate-Specific Antigen Test, 36 J. L. Med. Ethics 396, 400 (2008), https://doi.org/10.1111/j.1748-720X.2008.00283.x.

297. Michael J. Barry et al., Randomized Trial of a Multimedia Shared Decision-Making Program for Men Facing a Treatment Decision for Benign Prostatic Hyperplasia, 1 Disease Mgmt. Clinical Outcomes 5 (1997); Elizabeth Murray et al., Randomised Controlled Trial of an Interactive Multimedia Decision Aid on Benign Prostatic Hypertrophy in Primary Care, 323 B.M.J. 493, 497 (2001), https://doi.org/10.1136/bmj.323.7311.493.

Hodgkin’s or non-Hodgkin’s lymphoma. As science has evolved, it is now further classified by cellular markers that identify the underlying cancer cells as one of two cell types that help with immunity (protecting the body from infection and cancer): T cells or B cells. Current research has characterized those cells further by identifying underlying genetic and cellular markers and pathways that distinguish these lymphomas and provide potential therapeutic targets. The growth of the research enterprise, both basic science and clinical translational research (the translation of bench research to the bedside, or the identification of a clinical need that leads to basic science research to further elucidate molecular pathways and develop novel treatments or diagnostics), has greatly expanded the capacity to generate medical research of all types.

With greatly expanded knowledge, research, and specialization, judgments about admissibility and about what constitutes expertise become increasingly difficult and complex. The sifting of this research into sufficiently substantiated, competent, and reliable evidence, however, relies on the traditional scientific foundation: first, biological plausibility and prior evidence; and second, consistent repeated findings. The practice of medicine at its core will continue to be an interaction between physician and patient, with professional judgment and communication as central elements of the relationship. Judgment is essential because of uncertainties in the underlying professional knowledge or because, even if the evidence is credible and substantiated, there may be tradeoffs between the risks of harm and the benefits of testing and of treatment. Communication is critical because most decisions involve tradeoffs, so an individual patient’s preferences for the possible outcomes, which may be unique to that patient and may affect decision-making, should be considered.

In summary, medical terms shared by the legal and medical professions have differing meanings, for example, differential diagnosis, differential etiology, and general and specific causation. The basic concepts of diagnostic reasoning and clinical decision-making and the types of evidence used to make judgments as treating physicians or experts involve the same overarching theoretical issues: (1) alternative reasoning processes; (2) weighing risks, benefits, and evidence; and (3) communicating those risks.

Glossary of Terms

adequacy. In diagnostic verification, testing a hypothesized diagnosis for its adequacy in explaining the course of the disease.

atrial fibrillation. An abnormal heart rhythm where the heart muscles do not contract together so that blood swirls, leading to blood clots usually in the upper left part of the heart (left atrium). Those clots can then leave the heart and cause strokes or other symptoms.

attending physician. The physician primarily responsible for the patient’s care at the hospital in which the patient is being treated.

Bayes’ theorem (rule). In medicine, a method to calculate the post-test probability of disease from a patient’s pretest probability of disease (suspicion) and additional information, such as a test result, by using the test’s characteristics: sensitivity, for how well the test performs in individuals with disease, and specificity, for how well it performs in those without disease.

causal reasoning. For physicians, causal reasoning typically involves understanding how abnormalities in physiology, anatomy, genetics, or biochemistry led to the clinical manifestations of disease. Through such reasoning, physicians develop a “causal cascade” or “chain or web of causation” linking a sequence of plausible cause-and-effect mechanisms to arrive at the pathogenesis or pathophysiology of a disease.

chief complaint. The primary or main symptom that caused the patient to seek medical attention.

coherency. In diagnostic verification, testing a particular diagnosis for its coherency involves determining the consistency of that particular diagnosis with predisposing risk factors, physiological mechanisms, and resulting manifestations.

conditional probability. The probability or likelihood of something, given that something else occurred or is present, for example, the likelihood of disease if a test is positive (post-test probability) or the likelihood of a positive test if disease is present (sensitivity). See Bayes’ theorem (rule).

consulting physician. A physician, usually a specialist, who is asked by the patient’s attending physician to provide an opinion regarding diagnosis, testing, or treatment or to perform a procedure or intervention, for example, surgery.

deductive reasoning. The process of having a specific diagnosis in mind and identifying the symptoms, signs, and lab tests that may occur with that diagnosis with the conclusion being certain. See differential diagnosis.

diagnostic test. A test ordered to confirm or exclude possible causes of a patient’s symptoms or signs (distinct from screening test).

diagnostic verification. The last stage of narrowing the differential diagnosis to a final diagnosis by testing the validity of the diagnosis for its coherency, adequacy, and parsimony.

differential diagnosis. A set of diseases that clinicians consider as possible causes of the patient’s complaint until the diagnosis is nearly final (diagnostic verification).

differential etiology. Term used by the court or witnesses to establish or refute external causation for a plaintiff’s condition. For physicians, etiology refers to cause.

external causation. External causation is established by demonstrating that the cause of harm or disease originates from outside the plaintiff’s body, for example, a defendant’s action or product.

general causation. General causation is established by demonstrating, usually through scientific evidence, that a defendant’s action or product causes (or is capable of causing) disease.

heuristics. Quick automatic rules of thumb or cognitive shortcuts often involving pattern recognition that facilitate rapid diagnostic and treatment decision-making. Use of heuristics or “thinking fast” may predispose to known cognitive errors, even among experts. See hypothetico-deductive.

hypothesis generation. A limited list of potential diagnostic hypotheses in response to symptoms, signs, and lab test results. See differential diagnosis.

hypothesis modification. A change in the list of diagnostic hypotheses (differential diagnosis) in response to additional information, for example, symptoms, signs, and lab test results. See differential diagnosis.

hypothesis refinement. A change in the likelihood of the potential diagnostic hypotheses (differential diagnosis) in response to additional information, for example, symptoms, signs, and lab test results. As additional information emerges, physicians evaluate those data for their consistency with the possibilities on the list and whether those data would increase or decrease the likelihood of each possibility. See differential diagnosis.

hypothetico-deductive. Deliberative and analytical reasoning involving hypothesis generation, hypothesis modification, hypothesis refinement, and diagnostic verification. Thinking “slow” is typically applied for problems outside an individual’s expertise or difficult problems with atypical issues, as it may avoid known cognitive errors. See heuristics.

individual causation. See specific causation.

inductive reasoning. The process of arriving at a diagnosis based on symptoms, signs, and lab tests with the conclusion being probable rather than certain. See differential diagnosis.

intracranial hemorrhage. Bleeding that occurs within the substance of the brain.

myocardial infarction. The technical term for a heart attack.

osteopaths. Doctors of osteopathy (D.O.s), who receive similar training, licensure, and credentialing as medical doctors (M.D.s).

overdiagnosis. Screening and diagnostic testing can lead to the detection of apparent disease that may never cause harm, for example, the identification of slow-growing cancers that, even if untreated, would never cause symptoms or reduce survival because the screening test cannot distinguish the cancerous-appearing cells that would become symptomatic or cause mortality from those that would never do so. See overtreatment.

overtreatment. The treatment of patients with pseudo disease whose disease would never cause symptoms or reduce survival. The treatment may place patients at risk for treatment-related morbidity and possibly mortality without any potential for benefit. See overdiagnosis.

parsimony. In diagnostic verification, testing a particular diagnosis for its parsimony involves choosing the simplest single explanation as opposed to requiring the simultaneous occurrence of two diseases to explain the findings.

pathogenesis. See causal reasoning.

pathology test. Microscopic examination of body tissue typically obtained by a biopsy or during surgery to determine if the tissue appears to be abnormal (different than would be expected for the source of the tissue). The visual components of the abnormality are typically described (e.g., types of cells, appearance of cells, scarring, effect of stains or molecular markers that help facilitate identification of the components) and on the basis of visual pattern, the abnormality may be classified, for example, malignancy (cancer) or dysplasia (precancerous).

post-test probability. The suspicion or probability of a disease after additional information (such as from a test) has been obtained. The predictive value positive (or positive predictive value) is the probability of disease in those known to have a positive test result. The predictive value negative (or negative predictive value) is the probability of the absence of disease in those known to have a negative test result.

predictive value. The probability that a test result correctly indicates the presence or absence of disease when the test is applied to a specific population of patients with a specific prevalence of disease. See post-test probability.

pretest probability. The suspicion or probability of a disease before additional information (such as from a test) is obtained. Also referred to as prior probability.

prior probability. See pretest probability.

sarcoidosis. A disease in which collections of inflammatory cells called granulomas form and can cause scar tissue, most often in the lungs or skin but sometimes in multiple other parts of the body.

screening test. A test performed in the absence of symptoms or signs to detect disease earlier—for example, cancer screening (distinct from diagnostic test).

sensitivity. Likelihood of a positive finding (usually referring to a test result but could also be a symptom or a sign) among individuals known to have a disease (distinct from specificity).

sign. An abnormal physical finding identified at the time of physical examination (distinct from symptoms).

specific causation. Established by demonstrating that a defendant’s action or product is the cause of a particular plaintiff’s disease. Also referred to as individual causation.

specificity. Likelihood of a negative finding (usually referring to a test result but could also be a symptom or a sign) among individuals who do not have a particular disease (distinct from sensitivity).

symptoms. The patient’s description of a change in function, sensation, or appearance (distinct from sign).

syndrome. A collection of symptoms and signs that together characterize a specific disease or a specific group of diseases.
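The glossary entries for Bayes’ theorem, pretest probability, sensitivity, specificity, and post-test probability can be tied together in a short worked calculation. The Python sketch below is illustrative only: the function name and the example values (a 10% pretest probability, 90% sensitivity, and 80% specificity) are assumptions chosen for the arithmetic, not figures from this reference guide.

```python
def post_test_probability(pretest: float, sensitivity: float,
                          specificity: float, positive: bool = True) -> float:
    """Probability of disease after a test result, using Bayes' rule.

    pretest      probability of disease before the test (prior probability)
    sensitivity  P(positive test | disease)
    specificity  P(negative test | no disease)
    positive     True for a positive result, False for a negative result
    """
    for name, value in (("pretest", pretest), ("sensitivity", sensitivity),
                        ("specificity", specificity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")

    if positive:
        with_disease = pretest * sensitivity                # true positives
        without_disease = (1 - pretest) * (1 - specificity)  # false positives
    else:
        with_disease = pretest * (1 - sensitivity)           # false negatives
        without_disease = (1 - pretest) * specificity        # true negatives

    total = with_disease + without_disease
    if total == 0:
        raise ValueError("this test result cannot occur with these inputs")
    return with_disease / total


# Hypothetical example values (assumptions for illustration).
after_positive = post_test_probability(0.10, 0.90, 0.80, positive=True)
after_negative = post_test_probability(0.10, 0.90, 0.80, positive=False)

print(f"probability of disease after a positive test: {after_positive:.3f}")
print(f"probability of disease after a negative test: {after_negative:.3f}")
print(f"probability of no disease after a negative test: {1 - after_negative:.3f}")
```

With these assumed values, a positive result raises the probability of disease from 10% to about 33%, and a negative result lowers it to about 1.4%. The same test applied to a patient with a different pretest probability would yield different post-test probabilities, which is why predictive values depend on the prevalence of disease in the population tested.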

References on Medical Testimony

Lynn S. Bickley et al., Bates’ Guide to Physical Examination and History Taking (13th ed. 2020).

Gerd Gigerenzer, Calculated Risks: How to Know When Numbers Deceive You (2002).

Trisha M. Greenhalgh & Paul Dijkstra, How to Read a Paper: The Basics of Evidence-Based Healthcare (7th ed. 2024).

Gordon Guyatt et al., Users’ Guides to the Medical Literature: A Manual for Evidence-Based Clinical Practice (3d ed. 2014).

Miguel A. Hernan & James M. Robins, Causal Inference: What If (2024).

Guido W. Imbens & Donald B. Rubin, Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction (2015).

Jerome P. Kassirer et al., Learning Clinical Reasoning (2d ed. 2010).

National Academies of Sciences, Engineering, and Medicine, Improving Diagnosis in Health Care (2015).

Harold C. Sox et al., Medical Decision Making (3d ed. 2024).

Sharon E. Straus et al., Evidence-Based Medicine: How to Practice and Teach EBM (5th ed. 2018).

Suggested Citation: "Reference Guide on Medical Testimony." National Academies of Sciences, Engineering, and Medicine and Federal Judicial Center. 2025. Reference Manual on Scientific Evidence: Fourth Edition. Washington, DC: The National Academies Press. doi: 10.17226/26919.

This page intentionally left blank.

Suggested Citation: "Reference Guide on Medical Testimony." National Academies of Sciences, Engineering, and Medicine and Federal Judicial Center. 2025. Reference Manual on Scientific Evidence: Fourth Edition. Washington, DC: The National Academies Press. doi: 10.17226/26919.
Page 1107
Suggested Citation: "Reference Guide on Medical Testimony." National Academies of Sciences, Engineering, and Medicine and Federal Judicial Center. 2025. Reference Manual on Scientific Evidence: Fourth Edition. Washington, DC: The National Academies Press. doi: 10.17226/26919.
Page 1108
Suggested Citation: "Reference Guide on Medical Testimony." National Academies of Sciences, Engineering, and Medicine and Federal Judicial Center. 2025. Reference Manual on Scientific Evidence: Fourth Edition. Washington, DC: The National Academies Press. doi: 10.17226/26919.
Page 1109
Suggested Citation: "Reference Guide on Medical Testimony." National Academies of Sciences, Engineering, and Medicine and Federal Judicial Center. 2025. Reference Manual on Scientific Evidence: Fourth Edition. Washington, DC: The National Academies Press. doi: 10.17226/26919.
Page 1110
Suggested Citation: "Reference Guide on Medical Testimony." National Academies of Sciences, Engineering, and Medicine and Federal Judicial Center. 2025. Reference Manual on Scientific Evidence: Fourth Edition. Washington, DC: The National Academies Press. doi: 10.17226/26919.
Page 1111
Suggested Citation: "Reference Guide on Medical Testimony." National Academies of Sciences, Engineering, and Medicine and Federal Judicial Center. 2025. Reference Manual on Scientific Evidence: Fourth Edition. Washington, DC: The National Academies Press. doi: 10.17226/26919.
Page 1112
Suggested Citation: "Reference Guide on Medical Testimony." National Academies of Sciences, Engineering, and Medicine and Federal Judicial Center. 2025. Reference Manual on Scientific Evidence: Fourth Edition. Washington, DC: The National Academies Press. doi: 10.17226/26919.
Page 1113
Suggested Citation: "Reference Guide on Medical Testimony." National Academies of Sciences, Engineering, and Medicine and Federal Judicial Center. 2025. Reference Manual on Scientific Evidence: Fourth Edition. Washington, DC: The National Academies Press. doi: 10.17226/26919.
Page 1114
Suggested Citation: "Reference Guide on Medical Testimony." National Academies of Sciences, Engineering, and Medicine and Federal Judicial Center. 2025. Reference Manual on Scientific Evidence: Fourth Edition. Washington, DC: The National Academies Press. doi: 10.17226/26919.
Page 1115
Suggested Citation: "Reference Guide on Medical Testimony." National Academies of Sciences, Engineering, and Medicine and Federal Judicial Center. 2025. Reference Manual on Scientific Evidence: Fourth Edition. Washington, DC: The National Academies Press. doi: 10.17226/26919.
Page 1116
Suggested Citation: "Reference Guide on Medical Testimony." National Academies of Sciences, Engineering, and Medicine and Federal Judicial Center. 2025. Reference Manual on Scientific Evidence: Fourth Edition. Washington, DC: The National Academies Press. doi: 10.17226/26919.
Page 1117
Suggested Citation: "Reference Guide on Medical Testimony." National Academies of Sciences, Engineering, and Medicine and Federal Judicial Center. 2025. Reference Manual on Scientific Evidence: Fourth Edition. Washington, DC: The National Academies Press. doi: 10.17226/26919.
Page 1118
Suggested Citation: "Reference Guide on Medical Testimony." National Academies of Sciences, Engineering, and Medicine and Federal Judicial Center. 2025. Reference Manual on Scientific Evidence: Fourth Edition. Washington, DC: The National Academies Press. doi: 10.17226/26919.
Page 1119
Suggested Citation: "Reference Guide on Medical Testimony." National Academies of Sciences, Engineering, and Medicine and Federal Judicial Center. 2025. Reference Manual on Scientific Evidence: Fourth Edition. Washington, DC: The National Academies Press. doi: 10.17226/26919.
Page 1120
Suggested Citation: "Reference Guide on Medical Testimony." National Academies of Sciences, Engineering, and Medicine and Federal Judicial Center. 2025. Reference Manual on Scientific Evidence: Fourth Edition. Washington, DC: The National Academies Press. doi: 10.17226/26919.
Page 1121
Suggested Citation: "Reference Guide on Medical Testimony." National Academies of Sciences, Engineering, and Medicine and Federal Judicial Center. 2025. Reference Manual on Scientific Evidence: Fourth Edition. Washington, DC: The National Academies Press. doi: 10.17226/26919.
Page 1122
Suggested Citation: "Reference Guide on Medical Testimony." National Academies of Sciences, Engineering, and Medicine and Federal Judicial Center. 2025. Reference Manual on Scientific Evidence: Fourth Edition. Washington, DC: The National Academies Press. doi: 10.17226/26919.
Page 1123
Suggested Citation: "Reference Guide on Medical Testimony." National Academies of Sciences, Engineering, and Medicine and Federal Judicial Center. 2025. Reference Manual on Scientific Evidence: Fourth Edition. Washington, DC: The National Academies Press. doi: 10.17226/26919.
Page 1124
Suggested Citation: "Reference Guide on Medical Testimony." National Academies of Sciences, Engineering, and Medicine and Federal Judicial Center. 2025. Reference Manual on Scientific Evidence: Fourth Edition. Washington, DC: The National Academies Press. doi: 10.17226/26919.
Page 1125
Suggested Citation: "Reference Guide on Medical Testimony." National Academies of Sciences, Engineering, and Medicine and Federal Judicial Center. 2025. Reference Manual on Scientific Evidence: Fourth Edition. Washington, DC: The National Academies Press. doi: 10.17226/26919.
Page 1126
Suggested Citation: "Reference Guide on Medical Testimony." National Academies of Sciences, Engineering, and Medicine and Federal Judicial Center. 2025. Reference Manual on Scientific Evidence: Fourth Edition. Washington, DC: The National Academies Press. doi: 10.17226/26919.
Page 1127
Suggested Citation: "Reference Guide on Medical Testimony." National Academies of Sciences, Engineering, and Medicine and Federal Judicial Center. 2025. Reference Manual on Scientific Evidence: Fourth Edition. Washington, DC: The National Academies Press. doi: 10.17226/26919.
Page 1128
Suggested Citation: "Reference Guide on Medical Testimony." National Academies of Sciences, Engineering, and Medicine and Federal Judicial Center. 2025. Reference Manual on Scientific Evidence: Fourth Edition. Washington, DC: The National Academies Press. doi: 10.17226/26919.
Page 1129
Suggested Citation: "Reference Guide on Medical Testimony." National Academies of Sciences, Engineering, and Medicine and Federal Judicial Center. 2025. Reference Manual on Scientific Evidence: Fourth Edition. Washington, DC: The National Academies Press. doi: 10.17226/26919.
Page 1130
Suggested Citation: "Reference Guide on Medical Testimony." National Academies of Sciences, Engineering, and Medicine and Federal Judicial Center. 2025. Reference Manual on Scientific Evidence: Fourth Edition. Washington, DC: The National Academies Press. doi: 10.17226/26919.
Page 1131
Suggested Citation: "Reference Guide on Medical Testimony." National Academies of Sciences, Engineering, and Medicine and Federal Judicial Center. 2025. Reference Manual on Scientific Evidence: Fourth Edition. Washington, DC: The National Academies Press. doi: 10.17226/26919.
Page 1132
Suggested Citation: "Reference Guide on Medical Testimony." National Academies of Sciences, Engineering, and Medicine and Federal Judicial Center. 2025. Reference Manual on Scientific Evidence: Fourth Edition. Washington, DC: The National Academies Press. doi: 10.17226/26919.
Page 1133
Suggested Citation: "Reference Guide on Medical Testimony." National Academies of Sciences, Engineering, and Medicine and Federal Judicial Center. 2025. Reference Manual on Scientific Evidence: Fourth Edition. Washington, DC: The National Academies Press. doi: 10.17226/26919.
Page 1134
Suggested Citation: "Reference Guide on Medical Testimony." National Academies of Sciences, Engineering, and Medicine and Federal Judicial Center. 2025. Reference Manual on Scientific Evidence: Fourth Edition. Washington, DC: The National Academies Press. doi: 10.17226/26919.
Page 1135
Suggested Citation: "Reference Guide on Medical Testimony." National Academies of Sciences, Engineering, and Medicine and Federal Judicial Center. 2025. Reference Manual on Scientific Evidence: Fourth Edition. Washington, DC: The National Academies Press. doi: 10.17226/26919.
Page 1136
Suggested Citation: "Reference Guide on Medical Testimony." National Academies of Sciences, Engineering, and Medicine and Federal Judicial Center. 2025. Reference Manual on Scientific Evidence: Fourth Edition. Washington, DC: The National Academies Press. doi: 10.17226/26919.
Page 1137
Suggested Citation: "Reference Guide on Medical Testimony." National Academies of Sciences, Engineering, and Medicine and Federal Judicial Center. 2025. Reference Manual on Scientific Evidence: Fourth Edition. Washington, DC: The National Academies Press. doi: 10.17226/26919.
Page 1138
Suggested Citation: "Reference Guide on Medical Testimony." National Academies of Sciences, Engineering, and Medicine and Federal Judicial Center. 2025. Reference Manual on Scientific Evidence: Fourth Edition. Washington, DC: The National Academies Press. doi: 10.17226/26919.
Page 1139
Suggested Citation: "Reference Guide on Medical Testimony." National Academies of Sciences, Engineering, and Medicine and Federal Judicial Center. 2025. Reference Manual on Scientific Evidence: Fourth Edition. Washington, DC: The National Academies Press. doi: 10.17226/26919.
Page 1140
Suggested Citation: "Reference Guide on Medical Testimony." National Academies of Sciences, Engineering, and Medicine and Federal Judicial Center. 2025. Reference Manual on Scientific Evidence: Fourth Edition. Washington, DC: The National Academies Press. doi: 10.17226/26919.
Page 1141
Suggested Citation: "Reference Guide on Medical Testimony." National Academies of Sciences, Engineering, and Medicine and Federal Judicial Center. 2025. Reference Manual on Scientific Evidence: Fourth Edition. Washington, DC: The National Academies Press. doi: 10.17226/26919.
Page 1142
Suggested Citation: "Reference Guide on Medical Testimony." National Academies of Sciences, Engineering, and Medicine and Federal Judicial Center. 2025. Reference Manual on Scientific Evidence: Fourth Edition. Washington, DC: The National Academies Press. doi: 10.17226/26919.
Page 1143
Suggested Citation: "Reference Guide on Medical Testimony." National Academies of Sciences, Engineering, and Medicine and Federal Judicial Center. 2025. Reference Manual on Scientific Evidence: Fourth Edition. Washington, DC: The National Academies Press. doi: 10.17226/26919.
Page 1144
Suggested Citation: "Reference Guide on Medical Testimony." National Academies of Sciences, Engineering, and Medicine and Federal Judicial Center. 2025. Reference Manual on Scientific Evidence: Fourth Edition. Washington, DC: The National Academies Press. doi: 10.17226/26919.
Page 1145
Suggested Citation: "Reference Guide on Medical Testimony." National Academies of Sciences, Engineering, and Medicine and Federal Judicial Center. 2025. Reference Manual on Scientific Evidence: Fourth Edition. Washington, DC: The National Academies Press. doi: 10.17226/26919.
Page 1146
Suggested Citation: "Reference Guide on Medical Testimony." National Academies of Sciences, Engineering, and Medicine and Federal Judicial Center. 2025. Reference Manual on Scientific Evidence: Fourth Edition. Washington, DC: The National Academies Press. doi: 10.17226/26919.
Page 1147
Suggested Citation: "Reference Guide on Medical Testimony." National Academies of Sciences, Engineering, and Medicine and Federal Judicial Center. 2025. Reference Manual on Scientific Evidence: Fourth Edition. Washington, DC: The National Academies Press. doi: 10.17226/26919.