Aligning Investments in Therapeutic Development with Therapeutic Need: Closing the Gap (2025)

Chapter: Appendix D: IHME Methods

Suggested Citation: "Appendix D: IHME Methods." National Academies of Sciences, Engineering, and Medicine. 2025. Aligning Investments in Therapeutic Development with Therapeutic Need: Closing the Gap. Washington, DC: The National Academies Press. doi: 10.17226/29157.

D

IHME Methods

The Institute for Health Metrics and Evaluation (IHME) mapped drug–use pairs to Global Burden of Disease (GBD) categories in five steps: (1) we identified drug uses in the Evaluate Pharma database, covering current drugs for the top 20 pharmaceutical companies and pipeline drugs for all companies; (2) for validation, we manually mapped drug–use pairs to GBD conditions (causes, risk factors, impairments, injuries, or pathogens) for two companies’ current and pipeline portfolios; (3) we applied a large language model (LLM) to assign drug–use pairs to GBD categories, using the manual mappings as a benchmark for optimizing our input configuration; (4) we used the highest-performing LLM method to map the current portfolios of the top 20 pharmaceutical companies and the pipeline portfolios of all companies; and (5) we compared these pharmaceutical portfolios by GBD cause with the respective disease burden. The remaining sections of this appendix provide additional information about each step.

Identification of Drugs and Drug–Use Pairs

We used the Evaluate Pharma database to identify both current pharmaceutical products and pipeline pharmaceutical products. To discover all uses for each of the current drugs, we mapped drug names from the Evaluate Pharma database to reference sources (e.g., Redbook) that specify the use of each drug. For pipeline drugs, we relied on the “specified use” variable in the Evaluate Pharma database.

Manual Mapping of Drug–Use Pairs to Create a Validation Dataset

To assess and optimize the performance of the LLM-based mapping, we created a validation dataset from Pfizer’s and Sanofi’s current and pipeline drug portfolios. Two independent coders mapped each drug–use pair to GBD cause, risk, and injury codes, with a third reviewer resolving any discrepancies. We also compared LLM-based assignments to the manual mappings to refine the validation dataset. In addition to causes, other entities were included as options for mapping. The final mapping included 334 causes, 47 injury codes, 18 noncause groupings, 4 risk factors, and the heart failure impairment.
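The double-coding workflow can be sketched as follows. This is a minimal illustration, not the procedure's actual tooling: the function names, the dictionary layout, and the placeholder drug and cause labels are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# A drug-use pair, e.g. ("drug_x", "use_1"). Labels here are placeholders.
Pair = Tuple[str, str]


def reconcile(
    coder_a: Dict[Pair, str], coder_b: Dict[Pair, str]
) -> Tuple[Dict[Pair, str], List[Pair]]:
    """Split double-coded pairs into agreed mappings and discrepancies."""
    agreed: Dict[Pair, str] = {}
    disputed: List[Pair] = []
    # Only pairs coded by both coders can be compared.
    for pair in sorted(coder_a.keys() & coder_b.keys()):
        if coder_a[pair] == coder_b[pair]:
            agreed[pair] = coder_a[pair]
        else:
            disputed.append(pair)
    return agreed, disputed


def apply_review(
    agreed: Dict[Pair, str],
    disputed: List[Pair],
    reviewer_decisions: Dict[Pair, str],
) -> Dict[Pair, str]:
    """Fill in each disputed pair with the third reviewer's decision."""
    final = dict(agreed)
    for pair in disputed:
        # Raises KeyError if the reviewer left a discrepancy unresolved.
        final[pair] = reviewer_decisions[pair]
    return final
```

Calling `reconcile` first and passing its outputs to `apply_review` yields one mapping per drug–use pair, with every disagreement traceable to a reviewer decision.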

Performance Optimization of an LLM-Based Classification

We supplied the LLM with drug–use pairs and a list of GBD conditions, instructing it to identify the most relevant condition. We refined the prompt to enhance accuracy, using our validation set to evaluate improvements. We also tested different foundational models, including GPT-4, o1-mini, and o1-preview. In addition to prompt refinement, we applied two further optimization approaches: providing condition keywords generated through a separate LLM process, and an adjudication process in which multiple LLM instances, each with its own medical specialty focus, proposed assignments and a final LLM instance determined the most likely condition.
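The control flow of confidence-gated adjudication might look like the sketch below. Every callable is a hypothetical stand-in for an LLM call; no real model API is shown, and the signatures are assumptions. The default threshold of 80 reflects the confidence cutoff this appendix reports for triggering adjudication.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-ins for LLM calls (assumed signatures):
# a classifier returns (condition, confidence on a 0-100 scale);
# an adjudicator picks one condition from the proposed candidates.
Classifier = Callable[[str, str, List[str]], Tuple[str, float]]
Adjudicator = Callable[[str, str, List[str]], str]


def classify_with_adjudication(
    drug: str,
    use: str,
    conditions: List[str],
    primary: Classifier,
    specialists: List[Classifier],
    adjudicator: Adjudicator,
    threshold: float = 80.0,
) -> str:
    """Return the primary assignment unless its confidence is at or below
    the threshold, in which case specialty-focused instances propose
    candidates and a final instance chooses among them."""
    condition, confidence = primary(drug, use, conditions)
    if confidence > threshold:
        return condition
    # Gather the initial answer plus one proposal per specialist instance.
    candidates = [condition]
    for specialist in specialists:
        proposal, _ = specialist(drug, use, conditions)
        candidates.append(proposal)
    return adjudicator(drug, use, candidates)
```

Passing the classifiers in as arguments keeps the gating logic testable with simple stubs, independent of whichever model backs each instance.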

The table below describes concordance between different LLM approaches that vary according to the foundational model used, whether condition keywords were provided to the LLM, and whether an adjudication process was used. We evaluated concordance at the four levels of the GBD cause hierarchy, with higher levels indicating greater granularity. The highest performing approach was one that uses the o1-preview foundational LLM, condition keywords, and adjudication (limited to instances where the initial classification by the LLM had a confidence level less than or equal to 80 percent).

Approach	Level 1 Cause	Level 2 Cause	Level 3 Cause	Level 4 Cause
o1-preview with keywords, adjudicated	98.5%	96.0%	93.9%	92.8%
o1-preview with keywords	98.3%	95.3%	93.0%	93.0%
o1-preview without keywords	97.0%	91.8%	84.8%	83.8%
o1-mini without keywords	97.1%	90.5%	83.5%	85.7%
o1-mini with keywords	97.3%	91.6%	86.5%	91.7%
GPT-4 with keywords	95.3%	87.5%	80.1%	85.6%
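One way to compute concordance at a given hierarchy level is sketched here. It assumes each mapping is stored as a path of cause labels from Level 1 downward, and that a pair counts toward a level only when the manual mapping is defined that deep. Both are assumptions made for the example, as the appendix does not state how the denominators were set.

```python
from typing import Optional, Sequence, Tuple

# A cause path from Level 1 downward, e.g. ("L1_a", "L2_a", "L3_a").
# The labels are placeholders, not actual GBD cause names.
Path = Tuple[str, ...]


def concordance_at_level(
    manual: Sequence[Path], predicted: Sequence[Path], level: int
) -> Optional[float]:
    """Share of pairs whose predicted path matches the manual path down to
    `level`, among pairs whose manual path reaches that level."""
    matches = 0
    total = 0
    for manual_path, predicted_path in zip(manual, predicted):
        if len(manual_path) < level:
            continue  # manual mapping is not defined at this depth
        total += 1
        # A predicted path shorter than `level` cannot match.
        if predicted_path[:level] == manual_path[:level]:
            matches += 1
    return matches / total if total else None
```

Under these assumptions the set of eligible pairs can shrink at deeper levels, so the denominator is not necessarily the same at every level.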

Application of the Optimized LLM Approach and Postprocessing

Using Evaluate Pharma, we extracted the most recent product data as of February 2025. We then applied our most accurate LLM method to classify the complete dataset, which includes over 7,000 current and pipeline products from the top 20 companies and over 37,000 additional pipeline products from other companies. We made some adjustments to the LLM outputs: in the small number of cases where the LLM’s assignment did not match any valid condition in our hierarchy, we manually mapped the drug–use pair to the correct condition.
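The postprocessing check can be sketched as a validation pass that separates usable assignments from those needing manual mapping. The function name and data layout are assumptions, as is the choice to tolerate differences in case and surrounding whitespace when matching labels.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (drug, use); placeholder labels in this sketch


def split_by_validity(
    assignments: Dict[Pair, str], valid_conditions: Set[str]
) -> Tuple[Dict[Pair, str], List[Pair]]:
    """Keep assignments that name a valid condition; queue the rest for
    manual mapping."""
    # Assumption: matching ignores case and surrounding whitespace.
    canonical = {name.strip().casefold(): name for name in valid_conditions}
    accepted: Dict[Pair, str] = {}
    needs_manual: List[Pair] = []
    for pair, label in assignments.items():
        match = canonical.get(label.strip().casefold())
        if match is None:
            needs_manual.append(pair)
        else:
            accepted[pair] = match  # store the hierarchy's own spelling
    return accepted, needs_manual
```

The `needs_manual` list is the worklist for a human reviewer; nothing in it is guessed at or silently dropped.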

Comparison of Pharmaceutical Portfolios by Cause Against the Corresponding Disease Burden

This analysis encompassed pharmaceutical products globally, both on-market and in development. We compared current drugs against 2021 disease burden and pipeline drugs against forecasted 2030 disease burden, as defined by GBD 2021.
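As an illustration only, one simple way to set a portfolio against disease burden is to compare each cause's share of products with its share of burden. The appendix does not specify the comparison metric, so the share-difference measure, the function names, and the input layout below are all assumptions.

```python
from typing import Dict


def shares(values: Dict[str, float]) -> Dict[str, float]:
    """Convert per-cause totals into shares that sum to 1."""
    total = sum(values.values())
    if total <= 0:
        raise ValueError("total must be positive")
    return {cause: value / total for cause, value in values.items()}


def share_gap(
    product_counts: Dict[str, float], burden: Dict[str, float]
) -> Dict[str, float]:
    """Per-cause share of products minus share of burden.

    Positive values mean a cause holds a larger share of the portfolio
    than of the burden; negative values mean the reverse.
    """
    product_share = shares(product_counts)
    burden_share = shares(burden)
    causes = sorted(set(product_share) | set(burden_share))
    return {
        cause: product_share.get(cause, 0.0) - burden_share.get(cause, 0.0)
        for cause in causes
    }
```

In this sketch, current products would be paired with 2021 burden values and pipeline products with forecasted 2030 values, matching the pairing described above.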