Generative Artificial Intelligence in Health and Medicine: Opportunities and Responsibilities for Transformative Innovation (2025)

Chapter: 3 Risks of Generative Artificial Intelligence in Health and Medicine

Suggested Citation: "3 Risks of Generative Artificial Intelligence in Health and Medicine." National Academy of Medicine. 2025. Generative Artificial Intelligence in Health and Medicine: Opportunities and Responsibilities for Transformative Innovation. Washington, DC: The National Academies Press. doi: 10.17226/28907.

3
RISKS OF GENERATIVE ARTIFICIAL INTELLIGENCE IN HEALTH AND MEDICINE

Generative AI (GenAI) poses several risks that must be identified and mitigated for the technology to be used effectively. Primary risks include data privacy and security, bias, output limitations, algorithmic brittleness, and hallucinations.

DATA PRIVACY AND SECURITY

Key among the risks of GenAI is the concern over data privacy and security. GenAI systems require access to vast amounts of sensitive patient data, and it is essential that these data be handled securely. Because the output of GenAI algorithms is fed back into the transformer engine to refine future output, there is a significant risk that sensitive information will be disclosed to unauthorized parties. Health systems and related entities will need to understand GenAI data flows and uses, and then take measures to ensure that protected health information is not inadvertently shared in the public domain.
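A minimal sketch of one such measure appears below, assuming a simple redact-before-send step in Python; the regular-expression patterns and the call_genai_api placeholder are illustrative assumptions, not a complete de-identification method or any particular vendor's interface.

    # Sketch: strip obvious identifiers from free text before it leaves the
    # health system. The patterns and call_genai_api() below are illustrative
    # placeholders, not a complete de-identification solution or a real API.
    import re

    PHI_PATTERNS = {
        "mrn": re.compile(r"\bMRN[:\s]*\d{6,10}\b", re.IGNORECASE),
        "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
        "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
        "date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    }

    def deidentify(text: str) -> str:
        """Replace pattern-matched identifiers with typed placeholders."""
        for label, pattern in PHI_PATTERNS.items():
            text = pattern.sub(f"[{label.upper()}]", text)
        return text

    def call_genai_api(prompt: str) -> str:
        """Placeholder for whatever external GenAI service is approved."""
        raise NotImplementedError

    def ask_genai(note: str) -> str:
        cleaned = deidentify(note)       # identifiers never leave the local environment
        return call_genai_api(cleaned)   # hypothetical external GenAI endpoint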

BIAS AND EQUITY

As with other forms of AI, GenAI solutions could fundamentally worsen equity, given the risk of generating output from non-representative or biased data or of applying that output in inequitable ways. Accordingly, it will be important to continually monitor the use of GenAI in the context of health equity and to address any inequities to which it may be contributing. Large language models (LLMs) and GenAI rely on vast amounts of training data to generate responses and recommendations. If the training data are biased, incomplete, or incorrect, or are not representative of the activities, populations, and specific environments to which the resulting GenAI will be applied, the result can worsen care or lead to unfair treatment of certain patient groups, affecting diagnostic accuracy and treatment recommendations. Models trained on historical data may reflect biases already present in the health care system. If certain demographic groups are underrepresented in the training data, or if there are disparities in how patients from different backgrounds are diagnosed and treated, the model may perpetuate these human-induced biases.

A related phenomenon known as AI drift occurs when LLMs begin to exhibit unpredictable or unwanted behaviors that deviate from their original parameters. This can develop when new data points are introduced or when attempts to improve certain components of AI models inadvertently cause performance to degrade over time. As a result, a model may provide skewed diagnostic or treatment recommendations, which can negatively affect patient care and perpetuate health disparities. Transparency and rigorous validation methodologies that employ bias detection are essential for identifying and mitigating bias in health care algorithms.
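As an illustration of what such bias detection might involve, the following Python sketch compares validation accuracy across demographic subgroups and flags those falling well below the overall rate; the column names and disparity threshold are hypothetical choices, not a standardized audit procedure.

    # Sketch: flag demographic subgroups whose validation accuracy falls well
    # below the overall rate. Column names and the disparity threshold are
    # illustrative assumptions, not part of any standard audit method.
    import pandas as pd

    def subgroup_accuracy(df: pd.DataFrame, group_col: str, max_gap: float = 0.05):
        """Return overall accuracy, per-group accuracy, and groups exceeding the gap."""
        overall = (df["prediction"] == df["label"]).mean()
        per_group = (
            df.assign(correct=df["prediction"] == df["label"])
              .groupby(group_col)["correct"].mean()
        )
        flagged = per_group[per_group < overall - max_gap]
        return overall, per_group, flagged

    # Example: validation results with a self-reported demographic column.
    results = pd.DataFrame({
        "prediction": [1, 0, 1, 1, 0, 1],
        "label":      [1, 0, 0, 1, 1, 1],
        "group":      ["A", "A", "B", "B", "B", "A"],
    })
    overall, per_group, flagged = subgroup_accuracy(results, "group")
    print(per_group)
    print("Groups needing review:", list(flagged.index))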

OUTPUT LIMITATIONS

The lack of originality in LLM-generated responses for diagnoses can diminish the quality of patient care. To understand this, consider a scenario in which a health care organization uses a GenAI-powered chatbot to assist patients with health inquiries and symptom assessments. When patients interact with the chatbot and describe their symptoms or medical history, the responses generated are based on pre-existing patterns learned from training data. As a result, patients may receive only generic or repetitive answers that lack specificity. If multiple patients present with similar symptoms, the chatbot may provide identical responses for each, without considering each patient’s specific medical history, leading to frustration, dissatisfaction, and a lack of trust in the technology. To address this issue, health care organizations can incorporate dynamic response generation techniques that consider individual patient characteristics and contextual factors, and can continuously update and refine models based on patient feedback.
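One way to picture such dynamic response generation is a prompt-assembly step that injects the individual patient's recorded context before any text is generated; the sketch below assumes hypothetical record fields and a placeholder generate function rather than any specific system's interface.

    # Sketch: assemble a patient-specific prompt so the chatbot's answer is
    # conditioned on the individual's history rather than on symptoms alone.
    # The record fields and generate() stub are illustrative assumptions.
    from dataclasses import dataclass

    @dataclass
    class PatientContext:
        age: int
        conditions: list[str]
        medications: list[str]

    def build_prompt(symptoms: str, ctx: PatientContext) -> str:
        return (
            "You are assisting with a symptom inquiry.\n"
            f"Patient age: {ctx.age}\n"
            f"Known conditions: {', '.join(ctx.conditions) or 'none recorded'}\n"
            f"Current medications: {', '.join(ctx.medications) or 'none recorded'}\n"
            f"Reported symptoms: {symptoms}\n"
            "Tailor the response to this patient and flag anything that needs a clinician."
        )

    def generate(prompt: str) -> str:
        """Placeholder for the organization's approved GenAI model."""
        raise NotImplementedError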

Furthermore, the use of GenAI in clinical settings could lead to negative outcomes or harm patients if tools are not responsibly designed, developed, and deployed. Safeguards are crucial to help minimize risks and ensure patient safety. For example, health care organizations should create processes to assess the use of new AI tools, monitor AI-related safety issues, and identify and address challenges. While health care organizations vary in size and resources, federal guidelines such as the National Institute of Standards and Technology AI Risk Management Framework and the Executive Order on Safe, Secure, and Trustworthy Development and Use of AI provide a foundation for establishing consistent practices in AI governance, promoting ethical usage, and ensuring patient safety across diverse settings (National Institute of Standards and Technology, 2024; The White House, 2023).

ALGORITHMIC BRITTLENESS

Another risk is algorithmic brittleness, which refers to an algorithm’s vulnerability to errors, breakdowns, or failures when encountering unexpected inputs or situations (Eliot, 2024). While an algorithm may perform well under specific conditions, it can yield inaccurate or unreliable results when dealing with data or scenarios different from its training environment. Brittleness arises from an algorithm’s inability to effectively generalize across datasets or adapt to environmental changes. Consequently, brittle algorithms lack robustness and reliability, posing challenges in critical domains like health care.
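One practical guard against brittleness is to check whether an incoming case resembles the data the model was trained on before trusting its output; the following Python sketch uses a simple per-feature z-score rule with an illustrative threshold, not an established clinical standard.

    # Sketch: flag inputs that fall far outside the training distribution so a
    # brittle model's output is routed to human review rather than trusted.
    # The per-feature z-score rule and threshold of 4.0 are illustrative only.
    import numpy as np

    def out_of_distribution(x: np.ndarray, train_mean: np.ndarray,
                            train_std: np.ndarray, threshold: float = 4.0) -> bool:
        """True if any feature of x lies more than `threshold` std devs from the training mean."""
        z = np.abs((x - train_mean) / np.maximum(train_std, 1e-8))
        return bool(np.any(z > threshold))

    # Usage: summary statistics computed once from the training set
    # (e.g., heart rate, systolic blood pressure, temperature).
    train_mean = np.array([70.0, 120.0, 37.0])
    train_std = np.array([12.0, 15.0, 0.5])
    new_case = np.array([180.0, 60.0, 41.2])
    if out_of_distribution(new_case, train_mean, train_std):
        print("Input differs markedly from training data; route to clinician review.")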

GENAI “HALLUCINATIONS” OR “CONFABULATIONS”

A “hallucination” or “confabulation” occurs when a GenAI tool produces information that is incorrect, misleading, or not based on fact, despite seeming plausible (Farquhar, 2024). Hallucinations, as we refer to them here, are a consequence of the way GenAI is designed to predict and generate novel, but seemingly relevant, words and phrases. Awareness of this inherent limitation is essential among users who might otherwise take the seemingly accurate output at face value.

The following case illustrates this scenario: after reading about a study on tick behavior, a user asked a GenAI tool for the scientific paper supporting the study’s claims. The tool provided a citation to a paper ostensibly authored by known experts in the field, complete with a full reference. On further investigation, it was discovered that the paper’s title, authors, and other citation information were fake (Goddard, 2023). The impact of hallucinations or fabricated information from GenAI in health care underscores the importance of validating generated outputs, maintaining transparency about systems’ capabilities and limitations, and integrating human oversight into health care processes to ensure safety and quality of care. For example, one reliable approach to reducing hallucinations is to use two models to address the same issue and then compare their outputs; discrepancies can reveal inaccuracies, biases, or areas prone to hallucination. Additionally, effective prompt engineering, combined with high-quality knowledge bases and fact-checking models that cross-reference verified sources, enhances accuracy and minimizes hallucinations by grounding responses in validated information.
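A rough sketch of the two-model comparison described above follows; the model interfaces are hypothetical callables, and the token-overlap heuristic is a crude stand-in for the stronger semantic comparison a real deployment would require.

    # Sketch: ask two independent models the same question and flag answers
    # that disagree for human review. model_a / model_b are hypothetical
    # callables; token overlap is a crude proxy for semantic agreement.
    def token_overlap(a: str, b: str) -> float:
        """Fraction of shared lowercase tokens between two answers (0.0 to 1.0)."""
        ta, tb = set(a.lower().split()), set(b.lower().split())
        if not ta or not tb:
            return 0.0
        return len(ta & tb) / len(ta | tb)

    def cross_check(question: str, model_a, model_b, min_agreement: float = 0.6):
        answer_a, answer_b = model_a(question), model_b(question)
        if token_overlap(answer_a, answer_b) < min_agreement:
            return None, "Models disagree; verify against trusted sources before use."
        return answer_a, "Answers broadly agree."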

Suggested Citation: "3 Risks of Generative Artificial Intelligence in Health and Medicine." National Academy of Medicine. 2025. Generative Artificial Intelligence in Health and Medicine: Opportunities and Responsibilities for Transformative Innovation. Washington, DC: The National Academies Press. doi: 10.17226/28907.

COLLABORATION TO MITIGATE RISKS

To mitigate these and other GenAI risks, the path forward for integrating GenAI into health care should involve engaging multiple stakeholders, including health care providers, patients, policy makers, technology developers, insurance providers, medical researchers, and ethicists. Collaboration among these groups is essential to ensure that the deployment and use of GenAI are intentional, coordinated, and ethically sound. It is important for stakeholders to work together to establish guidelines and regulations that protect patient data, ensure fairness and transparency, and evaluate the effectiveness and safety of GenAI applications in health care.

Suggested Citation: "3 Risks of Generative Artificial Intelligence in Health and Medicine." National Academy of Medicine. 2025. Generative Artificial Intelligence in Health and Medicine: Opportunities and Responsibilities for Transformative Innovation. Washington, DC: The National Academies Press. doi: 10.17226/28907.
Page 11
Suggested Citation: "3 Risks of Generative Artificial Intelligence in Health and Medicine." National Academy of Medicine. 2025. Generative Artificial Intelligence in Health and Medicine: Opportunities and Responsibilities for Transformative Innovation. Washington, DC: The National Academies Press. doi: 10.17226/28907.
Page 12
Suggested Citation: "3 Risks of Generative Artificial Intelligence in Health and Medicine." National Academy of Medicine. 2025. Generative Artificial Intelligence in Health and Medicine: Opportunities and Responsibilities for Transformative Innovation. Washington, DC: The National Academies Press. doi: 10.17226/28907.
Page 13
Suggested Citation: "3 Risks of Generative Artificial Intelligence in Health and Medicine." National Academy of Medicine. 2025. Generative Artificial Intelligence in Health and Medicine: Opportunities and Responsibilities for Transformative Innovation. Washington, DC: The National Academies Press. doi: 10.17226/28907.
Page 14