
2
Insights from the Artificial Intelligence Field

The first panel, chaired by Leo Chiang, senior research and development fellow of the Dow Chemical Company, and Shaheen Dewji, assistant professor in the Nuclear and Radiological Engineering and Medical Physics Program at the Georgia Institute of Technology, set the stage for the rest of the meeting by offering insights from artificial intelligence (AI) developers, users, and government regulators on balancing the benefits of rapid AI innovation with its risks. Led by thought leaders to foster community connections, the session covered principles for responsible AI development, human decision making in AI, ethical considerations, and acceptable uncertainty levels for AI model outputs.

TRANSFORMATIONAL AI OPPORTUNITIES IN HEALTHCARE

David C. Rhew, global chief medical officer and vice president of healthcare for Microsoft, began by offering several examples of how AI is already being used in healthcare and then sketching what will be important in the future if AI is to assume a major role in clinical care. His presentation had four main components: an examination of how to leverage technological innovation to enable the use of AI in healthcare, how to ensure trust related to AI in healthcare, how to democratize AI, and how to develop a workforce that can take advantage of AI’s potential for use in healthcare.

Rhew began by saying that while most people associate AI in healthcare with improving operational efficiencies and enhancing the overall care experience, the technology’s greatest impact will likely come from enabling entirely new capabilities, particularly in making healthcare more proactive than reactive. The concept of preventing illness before it occurs has existed for decades, but healthcare practitioners have historically been limited to a small set of tools including education, symptom awareness training, vaccination programs, and basic screening initiatives. AI now offers a powerful new addition to this tool kit, providing the potential to screen large populations and identify at-risk individuals before they become symptomatic and enabling targeted interventions that could dramatically improve health outcomes.

Rhew stated that the current healthcare system requires individuals to seek clinical care actively, but many people have minimal contact with healthcare providers until they experience serious issues such as heart attacks, strokes, or advanced cancers that can no longer be ignored. This population represents exactly the group that healthcare systems hope to reach earlier in the disease process, and AI holds tremendous promise for identifying and engaging these individuals before they reach crisis points.

One of the most compelling examples, in Rhew’s opinion, of AI’s transformative potential lies in routine eye examinations. Traditionally, retinal images captured during eye exams remained stored on imaging equipment with little additional analysis beyond the immediate examination. However, AI can now analyze these retinal images to screen for multiple conditions including diabetic retinopathy, cardiovascular disease, hypertension, chronic kidney disease, ophthalmologic conditions, and even neurovascular disorders.

This advancement has been made possible by optical coherence tomography (OCT), which provides three-dimensional images at submicron resolution compared to the two-dimensional surface images produced by traditional fundoscopy. OCT enables identification of arterioles, nerves, and surrounding retinal structures, essentially providing magnetic resonance imaging–level detail at a fraction of the cost and with much greater accessibility. While OCT technology itself is not new, recent innovations have automated the data acquisition process, reducing examination time from 20 minutes with a clinician to just 2 minutes without clinical supervision, making it practical for routine screening.

Rhew provided the example that Stanford Medical School has implemented this approach in satellite clinics, where patients receive routine OCT examinations alongside standard vital sign measurements. The results have been dramatic; Healthcare Effectiveness Data and Information Set (HEDIS) scores improved from 20–40 percent to more than 80–90 percent in participating clinics, while overall system HEDIS scores have increased substantially. The technology also generates revenue through established Current Procedural Terminology codes.

Beyond diabetic retinopathy detection, OCT serves as a portal into systemic health conditions throughout the body. Research published in Nature Communications demonstrated that chronic kidney disease causes thinning of choroidal and retinal membranes in the eye, while kidney transplantation restores choroidal membrane thickness but not retinal membrane thickness (Farrah et al., 2023). This reveals that membrane thickness detected by OCT serves as a dynamic indicator of glomerular function, illustrating how systemic diseases affect blood vessels and membranes uniformly throughout the body.

Current research is exploring connections between ocular and neurovascular conditions including Alzheimer’s disease, multiple sclerosis, and Parkinson’s disease. The eye may ultimately serve as a comprehensive screening and monitoring platform for numerous systemic conditions, representing a paradigm shift toward holistic health assessment through a single accessible examination.

Rhew then discussed how AI applications in electronic health record (EHR) analysis may reveal gaps in clinical care and missed diagnoses. While ideal medical practice would involve evidence-based care without any gaps, reality demonstrates wide variability in clinical performance. AI analysis of medical records could close this gap and potentially identify patients with diseases that were missed during clinical encounters.

An AI platform developed by Pangaea Data analyzed medical records and identified 296 percent more lung cancer patients than had International Classification of Diseases codes for lung cancer, 179 percent more patients with ovarian cancer, and 71 percent more patients with lupus nephritis. Among cancer patients specifically, AI analysis revealed numerous cases where appropriate biomarker testing was not performed, and, when testing was completed, results were not concordant with prescribed treatments; this suggests that diagnostic tests were ordered but not acted upon. He said that the frequency of such clinical oversights is surprisingly high and represents a significant opportunity for improvement.

Rhew noted that he is seeing some programs across the country now use AI to analyze medical records and identify care gaps, particularly focusing on patients who are eligible for specific treatments but are not receiving appropriate care. He believes this application has substantial implications for clinical trial enrollment and overall patient management, ensuring that eligible patients receive optimal care and are considered for relevant research opportunities.

Another frontier in AI-powered healthcare applications that Rhew mentioned was voice analysis. While healthcare providers have long used dictation and transcription services for medical documentation, and ambient voice technologies now convert clinician–patient conversations into medical records, the capture of voice data opens new possibilities for AI analysis of both patient and clinician speech patterns.

Existing AI programs can analyze voices to detect depression, stress, and anxiety, for example, with validation against established screening tools. This capability eliminates the need for patients to complete traditional survey instruments, providing real-time assessment of mental and emotional states during clinical encounters. These tools can identify and quantify depression and anxiety levels, assist in clinical trial candidate identification, and monitor clinician burnout, representing a significant expansion of diagnostic capabilities without additional patient burden.

Rhew noted that disease identification through AI represents only the first step in a three-phase approach to population health improvement. The subsequent phases involve risk stratification, triage, and capacity building (the overall improvement in the organization’s ability to produce, perform, or deploy), followed by education, engagement, and activation. These latter phases often present the greatest barriers, particularly in underserved and rural communities, but AI can contribute significantly to each component.

Identifying large numbers of individuals with specific conditions and treatment needs, especially in underserved areas, can overwhelm already stressed healthcare systems, increasing wait times and reducing timely care delivery. Risk stratification and triage may become essential for prioritizing the highest-risk patients, while capacity building can help ensure the system can accommodate increased demand. Many patients of specialists and primary care physicians already face extended wait times, making capacity optimization crucial.

Rhew offered one example of how AI could help with this. AI-powered pre-screening could identify higher-risk patients requiring specialist care while directing other cases to optometrists, enabling ophthalmologists to practice at higher complexity levels while optimizing overall system efficiency. Optometrists represent one of the fastest-growing healthcare professions, making this approach particularly viable for expanding access to eye care (Tang, 2025).

The Alliance for Healthcare from the Eye exemplifies this comprehensive approach by involving multiple government agencies, healthcare systems, and payers in an initiative to screen the entire U.S. population for advanced-stage chronic disease using OCT-based screening methods deployed in Stanford-associated clinics nationwide. This program aims to demonstrate the feasibility of population-wide AI-powered health screening.

Patient engagement represents the final component of population health improvement. Rhew noted that simply identifying diseases and building capacity does not guarantee patient participation in care. Educating about risks and treatment options, enhancing healthcare system engagement, and addressing social determinants of health all contribute to successful population health programs, he said. No single organization can address all these components effectively, highlighting the importance of partnerships among organizations specializing in different aspects of the care continuum.

Rhew said that these partnerships include organizations capable of identifying individuals through imaging, blood biomarkers, wearable devices, and EHR analysis, as well as entities that can perform risk stratification and capacity building. Digital health companies provide education, behavioral nudging, and ongoing support through chatbots and other AI-powered mechanisms. He stated that successful implementation requires coordinated integration of these diverse capabilities rather than isolated problem-solving approaches.

Rhew then discussed the Trustworthy and Responsible AI Network, established in March 2024 as a consortium spanning the United States and Europe, which focuses on key principles of responsible AI including transparency, explainability, validation, and bias assessment. The consortium emphasizes efficient processes for establishing responsible AI within institutions while optimizing time, resources, and costs.

He outlined three primary goals that guide efficient operationalization of responsible AI. First, technology could facilitate implementation rather than relying solely on manual processes such as individual model reviews in meetings. Second, collaboration among health systems enables shared learning at the model level. Third, ensuring that no systems are left behind addresses resource disparities that might prevent some organizations from participating in AI advancement.

The consortium identified four essential components for institutional AI leadership. First, registration involves identifying all AI systems currently operating within an organization, which requires defining what constitutes “AI” and focusing specifically on clinical AI applications. Because improvement requires measurement and measurement warrants tracking, registration of AI systems using standardized “model cards” (illustrated in the sketch below) is important for responsible implementation.

Second, testing encompasses both pre- and post-deployment evaluation of AI systems on local datasets. Rhew mentioned that, previously, organizations often assumed that externally developed models had undergone sufficient testing and could be trusted for local applications. However, rapid AI advancement has highlighted the importance of local testing and ongoing monitoring to ensure reasonable performance on institutional datasets.

Third, bias assessment involves evaluating AI systems for discriminatory outcomes and implementing mitigation measures. Monitoring extends beyond model performance to include outcome assessment, with particular attention to subpopulation performance to detect bias in model results. Fourth, governance necessitates establishing scalable processes for responsible AI oversight to ensure systematic management of AI implementations across the organization.
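
To make the registration and bias-assessment components more concrete, the following is a minimal sketch of a model-card registry with a simple subgroup performance check. It is illustrative only: the field names, metrics, and the 0.05 AUC-gap threshold are assumptions, not the consortium’s specification.

```python
# Illustrative sketch only: a minimal model-card registry with a crude
# subgroup performance check. All field names and thresholds are hypothetical.
from dataclasses import dataclass, field


@dataclass
class ModelCard:
    """Standardized record for one clinically deployed AI system."""
    name: str
    intended_use: str
    training_data: str          # brief description of the training dataset
    deployment_date: str
    subgroup_auc: dict = field(default_factory=dict)  # e.g., {"female": 0.91}


def flag_bias(card: ModelCard, max_gap: float = 0.05) -> list:
    """Return subgroups whose AUC lags the best subgroup by more than
    max_gap -- a rough proxy for the subpopulation monitoring described."""
    if not card.subgroup_auc:
        return []
    best = max(card.subgroup_auc.values())
    return [g for g, auc in card.subgroup_auc.items() if best - auc > max_gap]


registry = [
    ModelCard("chest-xray-triage", "triage of suspected pneumothorax",
              "vendor dataset, 2019-2023", "2024-06-01",
              {"male": 0.93, "female": 0.91, "age>=65": 0.85}),
]
for card in registry:
    print(card.name, "flagged subgroups:", flag_bias(card))
```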

Rhew discussed how various technologies can support these four areas; examples include tools for identifying hallucinations in large language models (LLMs), bias assessment capabilities, and quality assurance mechanisms (Bird, 2024). However, technology alone is insufficient; examining outcomes introduces implementation variables that significantly impact performance.

The initial deployment of EHRs illustrates the importance of implementation factors. Rhew said that despite assumptions that superior user interfaces would produce better outcomes, research demonstrated similar performance across different EHR systems, with the greatest variability occurring within the same EHR due to implementation differences. He stated that this emphasizes the importance of understanding not only AI model outcomes but also implementation details including who receives outputs, timing of delivery, format of presentation, and subsequent actions taken.

Democratizing healthcare AI involves enabling rural and resource-limited hospitals to access AI advances comparable to those of academic and urban medical centers. Most organizations currently attempt to build proprietary models, essentially “reinventing the wheel,” despite the difficulty of developing models that incorporate multiple data modalities.

Healthcare AI models commonly utilize EHR data, imaging data, and genomics data. Multimodal models combining two or more of these data types provide more comprehensive patient assessments and enable three distinct capabilities: Risk categorization assigns patients to low-, medium-, or high-risk groups. Cohort matching places patients into groups similar to entire databases, enabling outcome predictions for patient populations without specific clinical trial data through longitudinal database analysis. Predictive modeling forecasts specific patient outcomes based on comprehensive data integration.
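
As a rough illustration of how two of these capabilities might be wired together, the sketch below assigns a risk tier from a score and matches a patient’s multimodal embedding to the nearest historical cohort. The thresholds, toy embeddings, and distance metric are hypothetical choices for illustration, not any production system’s design.

```python
# Hypothetical sketch of risk categorization and cohort matching on a
# combined multimodal score/embedding; all numbers are invented.
import numpy as np

def risk_tier(score: float) -> str:
    """Assign a patient to a low/medium/high risk group (thresholds assumed)."""
    return "high" if score >= 0.7 else "medium" if score >= 0.3 else "low"

def nearest_cohort(patient_vec, cohort_matrix, cohort_labels):
    """Cohort matching: find the historical cohort whose centroid is closest
    to the patient's multimodal embedding (EHR + imaging + genomics)."""
    dists = np.linalg.norm(cohort_matrix - patient_vec, axis=1)
    return cohort_labels[int(np.argmin(dists))]

patient = np.array([0.2, 0.8, 0.5])                      # toy embedding
cohorts = np.array([[0.1, 0.2, 0.3], [0.3, 0.9, 0.4]])   # cohort centroids
print(risk_tier(0.74), nearest_cohort(patient, cohorts, ["A", "B"]))
```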

Rhew noted that “adapted multimodal AI models”—those that can process both images and text—could serve multiple organizations without requiring individual development efforts, since many AI tasks are common across organizations and models. In imaging applications, for example, structure identification represents a fundamental requirement for all foundation models, involving boundary detection that every group would otherwise have to implement. Rather than each group developing these basic capabilities independently, pre-trained foundation models can be fine-tuned to specific datasets and questions, such as cancer subtype analysis. This approach uses LLMs as general reasoning engines, enabling users to query foundation models, perform tasks, and generate results.
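
The fine-tuning pattern described, reusing a shared foundation model and training only a small task-specific component, might look like the following PyTorch sketch. The backbone here is a stand-in for a pretrained encoder, and the four-way cancer subtype task is hypothetical.

```python
# A minimal fine-tuning sketch: freeze a (stand-in) pretrained backbone and
# train only a small task head on local data. Shapes and data are toys.
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(512, 256), nn.ReLU())  # stand-in encoder
for p in backbone.parameters():
    p.requires_grad = False                               # freeze shared capabilities

head = nn.Linear(256, 4)                                  # 4 hypothetical cancer subtypes
opt = torch.optim.Adam(head.parameters(), lr=1e-3)        # only the head is trained
loss_fn = nn.CrossEntropyLoss()

x, y = torch.randn(8, 512), torch.randint(0, 4, (8,))     # toy local dataset
for _ in range(10):
    opt.zero_grad()
    loss = loss_fn(head(backbone(x)), y)
    loss.backward()
    opt.step()
```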

This embedded approach—where AI capabilities are integrated directly into existing clinical workflows and imaging systems, and models are adapted to specific datasets and questions while retaining general reasoning capabilities—offers significant advantages over applying a single traditional model to various datasets. Reduced susceptibility to hallucination is a key benefit and has led to widespread adoption of this approach across various AI applications.

Recent work published in Nature describes a whole-slide foundation model trained on Providence’s collection of 1.3 billion pathology image tiles from 171,000 whole slides representing more than 30,000 patients from 28 cancer centers across 31 major tissue types (Xu et al., 2024). This foundation model enables gene mutation prediction and cancer subtyping and has been made open source to accelerate development, following a trend toward open-source foundation model availability.

Rhew commented that rural hospitals would have to address fundamental infrastructure needs before deploying advanced technologies such as AI. For example, he continued, affordable broadband access represents a primary requirement and is supported by significant national initiatives. Cloud access with integrated cybersecurity represents another essential component, and one that Microsoft is hoping to address through its Cybersecurity Program for rural hospitals.

Rhew described efforts to build Rural Health AI Lab consortia that focus on AI applications specifically relevant to rural hospitals, particularly cybersecurity and insurance claim denial assistance. More than 550 rural hospitals participate in these efforts, demonstrating growing recognition of the benefits of a trustworthy, responsible AI network.

He went on to explain that successful network creation extends beyond technology to collaborative approaches, particularly hub-and-spoke models where high-resource settings partner with low-resource settings to share technologies, best practices, and costs. Data sharing represents only one component; ensuring that model developers feel confident about intellectual property protection is equally important.

Rhew noted that federated learning strategies may be developed that maintain data privacy, but some model developers avoid participation due to model exposure requirements, as models represent highly sensitive intellectual property. Security enclaves where models are maintained and used for data analysis, combined with data-in-use encryption, ensure protection of both models and data during analysis.
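
A toy federated-averaging round can make the underlying idea concrete: each site computes an update on data that never leave its control, and only model weights are aggregated centrally. This is a generic sketch of the technique, not the specific enclave or encryption architecture Rhew described.

```python
# Toy federated averaging: the model moves to the data; only weights return.
import numpy as np

def local_update(weights, X, y, lr=0.1):
    """One gradient step of linear regression on a site's private data."""
    grad = 2 * X.T @ (X @ weights - y) / len(y)
    return weights - lr * grad

rng = np.random.default_rng(0)
global_w = np.zeros(3)
sites = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(3)]

for _ in range(5):
    # Each site trains locally; raw data never leave the institution.
    site_ws = [local_update(global_w.copy(), X, y) for X, y in sites]
    global_w = np.mean(site_ws, axis=0)   # server aggregates weights only
print(global_w)
```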

Rhew stated that he believes healthcare AI workforce development could address a transition similar to that of previous technological revolutions, in which certain jobs become obsolete while others emerge. This transition uniquely affects knowledge workers, including clinicians, lawyers, and engineers, rather than primarily impacting those in manual labor roles.

To address this, Rhew provided the example of how other companies are thinking about next-generation healthcare jobs, which may include AI model engineers, prompt engineers, AI data scientists, AI developers, AI governance specialists, and machine translation post-editors. According to McKinsey predictions, these translators, who interpret and adapt AI for businesses and users, will experience the greatest demand, representing a critical workforce development consideration (Singla, 2025).

Rhew highlighted how Microsoft is working on a three-step program in collaboration with multiple sites nationwide to address workforce transition needs: Education provides fundamental AI knowledge and an understanding of related ethical considerations. Skills development enables effective use of tools such as prompt engineering and AI collaboration techniques. Job specification defines new role requirements and responsibilities.

Rhew explained that AI transformation affects diverse roles including call center agents, legal clerks, and numerous other positions. Specifying new job requirements and building programs that provide both certification and employment opportunities requires collaboration between academia and industry. This comprehensive approach ensures that workforce development aligns with technological advancement while providing meaningful career pathways for affected workers.

The integration of AI into healthcare represents a fundamental shift from reactive to proactive care delivery, with applications spanning from routine screening to complex population health management. Rhew stated that he believes success requires coordinated implementation of technological capabilities, responsible governance frameworks, and comprehensive workforce development to ensure that AI’s transformative potential benefits all healthcare stakeholders while maintaining safety, equity, and effectiveness standards.

PROMOTING A SAFE AND EFFECTIVE CLINICAL ENVIRONMENT FOR AI

Mike Tilkin, chief information officer and executive vice president for technology at the American College of Radiology (ACR), spoke about the journey to safe and effective AI adoption, with a focus on what the ACR has been doing to reach that point.

Tilkin began by providing background on ACR, an organization more than a century old with approximately 40,000 members including radiologists and radiation oncologists. The group’s primary mission is “to serve patients and society by empowering [its] members to advance the practice, science, and professions of radiological care.” In 2017, the board decided to “really lean into” the emerging area of AI and data science, establishing as a basic tenet that “we wanted to advance data science as core to clinically relevant, safe, and effective radiologic care.” Recognizing that AI and data science would become central components of medicine and radiology, the organization has engaged “professionals and volunteers and really quite a bit of machinery of the ACR” in contemplating the AI revolution.

From the provider perspective, Tilkin presented data from a 2024 AI survey conducted by ACR’s Data Science Institute. Nearly 1,000 ACR members responded, with 86 percent reporting they were using AI in some part of their practice—a dramatic increase from only 58 percent in 2022. When asked about their motivations for using AI, radiologists most commonly cited desires to automate monotonous tasks, save time, and enhance accuracy or precision. “Clearly there is an appetite in the radiology community for this technology,” Tilkin noted.

The attitude toward AI in the radiology community has evolved significantly over the past decade. In 2017, AI-related questions were typically “What is this AI?” and “What is it going to do for me?” Today, after years of hearing about AI’s revolutionary potential, the questions have shifted to “Why isn’t it helping me do my work better?” Tilkin said that this change reflects the community’s strong interest in realizing the promised benefits of AI technology.

However, significant concerns persist. The 2024 survey identified the two greatest AI-related concerns as issues with accuracy and errors and the potential for liability. These concerns underscore the need for AI solutions that ensure accuracy and compliance with safety protocols, for risk mitigation strategies for adopting practices, and for trustworthy products that mitigate medico-legal liability. According to the survey, most in the radiology community trust AI moderately or somewhat.

The survey results revealed mixed success in meeting expectations. Three-quarters of respondents said AI met or exceeded their expectations on ease of use, but half reported it failed to meet expectations on return on investment and supporting clinical decision making. Specifically, only 42 percent agreed that “I find most AI outputs to be clinically meaningful,” and just 18 percent agreed that “AI outputs frequently modify my clinical decision making.” While much work remains, Tilkin noted that the technology is still very new, and current solutions are early iterations.

Transitioning to marketplace dynamics, Tilkin drew on concepts from Everett M. Rogers’s classic 1962 book Diffusion of Innovations. When new technology is introduced, early adopters tend to be risk takers willing to “wade through a problematic technology and try and realize the benefit.” The next user group is more data-driven, wanting to see proven solutions and requiring vendors to demonstrate value clearly. Finally, late adopters are skeptics with low tolerance for failure who need solid infrastructure and platforms that ensure success, because “they are just not going to tolerate subpar solutions.”

Tilkin also referenced the Gartner Hype Cycle, which describes how expectations for new technologies cycle through early development, a peak of inflated expectations, a trough of disillusionment, and finally realistic expectations as the technology matures (Gartner, 2025). For AI in radiology, the peak of inflated expectations occurred around 2017, after Nobel Prize winner Geoffrey Hinton predicted that AI would make radiologists obsolete within 5 years (Byju, 2024). Today, AI in radiology is in the maturity phase, with numerous market products and clearer expectations emerging.

The ACR has been working for years to identify gaps in this technology while developing standards and interoperability. “For this market to mature,” Tilkin explained, “there are going to have to be sufficient standards that the vendors can build mature workflow solutions.” Simultaneously, the field is working with regulators to help understand testing requirements for this new technology while addressing ongoing concerns about ethics and bias in AI radiology applications.

The evolution of the field is evident in annual summits held by ACR with the Society for Imaging Informatics in Medicine. In 2018, their “conversations were very much general: How do we think about the economics of AI? How do we think about getting data?” Recent summits have focused on implementation questions, such as demonstrating return on investment to stakeholders and ensuring appropriate testing in production environments.

ACR has also worked extensively on developing use cases and specifications to “provide an anchor for what these algorithms were doing, how to think about what success looks like, and how to think about how they are triggered.” While AI models are covering more clinical areas, significant gaps remain. Currently, 351 AI software and medical device solutions have Food and Drug Administration (FDA) approval and are available on the market, but “we’re still fairly early in providing solutions that range across the spectrum,” Tilkin said.

Tilkin then discussed workforce training and how it has become an ACR focus in recent years. He explained, “At the very least, we need everybody to be conversant, be a good consumer, understand the basics of the technology” while also developing groups with increasingly sophisticated AI understanding up to those capable of creating AI products. Various tools foster engagement and literacy, including the ACR AI-LAB, a workbench allowing hands-on technology engagement and model creation for educational purposes.

Data are crucial for AI development, prompting several radiological data collection efforts. Examples include updating the National Lung Screening Trial with AI-appropriate data as well as updating the Medical Imaging and Data Resource Center, operated under ACR auspices with the National Institute of Biomedical Imaging and Bioengineering and others, with COVID-19–related data.

ACR has emphasized federated learning, believing it is “critical that we move the analysis to where the data [are].” The organization has undertaken numerous projects focusing on training models in institutions by moving models to institutions for local data training and testing. Federated learning enables model training across multiple sites while keeping data local, although ensuring proper data curation remains critical.

Currently, ACR works with approximately 40,000 facilities nationwide, requiring software creation and deployment to communicate with local systems and collect data. Tools called TRIAD and ACR Connect are used to engage with local systems and participate in large data projects. ACR also collaborates with regulators, payers, and industry on policy topics such as AI oversight in radiology and works with FDA on real-world testing of approved and marketed algorithms to verify that performance characteristics meet expectations.

Moving to a discussion of scaling AI adoption, Tilkin described several ACR efforts to increase effective AI use in radiology clinics, particularly targeting late adopters and those awaiting better infrastructure.

AI Central—a free ACR resource cataloging market-available AI products for radiology use—represents a key initiative. The concept involves collecting all platform, product, and manufacturer details in one location. Its necessity “speaks to the fact that this is a tough market to penetrate if you’re sitting on the outside looking in,” Tilkin explained.

AI Central allows users to examine FDA-approved products for specific uses. One clear outcome is that most current market products serve triage or operational functions rather than detection and diagnosis, because FDA approval bars are much higher for the latter. However, available data show increasing numbers of FDA-approved products entering the market over time.

ACR promotes transparency regarding AI model contents in its database. Details provided about each product include training data characteristics and performance statistics. ACR encourages vendors to make these data public so consumers better understand their purchases. “We think it is a critically important part of the puzzle,” Tilkin noted.

Monitoring real-world product performance is another vital marketplace aspect. At the end of 2024, ACR launched the Assess-AI registry to monitor AI results. The registry collects AI outputs from vendors and compares them with “ground truth” from radiology reports to determine agreement frequency between AI results and radiologist-generated results. Extensive contextual information is collected, with de-identified data loaded into the registry for analysis.

The registry provides concordance rates indicating how well particular model results agree with radiologists’ results, analyzed by individual facilities and demographic features like age and sex. Individual facility concordance rates can be compared with national averages and examined longitudinally to track performance improvements or deterioration over time. The most important signal, Tilkin suggested, can be simply observing performance changes over time, indicating that something may need attention.
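
A concordance rate of this kind is straightforward to compute once AI outputs and radiologist findings are paired. The sketch below, using hypothetical column names and toy data rather than Assess-AI’s actual schema, shows the national, per-facility, and per-demographic breakdowns described.

```python
# Toy concordance computation: compare AI outputs with radiologist "ground
# truth" overall, by facility, and by a demographic feature. Column names
# and data are invented for illustration.
import pandas as pd

df = pd.DataFrame({
    "facility":            ["A", "A", "B", "B", "B"],
    "age_group":           ["<65", ">=65", "<65", ">=65", ">=65"],
    "ai_finding":          [1, 0, 1, 1, 0],
    "radiologist_finding": [1, 0, 0, 1, 0],
})
df["concordant"] = df["ai_finding"] == df["radiologist_finding"]

national = df["concordant"].mean()                       # national average
by_facility = df.groupby("facility")["concordant"].mean()
by_age = df.groupby("age_group")["concordant"].mean()
print(f"national: {national:.0%}")
print(by_facility, by_age, sep="\n")
```

Tracked per facility over time, the same per-group means would provide the longitudinal signal Tilkin described.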

A related program is ARCH-AI (ACR Recognized Center for Healthcare-AI), described as the “first national AI quality assurance program for radiology facilities.” Its base function offers recognition for good AI technology work through ARCH-AI designation. Beyond that, “increasing levels of formality” exist, including formal accreditation programs and specific designations like the Diagnostic Imaging Center of Excellence designation. The accreditation program builds on the recognition program by adding white papers, practice parameters, and technical standards; Tilkin noted that he and colleagues had published a process description article (see Larson et al., 2025).

As discussed by Tilkin, the accreditation process evaluates institutions across four key areas. First, governance involves having institutions establish formal programs that create policies for model evaluation, acceptance, monitoring, and retirement. These institutions should maintain inventories of all clinically deployed models and require mandatory cybersecurity and compliance reviews before any deployment. Second, model selection follows predetermined, formal processes where each AI model’s intended purpose is clearly documented. Third, acceptance testing involves formal, documented procedures along with proper training for all clinicians and staff who will use the models. Finally, institutions demonstrate effective monitoring practices.

Tilkin described a typical deployment pathway starting with deployment in a nonclinical setting, usually on retrospective data, followed by “shadow deployment” in production settings, where models are not yet used clinically but their performance can be monitored. Next comes limited deployment to assess model performance before full rollout. The fourth step is monitoring, specifically local monitoring and registry participation.
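
The shadow-deployment step can be summarized in a few lines: the candidate model runs on live studies and its outputs are logged for later comparison, while clinicians continue to see only the existing workflow’s results. The function and names below are invented for illustration, not any facility’s actual implementation.

```python
# Sketch of "shadow deployment": the candidate model's output is recorded
# but never shown; the existing clinical workflow remains authoritative.
shadow_log = []

def read_study(study, current_workflow, candidate_model, shadow=True):
    clinical_result = current_workflow(study)          # what clinicians see
    if shadow:
        shadow_log.append({
            "study": study,
            "candidate": candidate_model(study),       # recorded, never shown
            "clinical": clinical_result,
        })
    return clinical_result

# Toy usage: monitor agreement before any limited clinical rollout.
result = read_study("study-001", lambda s: "normal", lambda s: "normal")
agreement = sum(e["candidate"] == e["clinical"] for e in shadow_log) / len(shadow_log)
print(f"shadow agreement: {agreement:.0%}")
```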

Finally, he noted that institutions developing AI applications locally cannot take shortcuts. He said that just as institutions call for rigorous vendor development, they should demand the same rigor of themselves, including using trained and trusted personnel and following standard procedures.

Tilkin finished by examining LLMs, vision language models, and generative AI. He highlighted the dramatic advancement of the field since the neural networks of a decade ago. At that time, the largest neural networks had tens of millions of trainable parameters, while in 2024 OpenAI’s Generative Pretrained Transformer 4 (GPT-4) had more than 1 trillion trainable parameters.

This growth accompanied dramatic power increases. “Back in 2018 when we were all excited about cat and dog recognizers, I put an image of my dog into a recognizer that said, yes, this is a dog,” he recalled. Better models would identify breeds, but when he provided the full image instead of a cropped version, they struggled significantly. The full image showed his dog in front of a ripped sofa and cushion filling, apparently pulled out by the dog, and neural networks could not interpret the filling—one thought it might be popcorn.

However, when Tilkin showed the same picture to ChatGPT, released in November 2022, it knew exactly what was depicted. It recognized the dog and the sofa (although the latter was misidentified as a chair); correctly identified the cushion filling; and even surmised the dog might be the culprit, noting a “guilty or sheepish” look on the dog’s face. This represented incredible capability advancement in just 4 years.

Tilkin offered examples of exercises used in his laboratory to build comfort with and demonstrate usage of AI. In one case, an LLM made a radiology report user-friendly, while in another, the model extracted quantitative characteristics from reports (such as kidney length and gallstone numbers) and created tables. The real purpose was building AI familiarity and capability understanding.
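
The exercises as described used an LLM, but the underlying extraction task can be illustrated with a few lines of pattern matching over toy report text. The patterns and report phrasing below are invented for illustration only.

```python
# Deliberately simple sketch of pulling quantitative values (kidney length,
# gallstone count) out of free-text reports into table rows. A real exercise
# would prompt an LLM; these regexes are illustrative stand-ins.
import re

reports = [
    "Right kidney measures 10.2 cm. Two gallstones identified.",
    "Left kidney length 9.8 cm. No gallstones seen.",
]
rows = []
for text in reports:
    length = re.search(r"kidney (?:measures|length) ([\d.]+) cm", text)
    stones = re.search(r"(\w+) gallstones?", text)
    rows.append({
        "kidney_cm": float(length.group(1)) if length else None,
        "gallstones": stones.group(1) if stones else None,
    })
print(rows)
```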

Tilkin next talked about how ACR also focuses on testing various AI models, leading to the creation of the Healthcare AI Challenge consortium for AI model validation and monitoring. The consortium includes ACR’s Data Science Institute along with Mass General Brigham, Emory Healthcare, University of Wisconsin–Madison School of Medicine and Public Health, and UW (University of Washington) Medicine. The consortium’s goal is collecting geographically and demographically diverse data, particularly embargoed data not used for AI model training; defining clinical use cases (e.g., draft report generation); and engaging geographically and institutionally diverse experts for AI agent evaluations using input data. The consortium is similar to the Large Model Systems Organization, which develops large AI models and systems, but is focused on radiology.

The Healthcare AI Challenge concept provides a forum where AI performance ratings are posted and discussed, with healthcare professionals rating various AI model performance. Information on AI model performance can then inform internal decision making at radiology institutions using or considering AI models. As an example, Tilkin showed online pages where practitioners could rate AI model performance at reading and interpreting X-rays. Ratings use a five-level scale: unacceptable, student (poor), resident (acceptable), fellow (good), and attending (outstanding). Raters do not know which AI model they are rating until after posting ratings, at which point the model is revealed. The consortium uses scores for different models to compute Elo scores for all models, similar to chess player ratings, and users can examine ratings for different AI models and leave comments to provide the consortium with feedback.
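
Elo ratings of this kind follow the standard chess formula: each pairwise preference shifts the two models’ ratings in proportion to how unexpected the outcome was. The sketch below uses conventional defaults (1500 starting rating, K = 32); the consortium’s actual parameters were not specified.

```python
# Standard Elo update, sketched to show how pairwise expert preferences
# between two AI models could be turned into rankings. Defaults are
# conventional, not the consortium's.
def elo_update(r_winner: float, r_loser: float, k: float = 32.0):
    """Return updated (winner, loser) ratings after one comparison."""
    expected_win = 1.0 / (1.0 + 10 ** ((r_loser - r_winner) / 400.0))
    delta = k * (1.0 - expected_win)
    return r_winner + delta, r_loser - delta

ratings = {"model_a": 1500.0, "model_b": 1500.0}
# One rater preferred model_a's draft report over model_b's:
ratings["model_a"], ratings["model_b"] = elo_update(ratings["model_a"],
                                                    ratings["model_b"])
print(ratings)  # model_a gains 16 points, model_b loses 16
```

An advantage of Elo over raw win rates is that it weights each comparison by the strength of the opposing model, which suits a forum where raters see only anonymized pairs.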

In his closing remarks, Tilkin discussed factors influencing the risk–reward calculations that individuals and organizations make when deciding whether to use healthcare AI. Subjective risk attitudes play important roles, appearing at both personal and practice levels and influenced by local market conditions. The type of intended use—triage, detection, or diagnosis—is another factor, as is the clinical use case. Finally, operational safeguards—having transparency, monitoring results, and implementing best practices—can all reduce risk, or at least risk perception, making people more likely to accept new technologies.

DISCUSSION

A brief discussion followed the two panelists’ presentations, with questions posted on Slido by in-person and online audience members.

Chiang presented the first question: Given the need to collect massive amounts of personal health data to train AI models, how can data collection be encouraged before it is clear how the data will be used? Rhew responded with two key aspects. First, technology can secure data and create confidence that they will not be exposed to different organizations. “Federated learning obviously has a significant role to play in this,” he continued, “because you are not releasing control of the data to another entity, and it allows you to be able to put your own safeguards in place.” When institutions send data externally, they rely on other organizations for security, potentially leaving them liable for breaches.

Tilkin added that even with good data protection, model developers may hesitate to share their datasets or models because running models on external datasets can reveal model details. “This is a problem if you are trying to accelerate the amount of collaboration,” Tilkin said. “This is why the privacy-preserving environments are so essential.”

Rhew explained that the second aspect involves educating patients and consumers about data use. Rhew emphasized transparency: “I think it is extremely important that we address the issue of patients feeling comfortable with how we’re managing [the data] and really being transparent in that process.”

Tilkin agreed, adding that multimodal models complicate matters since multiple people and institutions handle different data types. As genomics and other clinical data are added, he said that “at some point you do have the need for a federation, if nothing else just to deal with the challenges, even for centrally collected data.” He said that the solution is developing methods to leave data in controlled environments and bring analysis to the data rather than moving data around.

Building on that discussion, Dewji asked whether data collections should have minimal quality standards. Tilkin explained that minimal standards serve two purposes: reducing the burden of making data available (since curation is difficult and costly) and better mirroring production data. He added, “We collect data, we polish [them] up, [they look] really good, we then work with [them], and then your mileage varies when you’re actually in a clinical environment.”

Regarding whether AI healthcare models have a prerequisite of massive data collections with predefined uses, Tilkin noted the difficulty of getting institutions to open data for general investigations. Institutional review boards want to understand specific data uses before approving research. “Being able to turn on the ‘data spigot’ tends to be use-case specific,” he said, due to required approvals. He noted that to obtain representative data including rural areas and underrepresented populations, strategies must account for legal and practical sharing restrictions.

Rhew emphasized understanding privacy/personal dataset sensitivity levels. Basic disease information is less sensitive than whole genome sequences, and he said that “we’re now starting to realize that, based on level of sensitivity, we have to have higher levels of safeguards.” Patient education about data uses and implications is crucial.

He continued by stating that assembling representative datasets is challenging because most data come from people entering medical facilities—that is, just a subset of the population. “If we truly want this to be representative of larger populations,” he said, “we have strategies for how we acquire data outside of people coming into the system.” Rhew mentioned a stalled initiative that would have given individuals access to their medical records to decide where information goes: “Ultimately, we wanted to empower consumers with their own health information so that they could make their own determinations.”

Chiang then asked about overcoming difficulties in interpreting multimodal models containing data from clinical records, images, pathology reports, and genomic records. Rhew identified the current challenge as simply building these expensive models, which require massive amounts of diverse, well-curated datasets. One approach is assembling datasets collectively and making the resulting models open source. This creates value since individual organizations do not have to build models themselves and can use unique datasets to train their own models off of collective ones. Rhew noted that LLMs can then handle complex interpretation tasks, making information readily accessible to end users. He explained, “Today, it is a lot of everyone building based on their own datasets and their own resources. That’s not a scalable model.” Instead, he said that different players can use coordinated strategies that leverage their unique capabilities: “Most organizations would love to be in a place where they can accelerate the development of their models and do it in a way that creates new valuable [intellectual property] for them.”

Regarding the best way to present uncertainty in AI model outputs, Tilkin noted that current vendor solutions differ in presenting uncertainty to radiologists and end users. The strategy depends on algorithm details, but feedback loops that “provide feedback on what the model is doing, what the confidence intervals are, [and] what the uncertainty looks like” are important. These allow users to process information, make decisions, and provide feedback. He said that vendors should avoid black-box presentations and give context for model answers.

Rhew explained that the field is moving from single models handling everything toward workflows and models specific to particular scenarios. Each model excels at its task with high confidence and then communicates with the others. This approach mitigates concerns about AI hallucination or other failures. “We’re starting to realize that as much as LLMs can do so many amazing things, if we just simply let one LLM try to solve all the problems, we run into a lot of issues with certain types of use cases,” he said.

The next question concerned preventing data monetization by vendors. Rhew said finding the right balance is important since innovation is driven by organizations generating revenue through value creation. Transparency—making sure people know what they are signing up for—is key: “I do not think there is anything wrong with businesses thriving in the pursuit of something that’s very beneficial for individuals if everyone is aware of it,” he said.

Tilkin agreed about transparency’s importance in ensuring people understand what data are used and how. Increasingly, data contributors want to see results from their data. “So the more you can just provide transparency on what data [are] being used, [and] in what context, I think the better off we’ll be,” he continued.

The final question addressed balancing healthcare AI benefits with possibilities that patients are stigmatized by AI detecting issues, like depression, that may not have been detected previously. Rhew noted the medical community underdiagnoses for various reasons: “We’re oftentimes too busy. We do not ask the right questions. The patients do not share the information. And because of that underdiagnosis, when bad things happen, we always wonder, ‘I wish we would have been able to have identified that.’” AI is valuable for uncovering undiagnosed conditions. However, he continued, when AI makes discoveries, “there will, in some cases, be a conversation that needs to occur, and it needs to be done with sensitivity.” AI uncovers problems rather than solving them; Rhew stated that clinicians should communicate with empathy, compassion, and contextual understanding.

With patients increasingly accessing laboratory results, people might discover conditions before clinicians can discuss them. Rhew noted that the AI issue is “just another example of how we need to smooth out these rollouts and processes so that when sensitive information is identified, there is always a clinician there to be able to make that communication as seamless as possible.”

Tilkin added that agents could be particularly useful, allowing clinicians to set rules about information exposure timing and location and enabling intervention when needed. As health information is collected for various purposes, information flow could become bidirectional, with AI models choosing when and how to provide information to patients. “The question ultimately is how we fold this into the workflow [of the] clinician–patient dynamic, but the potential is tremendous,” Tilkin explained.

REFERENCES

Bird, S. 2024. “Announcing new tools in Azure AI to help you build more secure and trustworthy generative AI applications.” Microsoft Azure Blog. https://azure.microsoft.com/en-us/blog/announcing-new-tools-in-azure-ai-to-help-you-build-more-secure-and-trustworthy-generative-ai-applications/ (accessed July 1, 2025).

Byju, A. 2024. The “Godfather of AI” predicted I wouldn’t have a job. He was wrong. New Republic, October 25. https://newrepublic.com/article/187203/ai-radiology-geoffrey-hinton-nobel-prediction.

Farrah, T. E., D. Pugh, F. A. Chapman, E. Godden, C. Balmforth, G. Oniscu, D. Webb, B. Dhillon, J. Dear, M. Bailey, P. Gallacher, and N. Dhaun. 2023. Choroidal and retinal thinning in chronic kidney disease independently associate with eGFR decline and are modifiable with treatment. Nature Communications 14:7720.

Gartner. 2025. Gartner Hype Cycle. https://www.gartner.com/en/research/methodologies/gartner-hype-cycle.

Larson, D. B., M. Bhargavan-Chatfield, M. Tilkin, L. Coombs, and C. Wald. 2025. The road map for ACR practice accreditation for radiology artificial intelligence. Journal of the American College of Radiology 22(5):586–592.

Rogers, E. M. 1962. Diffusion of innovations. New York: The Free Press of Glencoe.

Singla, A., A. Sukharevsky, L. Lee, M. Chui, and B. Hall. 2025. The state of AI: Global survey. McKinsey, March. https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai#.

Tang, L. 2025. Optometrists in the US—Market Research Report (2015–2030). IBISWorld, February 2025. https://www.ibisworld.com/united-states/industry/optometrists/1560.

Xu, H., N. Usuyama, J. Bagga, S. Zhang, R. Rao, T. Naumann, C. Wong, Z. Gero, J. González, Y. Gu, Y. Xu, M. Wei, W. Wang, S. Ma, F. Wei, J. Yang, C. Li, J. Gao, J. Rosemon, T. Bower, S. Lee, R. Weerasinghe, B. J. Wright, A. Robicsek, B. Piening, C. Bifulco, S. Wang, and H. Poon. 2024. A whole-slide foundation model for digital pathology from real-world data. Nature 630(8015):181–188.
