The state of the art in using artificial intelligence (AI) in radiation health was the main topic of three separate panels at the symposium. The panels focused on the use of AI in three areas (medical diagnostics, radiation therapy and oncology, and radiation occupational health) and on future directions and opportunities in AI and machine learning (ML) methods and technology to advance each of those fields. Although these three areas differ in the specific ways that AI is used and in the degree to which AI has been adopted, common themes emerged across them.
The panel on the use of AI in medical diagnostics was chaired by Caroline Chung, vice president and chief data and analytics officer, co-director of the Institute for Data Science and Oncology, and professor in radiation oncology and diagnostic imaging at MD Anderson Cancer Center in Houston.
Jayashree Kalpathy-Cramer, endowed chair in ophthalmic data science and founding chief of the Division of Artificial Medical Intelligence in the Department of Ophthalmology at the University of Colorado School of Medicine, provided an overview of how AI is transforming medical imaging. She emphasized that AI and ML are used throughout the entire patient journey, from accelerating image acquisition and reconstruction to improving diagnosis and treatment monitoring. While radiology has been at the forefront of AI adoption in healthcare, she noted a significant gulf between theoretical capabilities and clinical implementation.
Despite tremendous progress in foundation models, large language models, and vision models, relatively little has been translated to clinical practice. “There are many different reasons why that might be,” she said, “but I'm hoping that, as a community, we figure out how to translate things and move them from what’s possible to what we really like to see.”
Kalpathy-Cramer discussed how deep learning has made substantial advances in computed tomography (CT) image reconstruction, with many algorithms now commercially available and deployed. She noted that these AI reconstructions enable 30–71 percent reduction in necessary CT doses compared to hybrid iterative reconstruction, while maintaining image quality (Koetzier et al., 2023). Deep learning methods for metal artifact reduction also appear to remove artifacts more accurately than current state-of-the-art approaches.
Various approaches exist for converting low-dose scans to high-quality images, including direct conversion methods and hybrid approaches that combine deep learning with traditional reconstruction techniques. These hybrid methods might use deep learning to improve scan quality, apply traditional reconstruction methods, and then employ additional deep learning to enhance image resolution.
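The staging of such a hybrid pipeline can be illustrated with a minimal sketch, shown below, in which two placeholder functions stand in for trained networks on either side of a classical filtered back projection step; the toy phantom, noise level, and smoothing kernel are illustrative assumptions, not any vendor’s implementation.

```python
import numpy as np
from skimage.transform import radon, iradon  # classical forward/back projection

def denoise_sinogram(sino: np.ndarray) -> np.ndarray:
    # Stand-in for a deep learning denoiser operating on the raw (sinogram) data.
    kernel = np.array([0.25, 0.5, 0.25])
    return np.apply_along_axis(lambda s: np.convolve(s, kernel, mode="same"), 0, sino)

def enhance_image(img: np.ndarray) -> np.ndarray:
    # Stand-in for a deep learning enhancement / super-resolution network.
    return np.clip(img, 0, None)

phantom = np.zeros((128, 128))
phantom[40:90, 50:80] = 1.0                      # toy object to image
theta = np.linspace(0.0, 180.0, 180, endpoint=False)
sinogram = radon(phantom, theta=theta)
low_dose = sinogram + np.random.normal(0.0, 2.0, size=sinogram.shape)  # simulated low-dose noise

cleaned = denoise_sinogram(low_dose)             # learned step 1: clean the raw data
recon = iradon(cleaned, theta=theta)             # traditional reconstruction step
final = enhance_image(recon)                     # learned step 2: enhance the image
print(final.shape)
```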
Extensive AI development has occurred over the past 5–10 years in segmentation (that is, the delineation of object boundaries in images), and this capability is now entering routine clinical use. It is particularly valuable in oncology for quantifying tumor burden and in radiation oncology for contouring tumors and organs at risk during radiation therapy planning. AI segmentation not only makes radiologists’ jobs easier but also potentially reduces variability and improves conformity (Elhalawani et al., 2019; Ng et al., 2018).
A study involving Kalpathy-Cramer’s team demonstrated that an AI model performed as well as five expert radiologists in segmenting CT images of adrenal glands (Robinson-Weiss et al., 2023). Significant work has also been done on AI-based brain tumor segmentation (Beers et al., 2021; Chang et al., 2019; Peng et al., 2022), with the number of available algorithms greatly increasing. AI can also track temporal changes in tumor configuration—something difficult with standard clinical assessments that focus on overall volume rather than precise morphology.
AI tools enable comprehensive body analysis from images taken for specific purposes, making opportunistic screening possible. By segmenting all body components—organs, bones, muscle, and fat—AI can identify conditions like sarcopenia, cachexia, or osteoporosis from routine imaging (Al-Sawaf et al., 2023; Elhakim et al., 2023; Pickhardt et al., 2013).
Generative AI has become increasingly prominent in medical imaging, with diffusion models, generative adversarial networks, and other approaches being used for reconstruction, registration, outlier detection, disease detection, classification, and segmentation. Automated machine learning now allows clinicians without coding experience to generate AI models from data and annotations. Modern tools like ChatGPT and AI code editors enable algorithm development through natural language commands.
Kalpathy-Cramer also outlined 10 challenges facing AI implementation in clinical settings.
Kalpathy-Cramer then discussed the recently developed FUTURE-AI consensus framework, which calls for AI tools that are “fair, universal, traceable, usable, robust, and explainable” throughout the development cycle (Lekadir et al., 2025). The framework provides a comprehensive checklist covering four phases: design, development, evaluation, and deployment.
Key pre-development questions include the following: Will the model work on different scanners and populations? Does it have out-of-distribution detection? Is it well calibrated? Does it make grave errors? The framework also addresses whether to use single federated models (i.e., models that are trained across decentralized databases or other data sources) or locally adapted ones, acknowledging trade-offs between general applicability and local optimization.
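One of these pre-deployment questions, whether a model is well calibrated, can be checked with a standard metric such as expected calibration error. A minimal sketch follows, using toy predictions rather than output from any actual model.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin predictions by confidence and compare each bin's mean confidence
    with its observed accuracy; the weighted gap is a common calibration check."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece

# Toy example: a model that reports high confidence but misses one case
print(expected_calibration_error(
    confidences=[0.95, 0.90, 0.88, 0.97, 0.60],
    correct=[1, 0, 1, 1, 1],
))
```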
Kalpathy-Cramer referenced a recent Food and Drug Administration publication highlighting key challenges in AI regulation and implementation (Warraich et al., 2025). The paper emphasized preparing for unknowns in large language models and generative AI; understanding the importance of AI lifecycle management; balancing roles among big tech, startups, and academia; and addressing tensions between financial optimization and health outcomes improvement.
These considerations represent ongoing challenges that the radiology community continues to address as AI becomes increasingly integrated into clinical practice. The path forward demands careful attention to validation, safety, and ethical implementation to realize AI’s potential while minimizing risks to patient care.
Cameron Piron, co-founder and president of Synaptive Medical, outlined how improvements in magnetic resonance imaging (MRI) technology combined with AI are enabling MRI to replace CT across many medical applications. His journey began with founding Sentinelle Medical, which focused on developing MRI technology to replace mammography for early cancer detection.
While MRI examinations were more expensive than two-dimensional mammography, and radiologists had to transition from reading single images to thousands of three-dimensional, multi-parametric datasets with contrast injection, the increased tumor detectability provided an advantage. Over time, his company optimized magnetic resonance (MR) images, shortened scan times, and added early AI analytics to facilitate tumor detection. Today, MRI is used for breast cancer detection at thousands of sites.
Near the end of his tenure at Sentinelle, the company began using MRI to detect responsive tumors in late-stage cancer patients. Working with the National Institutes of Health and the National Cancer Institute (NCI), they used multi-parametric MRI to detect very small tumor changes in response to different chemotherapy doses before surgery, demonstrating MRI’s value across different points in the patient care cycle. His next company, Synaptive Medical, builds complete MRI systems designed for neurosurgery. “We have hundreds of sites using robotics and MR data being integrated into better patient planning [and] into better navigation procedures,” Piron said.
Making MRI more accessible has been key to replacing CT, he said. The machines are both smaller and less expensive than earlier versions. The INOVAIT program, a Canadian multicenter collaboration between academic centers and companies, exemplifies this accessibility by using AI to improve image-guided therapy. Piron argued that the combination of AI and MRI provides doctors with superior images for both diagnosis and surgical guidance, leading to rapidly growing demand for neurological MRI applications that now exceeds available supply.
Synaptive Medical focuses on mid-field (~0.5 T) MRI rather than traditional high-field (1.5–3.0 T) systems. While imaging improvements have traditionally come from increasing magnetic field strength, Piron said that mid-field machines have demonstrated imaging performance equivalent to that of high-field systems while offering additional benefits.
Piron explained that the growth in MRI usage in neurology stems from various advances, including stereotactic accuracy capabilities. For brain imaging, particularly in pediatric and developmental areas, an enormous shift away from CT has occurred because of concerns about radiation exposure risks in young patients, with growth in areas such as stroke detection, psychiatric applications, and Alzheimer’s disease.
Traditionally, MRI relied on very high-field machines that were complex, heavy, and difficult to use—complexity that hindered consistent and reproducible AI analytics. By decreasing field strength and system complexity without compromising performance, Synaptive has achieved previously unseen results.
Mid-field MRI also reduces the temporal variability present in high-field systems. While 3.0 tesla (T) machines typically show 10–14 percent variability in structure measurements over time, 0.5 T systems largely avoid these effects. Piron explained that this results in temporal variability nearly an order of magnitude lower than that of high-field machines, which is “absolutely essential for complex AI analysis when we want to be able to do more with smaller patient datasets.”
Piron ended his remarks by explaining that AI enables MRI use in traditionally challenging areas—like the sinus region, where scan times optimized to 2 minutes have the potential for AI-driven reduction to 15 seconds. He stated that these advances position AI-enhanced mid-field MRI as a versatile, accessible alternative to CT across diverse medical applications while providing superior diagnostic capabilities and the potential to reduce patient risk.
Chung began the panel discussion by asking the two speakers what new AI applications in the diagnostic imaging space they were most excited about and which they thought were ready for prime time.
Piron began by saying that several applications accelerate MRI and CT. He said that anticipating unacknowledged trade-offs—things that might be missed when the time to obtain an image is significantly reduced—is important. However, he continued, one advantage of faster MRI is a reduced cost, so having an immediate return on investment is possible. What he is most excited about, he continued, is new ways to link results across disciplines, such as linking pathology results with spatial information and imaging—in short, “the ability to get ground truth built into imaging in a way that hasn’t been done before.”
Kalpathy-Cramer pointed to increased quantification in imaging as an exciting development. The use of AI for segmentation, for instance, helps to quantify tumor burden across the lifespan, she said; more generally, MRI and other types of imaging can be made much more quantitative than they have been. “We have the technology now,” she said. “We just need to start deploying it.”
Chung followed up by asking, What should vendors be doing in the quality assurance and quality control area, and what should they be revealing to end users about the acceleration techniques? Piron expressed his feeling that vendors do not seem to be doing enough in that space. The systems have become more opaque and less accessible at a time when much more visibility is important so that users can understand what is going on. “There needs to be a really definitive pushback to . . . let people see what’s happening behind the algorithms so that we can have that trust,” he said.
Kalpathy-Cramer agreed, saying that users need to know what data were used in training a model, what the model characteristics are, and where the model is likely to break. The data used in training the model are a particularly important factor, she continued, because if the people whose data were used to train a model are significantly different from the people that the model will be used to examine, the model’s results can be wildly inaccurate. One approach to supplying the necessary data would be to use model cards (a document that provides detailed, transparent information about an AI model), she said.
Chung continued by asking about model cards: Several model card templates exist, she noted. What information should be included on a model card? Kalpathy-Cramer answered that a model card should have
information not only on the demographics of the population that the model was trained on but also much more, including such things as the spectrum of disease, anatomical variations, disease presentations, and population differences. “So,” she concluded, “we need biology plus all these other things.”
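As a rough illustration of the kind of information such a model card might capture, the sketch below defines a simple structure with fields drawn from this discussion; the field names and example values are hypothetical and do not follow any standard model card schema.

```python
from dataclasses import dataclass

@dataclass
class ModelCard:
    """Illustrative structure for model card content; not a standard schema."""
    model_name: str
    intended_use: str
    training_population: dict     # demographics of the training cohort
    disease_spectrum: list        # disease presentations represented
    anatomical_variations: list   # anatomical variants covered (or not)
    known_failure_modes: list     # where the model is likely to break
    calibration_notes: str

card = ModelCard(
    model_name="adrenal-segmentation-v1",   # hypothetical model
    intended_use="Contour adrenal glands on contrast-enhanced CT",
    training_population={"sites": 2, "age_range": "18-90", "n_scans": 1274},
    disease_spectrum=["normal glands", "adenomas", "metastases"],
    anatomical_variations=["post-surgical anatomy not represented"],
    known_failure_modes=["low-dose CT", "pediatric patients"],
    calibration_notes="Calibrated on an internal validation set only",
)
print(card.known_failure_modes)
```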
In response to a question from Chung on the future of generative AI and large language models in general medical diagnosis, Kalpathy-Cramer said that she is very excited about it, but concerns exist for a good reason. “Until we can have sufficient guardrails and until we have the technology that allows us to figure out when there are hallucinations or when there are other things, I think we still have to be a little cautious,” she said.
Another question, Kalpathy-Cramer continued, relates to how the accuracy and performance of various models should be balanced with the need for human explanation. If no explanation is provided for how a model reaches its results, monitoring the model’s performance is more difficult, but some models seem to work well despite the absence of an explanation of how the model is getting its answers. For instance, some AI models predict the risk of developing breast cancer 5 years in the future, and they seem to work phenomenally well, but no human can assess if they are actually correct. Other models can predict the risk of heart attack by examining images of the retina, but no one knows what signal they are detecting. She pondered how we could gain trust in deploying these models that are not easy to check.
Piron added that because AI models do not come to answers in the same way that human minds do, they provide an “orthogonal” way of thinking. “You’re seeing solutions coming up that are just not intuitive, but they are correct and they are logical,” he said. “This is a valuable addition to the human intellectual toolbox, but it is still important to develop a way of understanding the machine logic because relying on purely black-box answers is a bit unnerving.”
Chung then switched subjects to image reconstruction. Humans use AI to produce reconstructed images because images are meaningful to them, but, she continued, “these AI models can also potentially find a lot of patterns in data that we cannot necessarily appreciate.” So, what data should AI models be trained on? Should they be trained on the raw data? Kalpathy-Cramer answered first: “We reconstruct it for the way we think. Is that the best way? Can we go from the raw data to the result that we want?” Chung asked whether vendors should provide the raw data in addition to the reconstructed images so that users can better understand what is happening. “Yes,” Piron answered. In reconstructing MR images, for instance, one merges data up front and puts those data through a Fourier reconstruction. That process ends up filtering out a “wealth of information” that could potentially be valuable. As an example, he told a story from the early days of breast MRI. Ultrasound units, which had been the standard for detecting breast tumors, had been trained for years to highlight particular types of tumors that clinicians were familiar with, but at the same time that process made other types of tumors harder to see. When MRI was introduced, those types of tumors became detectable, and the cost of “optimizing” the ultrasound units became obvious.
Another audience question emerged: Given how AI is leading to faster screening and the ability to proactively image, could this lead to the sort of overdiagnosis or overtreatment of disease that prostate-specific antigen screening caused for cancer? And what can be done to avoid such overtreatment? Piron stated that in the future, much more early screening and full-body MR screening, among other actions, will likely occur, and this could clearly lead to more unnecessary procedures. This sort of screening would need to come with some way to improve specificity, he said: “If you do not have a perfectly specific imaging modality—which you do not—then you’re going to lead to biopsies, lead to surgeries, [and] lead to everything else downstream.” Watchful waiting, which is becoming an important paradigm for conditions such as prostate cancer, may also have to be part of the answer.
The session concluded with questions from the audience and discussions around utilizing AI in imaging and radiation therapy.
The panel on AI in radiation therapy and oncology was co-moderated by Anyi Li, chief of computer service at Memorial Sloan Kettering Cancer Center, and Ceferino Obcemea, program director of medical physics at the Radiation Research Program at NCI. As context, Li noted that the doses that radiation oncologists use in their treatments are larger and more acute than the doses used in imaging and those generally experienced outside of a medical setting, but the exposures in radiation therapy are more tightly focused.
The first speaker was Steve Jiang, vice chair of digital health and AI and chief of the Medical Physics and Engineering Division at the University of Texas Southwestern. His presentation was about using AI agents for adaptive radiotherapy (ART).
Radiotherapy is one of the three major cancer treatment options alongside surgery and chemotherapy. As an illustrative example, Jiang noted that for much of the past century, patients received conventional treatment doses of about 2 gray (Gy) per day over several weeks, totaling 60–70 Gy. Stereotactic body radiotherapy, developed about a decade ago, uses higher doses of around 18 Gy delivered every 1–3 days for a total of about 54 Gy. However, both approaches are one-size-fits-all, with patients in the same cohort receiving identical treatment without interruption. Furthermore, clinicians monitor only for acute toxicity, assessing benefits and harms long after treatment completion when changes are no longer possible.
According to Jiang, several treatment variables could be adjusted to personalize treatments, including the dose per fraction, the time between fractions, and the total number of fractions. While clinicians typically keep these variables constant across cohorts, personalization involves choosing values that reflect each individual’s unique needs and situation.
Jiang identified two dimensions of personalized radiotherapy. The first involves determining an individual’s treatment course up front based on biomarkers, tumor characteristics, and patient health status. The second dimension adapts treatment based on ongoing tumor response and patient status, using anatomical imaging (i.e., MRI, CT) to assess changes in tumor size and shape, examining biomarkers to evaluate tissue response, and employing functional imaging to assess metabolic activity changes. However, this second dimension remains largely unexplored.
Imaging-based ART can be characterized as Generation 1 or Generation 2. Generation 1 ART aims to maximize tumor control and minimize toxicity based on anatomic changes like tumor shrinkage and organ relationships. This approach functions more as course correction than true adaptation, with incremental overall impact. Generation 2 ART modifies treatment in real time based on tumor and patient response. While still maximizing tumor control and minimizing toxicity, the prescription changes in response to the patient’s body dynamics. “It is a true feedback mechanism in a control theory model,” Jiang explained. “It is true adaptation, and I believe that is the future of cancer radiotherapy.”
Moving from current image-guided radiotherapy to ART presents significant challenges including increased workloads, workflow complexity, technological complexity, decision-making challenges, error risks from frequent plan modifications, coordination difficulties under time pressure, and real-time decision-making requirements. ART also demands increased resources—more skilled personnel, longer treatment sessions, and continuous monitoring—which makes AI incredibly useful.
Jiang continued by explaining that various AI technologies are important for Generation 1 ART, including image acquisition, processing, segmentation, adaptation decision making, treatment planning and replanning, quality assessment, and delivery. Many commercial auto-segmentation products based on deep learning models are now available.
He said that for Generation 2 ART, AI becomes even more critical due to more severe challenges. “The decision making involves many more variables—indeed, far more than what human doctors can handle,” Jiang noted. The workflow is “very resource-demanding, very complex, and tedious, so you need AI to automate the workflow,” he added.
The main challenge for Generation 2 ART is insufficient appropriate data. Because past treatments have been uniform, an inadequate data spectrum exists to train AI models on outcomes for different fractions, doses, and timing variables. To address these challenges, Jiang’s group is developing an AI assistant for ART based on AI agents. This assistant works alongside clinicians throughout treatment to retrieve, process, analyze, and present information for decision making while automating and streamlining clinical workflow. The assistant collaborates with multiple AI agents that can work together, use tools, make function calls for task-specific AI models, utilize clinical software tools, and access clinical databases.
Jiang distinguished between AI tools and AI agents. AI tools, like supervised deep learning models, are relatively simple input-to-output mappings—that is, “more like fitting a complex mathematical function with a deep neural network.” In contrast, he said that AI agents “can perceive, think, reason, make decisions, take actions, and use tools to enhance their abilities.”
His University of Texas Southwestern team is developing a multi-agent AI system featuring an AI assistant working with clinicians and a team of communicating AI agents. Key agents include a data collection and integration agent that gathers patient data from electronic health records, oncology information systems, treatment planning systems, and machine data, and then processes, curates, and indexes this information for other agents’ use.
The system provides various AI agents with access to tool sets, including clinical software collections, task-specific AI models, and other tools, plus a knowledge database containing clinical guidelines, procedures, and scientific literature. Additional agents handle communication (i.e., processing spoken requests about patient history, diagnosis, or previous treatment doses), patient matching with clinical trials, real-time monitoring, automated chart preparation, clinical documentation assistance, workflow streamlining, AI-driven treatment recommendations, error detection, and patient question answering.
Jiang discussed how single AI models are insufficient for clinical deployment. Clinically viable solutions involve compound, integrated AI systems that combine AI tools, AI agents, software tools, and databases. For example, accurate auto-segmentation requires more than just an auto-segmentation model—it needs deployment and adaptation agents for acceptance testing, model commissioning, and local alignment; segmentation workflow agents; performance review agents; and monitoring agents to track model performance over time and address data drift. Similarly, decision-support AI systems involve multiple agents including predictive AI models, clinical dashboard and visualization agents, predictive modeling and case-based reasoning agents, AI expert collaboration agents, and clinical guidelines and literature agents.
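A heavily simplified sketch of such a compound, multi-agent workflow appears below. The agent names, the shared-context design, and the placeholder logic are illustrative assumptions rather than the system Jiang described; real agents would call clinical software, task-specific models, and databases.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Agent:
    name: str
    run: Callable[[Dict], Dict]   # each agent reads and updates a shared context

def data_integration_agent(ctx: Dict) -> Dict:
    # Placeholder for gathering data from EHR, oncology information, and planning systems.
    ctx["patient_record"] = {"id": ctx["patient_id"], "prior_dose_gy": 40.0}
    return ctx

def segmentation_agent(ctx: Dict) -> Dict:
    # Placeholder for calling a task-specific auto-segmentation model.
    ctx["contours"] = {"GTV": "gtv_mask.nii", "spinal_cord": "cord_mask.nii"}
    return ctx

def monitoring_agent(ctx: Dict) -> Dict:
    # Flags the case for human review rather than acting autonomously.
    ctx["needs_review"] = ctx["patient_record"]["prior_dose_gy"] > 30.0
    return ctx

def run_workflow(patient_id: str, agents: List[Agent]) -> Dict:
    ctx = {"patient_id": patient_id}
    for agent in agents:
        ctx = agent.run(ctx)      # agents communicate through the shared context
    return ctx

result = run_workflow("case-001", [
    Agent("data_integration", data_integration_agent),
    Agent("segmentation", segmentation_agent),
    Agent("monitoring", monitoring_agent),
])
print(result["needs_review"])
```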
Response-based ART enables fully personalized radiotherapy with longitudinal adaptation. Jiang stated that AI may be essential for workflow automation, clinical decision support, and quality assurance to ensure scalable and clinically viable ART. Single AI models cannot handle ART’s complexity; instead, compound AI systems integrating agents, clinical software tools, databases, and AI models provide practical solutions for the future of cancer radiotherapy.
Lei Xing, professor of radiation oncology and medical physics at Stanford University, discussed the use of AI in integrating multiple omics for use in healthcare and, specifically, precision medicine.
Xing outlined that AI for precision medicine rests on three pillars: computing resources, AI models, and data. While data can be evaluated across multiple dimensions—amount, quality, and format—his focus centers
on format optimization. Healthcare data exist in diverse formats including images, pathology reports, and electronic health records and often suffer from noise, small sample sizes, and missing data.
The challenge lies in tabular data, which, despite being familiar through Excel spreadsheets, represent one of the most difficult formats for AI agents to process, particularly with large omics datasets. Xing’s team developed an innovative solution: converting tabular data into images that AI can analyze more effectively (Yan et al., 2025).
Using single-cell RNA sequencing data as an example—a common biology laboratory output—Xing described the typical data structure as a massive table with columns representing individual cells and rows representing genes (often on the order of 10,000 genes per cell). Each entry in the table contains the read count for a specific gene–cell combination.
The key breakthrough involves converting gene expression vectors into optimized image representations called genomaps. A vector can be converted into an image in many ways, Xing noted. His group’s technique accounts for gene–gene interactions to optimally place genes into pixels, ensuring each cell converts into a meaningful image rather than random noise.
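The underlying vector-to-image idea can be illustrated with a toy sketch, shown below, in which genes are simply ordered along a principal component before being laid out on a grid; the published genomap method instead optimizes gene placement from gene–gene interactions, so this is only a crude stand-in.

```python
import numpy as np

def cells_to_images(expr: np.ndarray, side: int) -> np.ndarray:
    """Toy stand-in for genomap construction: order genes by their loading on
    the first principal component so co-varying genes land near each other,
    then reshape each cell's reordered expression vector into a side x side
    image suitable for a CNN."""
    n_cells, n_genes = expr.shape
    assert n_genes <= side * side, "grid too small for the number of genes"

    centered = expr - expr.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    gene_order = np.argsort(vt[0])                # 1D ordering of genes

    padded = np.zeros((n_cells, side * side))
    padded[:, :n_genes] = expr[:, gene_order]     # unused pixels stay zero
    return padded.reshape(n_cells, side, side)

# Example: 100 cells x 900 genes -> 100 images of 30 x 30 pixels
rng = np.random.default_rng(0)
expr = rng.poisson(2.0, size=(100, 900)).astype(float)
images = cells_to_images(expr, side=30)
print(images.shape)  # (100, 30, 30)
```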
He noted that different cell types produce distinct patterns in their genomap representations—patterns so clear that visual inspection alone can distinguish cell types without AI assistance. However, when analyzed through convolutional neural networks (CNNs), these image representations significantly outperform traditional one-dimensional vector approaches.
The genomaps serve as the foundation for a comprehensive analytical pipeline. Individual cell data convert into genomaps, which then feed into GenoNet, a CNN performing deep-level feature extraction. This system enables multiple downstream applications including cell recognition, cell-specific gene set discovery, trajectory mapping, multi-omics integration, dimensionality reduction, visualization, and clustering.
Xing’s team successfully applied this approach to radiation oncology and radiology, specifically with radiomics data. Converting tabular radiomics data into images and analyzing them with CNNs improved survival prediction and biomarker discovery accuracy by 10–20 percent over existing methods. The technique reveals temporal patterns, showing how cells at different ages display distinct representations and how lung fibrosis cells change appearance before and after radiation treatment.
Spatial transcriptomics represents another application area, where the method combines morphology and molecular information for cancer subtyping and gene mutation prediction. The genomaps concept extends beyond individual applications, serving as a method for incorporating physical understanding into data representations. This enhanced data understanding proves valuable for training models and fine-tuning large foundation models.
A recent application demonstrates this integration through an automated treatment planning framework guided by GPT-4Vision that Xing’s team developed (Liu et al., 2024). This system incorporates both tabular data and dose-volume histograms to evaluate radiotherapy treatment plans and provide textual feedback for plan improvements. Looking forward, Xing’s team is developing a multimodal and multiscale foundation model that incorporates relationships among data and modalities, creating more powerful and useful systems than current foundation models.
By converting tabular biomedical data into optimized image representations, Xing’s genomaps approach addresses a fundamental challenge in medical AI. The technique’s consistent 10–20 percent accuracy improvements across diverse applications—from single-cell analysis to radiation oncology—demonstrate its broad utility. As the field moves toward integrated foundation models and multi-agent systems, this data format optimization serves as a crucial building block for more effective precision medicine applications.
Xing ended by stating that this work exemplifies how addressing seemingly technical challenges in data representation can yield substantial improvements in clinical AI applications, bridging the gap between complex biological data and actionable medical insights.
Obcemea began the discussion by asking Xing about the spatially semantic topographic maps he constructs. “Most of us are familiar with the clustering technique of K nearest neighbors,” Obcemea said, “but this is so much more because it is on the semantic scale.” He continued that he tends to think of the approach in terms of the tabular data being a multidimensional Rubik’s cube that one manipulates in order to extract a pattern. In supervised learning, he said, one extracts a pattern by steepest descent on a defined criterion. However, he asked, with unsupervised learning, what sort of criteria does one apply in order to get to that pattern discovery?
Xing replied that he liked that analogy. He explained that his approach amounts to reshuffling the data: “It is actually surprising to us just by simply reshuffling the data in a meaningful format, in a meaningful way, that you can get so much more information out of the data.” In his team’s approach, the interaction among the data elements is encoded into the spatial configuration of the data. “So, in a way, it is like clustering,” he explained. “If you can cluster them first, you can throw away a lot of data in the reclustering.”
The image is the key to his approach. “I’m a physicist,” he said. “I see everything as an image.” So he incorporates the interaction pattern into the image and then analyzes it with a CNN. “CNN is a very powerful tool,” he said, “but your pattern has to have some semantic meaning. So that’s what we did—we assigned some semantic meaning to the data.” This same approach should be workable for other types of data, even images, he said. Some people might not think that one could get much additional information by turning an image into another image, he said, “but we are doing it. We are trying to get more semantic information for imaging data, but I do not want to go into too much technical detail.”
Following up, Obcemea asked Xing for further details on how spatially semantic topographic maps extract a useful figure from a blur and how one can be certain that the extracted figure corresponds to something real—to a “ground truth.” One solution, Xing said, is to add a condition. In the work that his team does, for instance, biological knowledge can be used as a condition. Thus, in the case of clustering, which is unsupervised learning, one must apply biological or clinical knowledge to understand what the model has produced. “Sometimes it may not make sense,” he said, “and then you have to think, What’s going on in the pipeline? So that’s why I keep saying that human knowledge is the key. It is not only in the construction of this topographic representation but also in the whole pipeline.” AI will always come up with an answer, he said, and then people digest and understand the answer and make it trustworthy.
Li asked Jiang about the use of agents. In healthcare, he said, the accuracy of an agent is crucial, and a single mistake can be life-threatening; if one uses multiple agents, errors could propagate or even be made exponentially worse. He asked, “What sorts of guardrails can be put in place to minimize the effects of such errors?” Jiang answered, “This is a very important question,” particularly in cases where a multi-agent AI system is in place. One way his team minimizes errors in its agents is by building the entire system from the bottom up, step by step, and thoroughly validating each individual AI model to make sure that it works. Furthermore, individual agents are not used immediately in a multi-agent system but are first thoroughly vetted. For instance, the patient–trial matching agent that his team developed is being deployed clinically, but he said that “right now, it is more like an assistant to human clinical research coordinators.” It will take a period of using this agent and checking its recommendations to build enough confidence in it to embed it in a multi-agent system, he said. Furthermore, since accuracy is important when AI is used clinically, researchers and users could quantify the certainty in a system’s answers so that the AI system knows what it knows and does not know.
Finally, Jiang continued, AI systems need redundancy. One can check the accuracy of a large individual model by, for instance, having multiple models working on the same task. And in the case of a chatbot that answers questions, he said, one should not rely just on a pretrained model to give accurate answers to clinical questions; rather, the answers should be grounded in clinical guidelines and the literature.
Xing added that having an agent that serves as a quality officer is important. “You’re building a human society using computer agents,” he said, “so a lot of key components in the clinic need to be clearly reflected in the agent world.” Quality assurance is one of those components that is important in a multi-agent system.
The session ended with multiple questions from the audience focusing on specific applications for each speaker.
The panel on AI in radiation occupational health was chaired by Sylvain Costes, former National Aeronautics and Space Administration (NASA) data officer for space, biological, and physical sciences.
Heidi Hanson, senior scientist and group lead of biostatistics and biomedical informatics at Oak Ridge National Laboratory, spoke about computational approaches for assessing radiation exposure across the life course. She described her ideal scenario: having all of the data needed to follow an individual across the life course, from in utero until time of death.
Hanson’s fundamental premise challenges conventional exposure assessment approaches. Rather than examining single-point exposures or limited time frames, she suggested comprehensive lifetime exposure tracking. “Exposures at any time in an individual’s life can play a role in what that person is experiencing currently,” she explained. Early life exposures can moderate effects of later exposures, making complete exposure histories essential for understanding health outcomes.
To realize this vision, Hanson collaborated with pediatrician Shari Barkin to develop a five-step blueprint for charting life course exposures (Hanson et al., 2020).
Hanson’s work primarily addresses the first two components, focusing on data harmonization rather than the more commonly pursued outcomes-driven AI applications in healthcare.
Electronic health data present formidable obstacles: heterogeneity across institutions, varied collection methods, and siloed storage systems. “It is very difficult to get to charting someone’s trajectory across their entire life course because it is hard to link those datasets together,” Hanson noted. This fragmentation highlights the importance of computational tools capable of managing large amounts of heterogeneous sensitive information while enabling longitudinal exposure assessments and geographical risk identification.
The solution involves standardized processes ensuring data harmonization produces clean, noise-free datasets suitable for trustworthy predictive algorithms. Hanson envisions this as a 10-year horizon goal requiring comprehensive computational infrastructure.
Hanson leads the MOSSAIC project (Making Outcomes Using Surveillance Data and Scalable Artificial Intelligence for Cancer), a Department of Energy–NCI partnership running for 9 years. Over the past 2 years, MOSSAIC has deployed AI tools across all U.S. Surveillance, Epidemiology, and End Results (SEER) registries, dramatically improving cancer reporting efficiency.
She explained that the project’s tools automatically process pathology reports into tabular form using common data models. Currently, these tools autocode 23–27 percent of all pathology reports entering SEER registries. When confidence levels are insufficient, reports undergo manual review. This automation contributed to reducing cancer reporting time from 22 months to 14 months across all SEER registries.
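The confidence-thresholded routing described here can be sketched in a few lines, as shown below; the placeholder classifier, code values, and threshold are illustrative assumptions, not the MOSSAIC tools’ actual interface.

```python
from typing import NamedTuple

class Prediction(NamedTuple):
    site_code: str      # e.g., a topography code assigned by the model
    confidence: float   # model's self-reported confidence in [0, 1]

def classify_report(text: str) -> Prediction:
    # Placeholder for the deployed text-classification model.
    confident = "carcinoma" in text.lower()
    return Prediction(site_code="C50.9", confidence=0.93 if confident else 0.41)

def route_report(text: str, threshold: float = 0.90) -> dict:
    pred = classify_report(text)
    if pred.confidence >= threshold:
        return {"status": "autocoded", "code": pred.site_code}
    return {"status": "manual_review", "code": None}   # low confidence goes to a human

print(route_report("Invasive ductal carcinoma of the left breast."))
print(route_report("Findings indeterminate; correlate clinically."))
```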
Hanson stated that these tools demonstrate flexibility across data types and institutions. The Department of Veterans Affairs (VA) successfully adapted MOSSAIC’s hierarchical self-attention model for VA electronic health records, proving cross-institutional applicability essential for standardized health record information extraction.
Scaling to national levels involves federated learning approaches, but privacy protection presents significant obstacles. While federated learning inherently protects individual data, it does not provide sufficient privacy guarantees for personal health information. “There has to be some sort of privacy mechanism imposed as you’re doing the federated learning,” Hanson explained.
However, privacy enhancements create accuracy trade-offs. When implementing differential privacy for NCI cancer prediction models, she said that accuracy dropped to unacceptable levels: “By turning on differential privacy, at least for the tests that we’ve seen, we get a considerable drop in accuracy.” Hanson’s team collaborates with mathematicians to address this challenge, but considerable research remains before real-world implementation becomes viable.
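The trade-off Hanson described can be illustrated with a minimal sketch of differentially private federated averaging, shown below, in which each site’s update is clipped and perturbed with Gaussian noise before aggregation; the clip norm and noise scale are illustrative values, and a real deployment would also track a formal privacy budget.

```python
import numpy as np

def privatize_update(update: np.ndarray, clip_norm: float = 1.0,
                     noise_multiplier: float = 1.1, rng=None) -> np.ndarray:
    """Clip a site's model update and add Gaussian noise before it is shared,
    the basic mechanism behind differentially private federated averaging."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))   # bound each site's influence
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise                                    # the noise is what costs accuracy

# Toy federated round: average the privatized updates from three sites
site_updates = [np.array([0.8, -0.2, 0.5]),
                np.array([0.6, -0.1, 0.4]),
                np.array([0.9, -0.3, 0.6])]
global_update = np.mean([privatize_update(u) for u in site_updates], axis=0)
print(global_update)
```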
She next described the Centralized Health and Exposomic Resource (C-HER), which addresses geospatial data harmonization challenges. This system provides standardized processes for data ingestion, data processing, metadata storage, ontology integration, and spatial indexing across exposure types. The goal involves developing models predicting indoor radon exposure, PM2.5 (fine particulate matter; airborne particles with diameters smaller than 2.5 micrometers [μm]) exposure, and exposure–cancer relationships.
C-HER emerged from early project experiences where multiple teams processed identical data differently, producing non-reproducible results. The centralized approach provides spatially indexed data in “stackable hexes,” enabling rapid model development for various exposure predictions.
Through MOSSAIC, Hanson’s team accessed LexisNexis residential history data for 11 SEER registries, enabling 25-year retrospective environmental exposure linking to cancer patients. This capability allows assessment of how indoor radon exposure across life periods affects treatment outcomes.
Hanson’s most ambitious project, EHRLICH (Electronic Health Record–Informed Lagrangian method for precision publiC Health), aims to create computational capabilities for rapid real-world data assimilation supporting digital twins of population-level biological threats. EHRLICH employs scalable, trustworthy AI with synthetic data generation for enhanced privacy protection, using datasets completely devoid of personal health information.
Beyond residential history analysis, EHRLICH includes tools simulating individual daily movement patterns, enabling uncertainty assessment across exposure types rather than simple residential location assignments. This approach provides more nuanced exposure modeling reflecting actual human behavior patterns.
Hanson’s “big dream” involves placing “a pulse on population health” through interoperable computational tools, enabling comprehensive human health understanding. Her work demonstrates that achieving precision medicine involves foundational advances in data harmonization and integration, not just sophisticated predictive algorithms.
The next speaker, Pierre-Antoine Gourraud, university professor and hospital practitioner at the Faculty of Medicine at the University of Nantes, expanded on a topic that Hanson introduced in the previous presentation: generating synthetic data to preserve patient privacy while still allowing for the types of analyses that are crucial in making advances in healthcare.
Gourraud started by discussing his 2023 publication in npj Digital Medicine, which he believes fundamentally challenges current approaches to biomedical data analysis by arguing that patient-centric synthetic data generation eliminates the need for risky personal health data usage, even in anonymized form (Guillaudeux et al., 2023). “There is no reason to risk re-identification in biomedical data analysis because synthetic data offers an effective replacement,” he asserted. He described this as opening “a new era where we actually are going to all use anonymous, synthetic, and probably augmented datasets.”
He then discussed the development of synthetic health data and how it has been accelerated by Europe’s stringent privacy regulations. The General Data Protection Regulation (GDPR), a comprehensive data protection law in the European Union, defines three health data categories: personal data, pseudonymous data with direct identifiers removed, and truly anonymous data proven impossible to link to individuals. Under GDPR, data controllers must prove anonymity by answering three critical questions: Can individuals be singled out? Can records be linked to individuals? Can information be inferred about individuals?
Gourraud manages a biomedical data warehouse containing more than 100 million text documents from 3 million patients. When exporting data beyond the warehouse, he must prove complete anonymity. His synthetic data generation method enables negative answers to all three GDPR questions, ensuring true anonymization.
Gourraud’s team developed Avatar, a method transforming original data into functionally equivalent but completely anonymous synthetic data. Avatar-generated data possess three essential properties: structural identity (same granularity, statistical units, observations, variables, and types as original data), analytical relevance (producing comparable results to original data), and indistinguishability (neither experts nor algorithms can differentiate synthetic from original data).
The Avatar process creates individual models for each observation in the original dataset. For each row in a data table, the method defines mathematically a set of neighbor rows—typically 12—that closely resemble the original. These neighbors form a local model containing no original data, only similar data points. Within this local model, he said, a single simulation occurs: “It is a little bit counterintuitive because you do a private local model that is used only once, and the very link you’re trying to destroy, you’re going to use it just to measure how many other simulations are actually closer from the original observation than the one you just sampled.”
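A heavily simplified sketch of this neighbor-based synthesis idea appears below; it draws each synthetic record from a random convex combination of the original record’s nearest neighbors, whereas the actual Avatar method adds the privacy measurement step Gourraud described and handles mixed data types.

```python
import numpy as np

def synthesize(data: np.ndarray, k: int = 12, rng=None) -> np.ndarray:
    """For each original row, build a local neighborhood of its k closest rows
    and draw one synthetic row as a random convex combination of those
    neighbors, so no original record is released."""
    rng = rng or np.random.default_rng()
    synthetic = np.empty_like(data, dtype=float)
    for i, row in enumerate(data):
        dists = np.linalg.norm(data - row, axis=1)
        neighbors = np.argsort(dists)[1:k + 1]     # k closest rows, excluding the row itself
        weights = rng.dirichlet(np.ones(k))        # random convex combination
        synthetic[i] = weights @ data[neighbors]
    return synthetic

# Toy numeric table: 200 "patients" x 4 variables
rng = np.random.default_rng(1)
original = rng.normal(size=(200, 4))
avatars = synthesize(original, k=12, rng=rng)
print(original.mean(axis=0).round(2))
print(avatars.mean(axis=0).round(2))
```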
Gourraud’s first major application involved a study on intracranial aneurysm rupture risk factors. When colleagues requested open science publication including data sharing, privacy concerns initially prevented pseudonymous data release. He noted that the Avatar method generated simulations for approximately 2,300 observations, creating “the exact same spreadsheet, the exact same table, same number of variables, same type of variables, [and] same types of rows, by selecting the hyperparameters that were very important to see what balance we can get between accuracy and privacy.” He stated that the resulting synthetic dataset maintained analytical integrity while eliminating personal health information risks, enabling full open science publication.
Another application involved developing a clinical decision-support system for multiple sclerosis treatment. The AI tool selects reference patients from databases combining clinical trial and observational study data, identifying patients with similar characteristics to support treatment decisions. Neurologists reported identical results whether using synthetic or real data for clinical decisions. Detailed validation by Gourraud’s PhD student confirmed that synthetic data performed equivalently to real data across all tool applications and population subgroups. The synthetic approach delivered identical clinical outcomes without privacy compromise.
Beyond privacy-protected analysis, synthetic health data enable multiple applications. Data sharing becomes feasible through “sandboxes” where potential partners can explore synthetic datasets to demonstrate intended usage without accessing real patient information. Educational applications benefit significantly, providing students with challenging, realistic datasets for learning without privacy concerns. Software testing traditionally requires real data but becomes safer using synthetic alternatives.
Gourraud frames synthetic data generation as fundamentally ethical: “There is no reason to risk re-identification if the patient is not using or the caregiver is actually not directly benefiting from the analysis that we are conducting.” This perspective positions privacy protection as a moral imperative rather than merely regulatory compliance.
The GDPR-driven innovation model suggests broader potential for regulation-fostered advancement. Gourraud expressed conviction that “new regulation can foster worldwide innovation that indeed replicates individual data,” extending beyond European borders to global biomedical research.
Gourraud’s Avatar method represents a paradigm shift from privacy-risk management to privacy-risk elimination in biomedical data analysis. By generating structurally identical, analytically relevant, and truly anonymous synthetic data, the approach enables full scientific openness without compromising patient privacy.
The success in clinical applications—from aneurysm research to multiple sclerosis treatment support—demonstrates practical viability. As regulatory frameworks worldwide increasingly emphasize data protection, Gourraud’s synthetic data generation offers a pathway forward that simultaneously advances scientific progress and upholds ethical standards.
He ended his remarks by noting that this innovation exemplifies how regulatory constraints can catalyze technological advancement, transforming compliance challenges into opportunities for methodological breakthroughs that benefit both research communities and patient populations.
Costes opened the discussion session by asking how the presenters’ tools address the issue of data silos and transform diverse data sources into formats ready for AI.
Hanson began by describing her team’s work with unstructured text data from national cancer registries across six to seven states. She explained that data are collected differently at every site, requiring extensive data engineering tools to create harmonized structures. “We try to develop a set of rules that can be applied across all the different datasets,” she said, noting that they hope model training will address remaining heterogeneity. Her team is now transitioning to structured data from electronic health records, which she described as more difficult than unstructured data work.
When Costes asked about scalability from state-level to national questions and similarly asked Gourraud about scaling from university hospitals to entire countries, the responses revealed significant challenges. Gourraud explained that coordination exists among university hospitals nationally, with the key issue being data exchange. “That’s how we got interested in anonymous synthetic data, because the best way is actually to show the data,” he said, noting that data exchange can begin discussions about converging on comparable formats.
Hanson was frank about scaling limitations: “No, we’re not ready to scale up. As optimistic as I am, what I’ve also noticed as we’ve brought on new datasets is even if they are very similar in nature, there are ‘gotchas’ as you start to process the data [including] characters that you didn’t expect to be in there.” She believes scaling is possible but acknowledged that “it will take work to get there.”
The discussion shifted to synthetic data when Costes relayed audience questions about preserving correlations between features in multimodal datasets. Gourraud answered affirmatively, but Hanson disagreed, stating that the complexity of multimodal modeling would necessitate the creation of comprehensive synthetic datasets
equivalent to human population datasets, which is not yet attainable. Gourraud countered by highlighting his Avatar program’s robustness with heterogeneous populations containing multiple subpopulations. “Most of the time when my colleagues come with a dataset, they have actually three, four, [or] five different subpopulations,” he said, explaining that the method automatically uses local models for subgroups, which makes it “extremely robust and extremely useful for hypothesis generation.”
Regarding a question on democratizing healthcare data access, Hanson suggested combining federated learning with synthetic data. “It is the combination of both of those that I think is really innovative and really going to lead us to cool places in the future,” she said. Gourraud agreed, noting that even with perfect synthetic data, controlling institutional data use makes federated learning important.
Costes then asked about privacy concerns in federated learning. Hanson identified membership inference attacks as basic concerns, explaining that “there are a couple of repositories online where you can reconstruct data from the gradients that you’re passing back and forth in a federated learning run, and it is a shockingly small amount of code that gets you to that recreation.”
The discussion moved to handling outliers in synthetic data. Gourraud explained that this is very difficult because “somebody who stands out of the crowd is somebody that you are going to re-identify.” The solution involves bringing outliers “back into the pack,” and when outliers form subgroups, generating more synthetic outliers than existed originally. His group generates “at least 10 times as much synthetic data as the original data.”
Hanson noted problems with synthetic data approaches depending on purpose. For predictive models, overinflating minority classes might improve performance while protecting privacy, but for population assessments requiring unchanged distributions, “overinflating those minority classes may lead to a problem in what I'm trying to estimate,” she explained.
When Costes asked about inappropriate synthetic data uses, Hanson stated she does not believe synthetic data are ready for hypothesis generation because correlation structures must be protected. Gourraud countered that in many situations, “it is either synthetic data or nothing,” and he suggested using synthetic data with subsequent validation against original datasets.
On integrating mechanistic models with AI, Gourraud emphasized focusing on data rather than algorithms: “Rather than being in the hype of ‘let’s try the newest model or AI algorithm,’ I think we’re both big believers in data, [so] that’s where we’re putting our efforts.” Hanson expressed a preference for multiple approaches over one-size-fits-all solutions, suggesting triangulation with different models. “I think that you can use mechanistic models to help you identify where there should be guardrails on some of what we’re doing with AI,” she said, emphasizing the goal of combining models.
Finally, Costes asked about establishing consortiums for data standardization. Hanson supported this but cautioned against rule development–focused groups, instead favoring those that “start to think about how we can identify technological solutions that will allow us to standardize and harmonize the data.” Gourraud agreed: “I think it is great to have the ambition of having standards, but we have to remember that they are a means to an end, and that it is by doing projects [and] by being grounded into practical examples that are useful for patients and for caregivers that we will succeed.”
Al-Sawaf, O., J. Weiss, M. Skrzypski, J. M. Lam, T. Karasaki, F. Zambrana, A. C. Kidd, A. M. Frankell, T. B. K. Watkins, C. Martínez-Ruiz, C. Puttick, J. R. M. Black, A. Huebner, M. A. Bakir, M. Sokač, S. Collins, S. Veeriah, N. Magno, C. Naceur-Lombardelli, P. Prymas, A. Toncheva, S. Ward, N. Jayanth, R. Salgado, C. P. Bridge, D. C. Christiani, R. H. Mak, C. Bay, M. Rosenthal, N. Sattar, P. Welsh, Y. Liu, N. Perrimon,
K. Popuri, M. F. Beg, N. McGranahan, A. Hackshaw, D. M. Breen, S. O’Rahilly, N. J. Birkbak, H. J. W. L. Aerts, the TRACERx Consortium, M. Jamal-Hanjani, and C. Swanton. 2023. Body composition and lung cancer-associated cachexia in TRACERx. Nature Medicine 29(4):846–858.
Beers, A., J. Brown, K. Chang, K. Hoebel, J. Patel, K. I. Ly, S. M. Tolaney, P. Brastianos, B. Rosen, E. R. Gerstner, and J. Kalpathy-Cramer. 2021. DeepNeuro: An open-source deep learning toolbox for neuroimaging. Neuroinformatics 19(1):127–140.
Chang, K., A. L. Beers, H. X. Bai, J. M. Brown, K. I. Ly, X. Li, J. T. Senders, V. K. Kavouridis, A. Boaro, C. Su, W. L. Bi, O. Rapalino, W. Liao, Q. Shen, H. Zhou, B. Xiao, Y. Wang, P. J. Zhang, M. C. Pinho, P. Y. Wen, T. T. Batchelor, J. L. Boxerman, O. Arnaout, B. R. Rosen, E. R. Gerstner, L. Yang, R. Y. Huang, and J. Kalpathy-Cramer. 2019. Automatic assessment of glioma burden: A deep learning algorithm for fully automated volumetric and bidimensional measurement. Neuro-Oncology 21(11):1412–1422.
Elhakim, T., K. Trinh, A. Mansur, C. Bridge, and D. Daye. 2023. Role of machine learning–based CT body composition in risk prediction and prognostication: Current state and future directions. Diagnostics (Basel) 13(5):968.
Elhalawani, H., B. Elgohari, T. A. Lin, A. S. R. Mohamed, T. J. Fitzgerald, F. Laurie, K. Ulin, J. Kalpathy-Cramer, T. Guerrero, E. B. Holliday, G. Russo, A. Patel, W. Jones, G. V. Walker, M. Awan, M. Choi, R. Dagan, O. Mahmoud, A. Shapiro, F. S. Kong, D. Gomez, J. Zeng, R. Decker, F. O. B. Spoelstra, L. E. Gaspar, L. A. Kachnic, C. R. Thomas Jr., P. Okunieff, and C. D. Fuller. 2019. An in-silico quality assurance study of contouring target volumes in thoracic tumors within a cooperative group setting. Clinical and Translational Radiation Oncology 15:83–92.
Guillaudeux, M., O. Rousseau, J. Petot, Z. Bennis, C. A. Dein, T. Goronflot, N. Vince, S. Limou, M. Karakachoff, M. Wargny, and P. A. Gourraud. 2023. Patient-centric synthetic data generation, no reason to risk re-identification in biomedical data analysis. npj Digital Medicine 6(1):37.
Hanson, H. A., C. L. Leiser, G. Bandoli, B. H. Pollock, M. R. Karagas, D. Armstrong, A. Dozier, N. G. Weiskopf, M. Monaghan, A. M. Davis, E. Eckstrom, C. Weng, J. N. Tobin, F. Kaskel, M. R. Schleiss, P. Szilagyi, C. Dykes, D. Cooper, and S. L. Barkin. 2020. Charting the life course: Emerging opportunities to advance scientific approaches using life course research. Journal of Clinical and Translational Science 5(1):e9.
Koetzier, L. R., D. Mastrodicasa, T. P. Szczykutowicz, N. R. van der Werf, A. S. Wang, V. Sandfort, A. J. van der Molen, D. Fleischmann, and M. J. Willemink. 2023. Deep learning image reconstruction for CT: Technical principles and clinical prospects. Radiology 306(3):e221257.
Lekadir, K., A. F. Frangi, A. R. Porras, B. Glocker, C. Cintas, C. P. Langlotz, E. Weicken, F. W. Asselbergs, F. Prior, G. S. Collins, G. Kaissis, G. Tsakou, I. Buvat, J. Kalpathy-Cramer, J. Mongan, J. A. Schnabel, K. Kushibar, K. Riklund, K. Marias, L. M. Amugongo, L. A. Fromont, L. Maier-Hein, L. Cerdá-Alberich, L. Martí-Bonmatí, M. J. Cardoso, M. Bobowicz, M. Shabani, M. Tsiknakis, M. A. Zuluaga, M. C. Fritzsche, M. Camacho, M. G. Linguraru, M. Wenzel, M. De Bruijne, M. G. Tolsgaard, M. Goisauf, M. Cano Abadía, N. Papanikolaou, N. Lazrak, O. Pujol, R. Osuala, S. Napel, S. Colantonio, S. Joshi, S. Klein, S. Aussó, W. A. Rogers, Z. Salahuddin, M. P. A. Starmans, and the FUTURE-AI Consortium. 2025. FUTURE-AI: International consensus guideline for trustworthy and deployable artificial intelligence in healthcare. BMJ 388:e081554.
Liu, S., O. Pastor-Serrano, Y. Chen, M. Gopaulchan, W. Liang, M. Buyyounouski, E. Pollom, Q. T. Le, M. Gensheimer, P. Dong, Y. Yang, J. Zou, and L. Xing. 2024. Automated radiotherapy treatment planning guided by GPT-4Vision. ArXiv [Preprint]. Jul 1:arXiv:2406.15609v2.
Ng, S. P., B. A. Dyer, J. Kalpathy-Cramer, A. S. R. Mohamed, M. J. Awan, G. B. Gunn, J. Phan, M. Zafereo, J. M. Debnam, C. M. Lewis, R. R. Colen, M. E. Kupferman, N. Guha-Thakurta, G. Canahuate, G. E. Marai,
D. Vock, B. Hamilton, J. Holland, C. E. Cardenas, S. Lai, D. Rosenthal, and C. D. Fuller. 2018. A prospective in silico analysis of interdisciplinary and interobserver spatial variability in post-operative target delineation of high-risk oral cavity cancers: Does physician specialty matter? Clinical and Translational Radiation Oncology 12:40–46.
Peng, J., D. D. Kim, J. B. Patel, X. Zeng, J. Huang, K. Chang, X. Xun, C. Zhang, J. Sollee, J. Wu, D. J. Dalal, X. Feng, H. Zhou, C. Zhu, B. Zou, K. Jin, P. Y. Wen, J. L. Boxerman, K. E. Warren, T. Y. Poussaint, L. J. States, J. Kalpathy-Cramer, L. Yang, R. Y. Huang, and H. X. Bai. 2022. Deep learning-based automatic tumor burden assessment of pediatric high-grade gliomas, medulloblastomas, and other leptomeningeal seeding tumors. Neuro-Oncology 24(2):289–299.
Pickhardt, P. J., B. D. Pooler, T. Lauder, A. M. del Rio, R. J. Bruce, and N. Binkley. 2013. Opportunistic screening for osteoporosis using abdominal computed tomography scans obtained for other indications. Annals of Internal Medicine 158(8):588–595.
Robinson-Weiss, C., J. Patel, B. C. Bizzo, D. I. Glazer, C. P. Bridge, K. P. Andriole, B. Dabiri, J. K. Chin, K. Dreyer, J. Kalpathy-Cramer, and W. W. Mayo-Smith. 2023. Machine learning for adrenal gland segmentation and classification of normal and adrenal masses at CT. Radiology 306(2):e220101.
Warraich, H. J., T. Tazbaz, and R. M. Califf. 2025. FDA perspective on the regulation of artificial intelligence in health care and biomedicine. JAMA 333(3):241–247.
Yan, R., M. T. Islam, and L. Xing. 2025. Interpretable discovery of patterns in tabular data via spatially semantic topographic maps. Nature Biomedical Engineering 9(4):471–482. https://doi.org/10.1038/s41551-024-01268-6.