The first session of the workshop’s second day, moderated by Aarti Singh (Carnegie Mellon University), was devoted to grand challenges in developing artificial intelligence (AI) for use in scientific discovery. This use of AI, Singh said, would require meeting the key milestones that constitute the scientific discovery process: formulating a scientific hypothesis; identifying the variables that need to be measured, along with the corresponding data sources (simulations, experiments, literature, etc.), and recording the relevant data; testing the hypothesis reliably using the collected data; and refining the hypothesis based on inference. The goal of the session, she said, would be to develop insights into the specific domains in which AI can, or cannot, be of use for scientific discovery and to identify fundamental roadblocks and opportunities relating to that use. Singh said that one way to approach this goal is by educating people about AI’s capabilities and limitations.
Anthony Bak (Palantir Technologies) spoke first and focused on the issue of understanding as it relates to AI’s involvement in scientific discovery. AI can be used to produce predictions or hypotheses, he said, but it cannot truly develop any understanding of a subject, at least not in the way that humans do.
As an example, he pointed to current AI-driven image classification models, which can perform as well as or better than humans at identifying images. Show a photo of a school bus to such a model, and it will correctly identify it as a school bus. This makes it tempting to think that the model has developed a concept of what a school bus is, which would be a form of understanding. But adding a tiny amount of noise to the image, so little that the image appears unchanged to the human eye, can prevent the model from recognizing it as a school bus. Thus, Bak argued, the model could not have understood what a school bus was in any standard sense of “understanding.” Similarly, as was discussed during the workshop’s first day, a large language model can state the rule for multiplying two five-digit numbers, but it cannot reliably follow that rule to carry out the multiplication, which shows that the model does not truly understand the rule.
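To make the fragility Bak described concrete, the following minimal NumPy sketch uses a hypothetical linear classifier in place of a real image model. A perturbation that is tiny in every pixel, chosen in the direction that most reduces the class score (the idea behind the fast gradient sign method), flips the predicted label even though the two inputs look essentially identical:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000                          # number of "pixels"
w = rng.normal(size=n)               # weights of a toy linear "school bus" scorer
x = rng.normal(size=n)               # stand-in for an image the model gets right

def is_school_bus(image):
    return float(w @ image) > 0.0    # toy decision rule

# Ensure the clean "image" is classified correctly to begin with.
if not is_school_bus(x):
    x = -x

# A tiny per-pixel change in the worst-case direction: the sign of the
# gradient of the score with respect to the input.
eps = 0.02
x_adv = x - eps * np.sign(w)

print(is_school_bus(x))                   # True
print(is_school_bus(x_adv))               # False: the label flips
print(float(np.max(np.abs(x_adv - x))))   # 0.02, imperceptible per pixel
```

Real image classifiers are deep networks rather than linear scorers, but the mechanism is the same: many individually imperceptible changes accumulate into a large change in the model’s score.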
Since understanding plays a key role in such parts of the scientific process as hypothesis generation and the design of experiments to test a hypothesis, it will be important to build scientific understanding into AI models used for scientific discovery, he said, but he predicted that this problem of understanding will not be easily solved. Thus, it is a grand challenge.
The next speaker, Aisha Walcott-Bryant (Google Research–Kenya), approached the session’s topic somewhat differently. She spoke about scientific challenges—specifically, challenges related to improving the quality of life in Africa—that could be met more easily with the help of AI.
Walcott-Bryant offered three grand challenges concerned with improving health and quality of life in Africa: controlling malaria and other diseases, forecasting floods, and addressing food insecurity. She gave examples of how AI methods are already helping with each. In the case of malaria control, for example, she described a method proposed by a group she led (Bent et al., 2018) for searching among possible malaria control policies to identify those that are most promising. The technique frames the optimization problem as a stochastic multi-armed bandit and then uses three agent-based strategies to explore the resulting space of possible policies. The findings offer cost-effectiveness information that policymakers can use to identify the most promising malaria control strategy for a given situation.
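To illustrate the bandit framing (this is a schematic sketch, not the Bent et al. implementation), the following Python example treats each candidate policy as an arm of a stochastic bandit. Pulling an arm runs one noisy evaluation, standing in for an epidemiological simulation, and a simple epsilon-greedy agent gradually concentrates its evaluation budget on the most promising policy. The policy names and reward values are hypothetical:

```python
import random

# Hypothetical mean rewards (e.g., cost-effectiveness) of five candidate
# policies; unknown to the agent, which sees only noisy evaluations.
# "itn" = insecticide-treated nets, "irs" = indoor residual spraying.
TRUE_REWARD = {"itn_only": 0.40, "irs_only": 0.35, "itn+irs": 0.62,
               "larviciding": 0.30, "itn+irs+act": 0.55}

def evaluate(policy):
    """Stand-in for one noisy simulation run of a control policy."""
    return TRUE_REWARD[policy] + random.gauss(0, 0.1)

def epsilon_greedy(policies, budget=2000, eps=0.1):
    counts = {p: 0 for p in policies}
    means = {p: 0.0 for p in policies}
    for _ in range(budget):
        if random.random() < eps:               # explore a random policy
            p = random.choice(policies)
        else:                                   # exploit the best estimate
            p = max(policies, key=means.get)
        reward = evaluate(p)
        counts[p] += 1
        means[p] += (reward - means[p]) / counts[p]   # running average
    return means

random.seed(1)
estimates = epsilon_greedy(list(TRUE_REWARD))
print(max(estimates, key=estimates.get))        # expected: "itn+irs"
```

Bent et al. explored the policy space with three agent-based strategies rather than this simple epsilon-greedy rule, but the structure of the problem is the same: trading off exploration of untried policies against exploitation of promising ones under a limited simulation budget.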
Similarly, researchers have developed machine learning models for flood forecasting (Kratzert et al., 2019; Nevo et al., 2022), which can help provide earlier warnings to people in the path of a flood. Finally, Walcott-Bryant suggested two ways in which AI and machine learning techniques could help end hunger and malnutrition in Africa and transform the continent’s food systems: by optimizing networks of smallholder farmers for increased production and by forecasting food insecurity and malnutrition.
The panel’s final presentation was by Tapio Schneider (California Institute of Technology), who used lessons from climate change to offer more general recommendations for how best to use AI for scientific discovery. He began by speaking about the urgency of gaining a better understanding of the changes that climate change is likely to bring about and how quickly they will occur. Accurate predictions are needed now to inform policies on such things as wildfires and flood control, he said, since currently “America is flying blind.”
Climate researchers have a great deal of data available to them, Schneider said, and these data are used effectively in weather forecasting, but they are not the sort of data generally used in climate models; that is, there is a gap between the data and the climate models. There is a second gap between the risk-related output of climate models and the end users who need it. Both gaps can be bridged by AI tools, he said, and he showed a general model of how that can be done.
The scientific method uses an iterative approach, which Schneider referred to as a “knowledge discovery loop”: Data → Learn about Model → Model → Design Experiment → Data, and so forth, over and over again (NASEM, 2022). Much of this loop can now be automated, he noted. Two parts of the loop, “Data” and “Model,” will vary considerably depending on the scientific question being asked, but the other two parts, “Learn about Model” and “Design Experiment,” will have many similarities from one question to another; this is where universal AI models can play an important role.
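In schematic form, the loop can be written as a few lines of Python; all function bodies below are placeholders, shown only to make the control flow concrete:

```python
# A schematic rendering of the knowledge discovery loop: the "Data" and
# "Model" steps are domain specific, while "Learn about Model" and
# "Design Experiment" are where general-purpose AI tooling could plug in.

def calibrate(model_params, data):
    """'Learn about Model': update parameters to fit the data (inference)."""
    return model_params                             # placeholder

def design_experiment(model_params):
    """'Design Experiment': choose the observation expected to be most
    informative about the model (e.g., by expected information gain)."""
    return {"observe": "most-informative target"}   # placeholder

def run_experiment(experiment):
    """'Data': carry out the experiment or simulation and record results."""
    return []                                       # placeholder

def discovery_loop(model_params, data, n_iterations=10):
    for _ in range(n_iterations):
        model_params = calibrate(model_params, data)   # Learn about Model
        experiment = design_experiment(model_params)   # Design Experiment
        data = run_experiment(experiment)              # new Data
    return model_params                                # refined Model
```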
Schneider closed by listing some of the challenges for AI and automation in science, along with possible pathways to solutions. The challenges included data that are noisy and heterogeneous, have missing values, and are rarely labeled; models that can be difficult to differentiate or even non-differentiable; the need for uncertainty quantification; the restrictiveness of the prevailing supervised learning paradigm; and structural difficulties with incentivizing team science and supporting infrastructure. The pathways to solutions, he said, include treating machine learning as an inverse problem, harnessing diverse data, and enabling uncertainty quantification; investing in gradient-free learning methods; developing well-engineered open-source software; and investing in infrastructure so that research institutions have access to computing and human resources, including research software engineers.
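One of those pathways, gradient-free learning for models that cannot be differentiated, can be sketched in a few lines. The example below calibrates a deliberately non-differentiable black-box model against observations by simple random search, treating the task as an inverse problem (find the parameter whose simulated output best matches the data). The model and numbers are toy stand-ins, not a climate model:

```python
import random

observations = [2.0, 4.1, 5.9, 8.2]           # hypothetical measurements

def black_box_model(theta):
    """Non-differentiable stand-in model: rounding makes the output a
    piecewise-constant function of theta, so gradients are useless."""
    return [round(theta * t, 1) for t in (1, 2, 3, 4)]

def loss(theta):
    sim = black_box_model(theta)
    return sum((s - o) ** 2 for s, o in zip(sim, observations))

def random_search(lo=0.0, hi=5.0, n_trials=10_000, seed=0):
    rng = random.Random(seed)
    best_theta, best_loss = None, float("inf")
    for _ in range(n_trials):
        theta = rng.uniform(lo, hi)           # propose; no gradients needed
        trial_loss = loss(theta)
        if trial_loss < best_loss:
            best_theta, best_loss = theta, trial_loss
    return best_theta, best_loss

theta, final_loss = random_search()
print(theta, final_loss)   # theta near 2.0, the value that best fits the data
```

In practice, gradient-free calibration of expensive simulators uses far more sample-efficient methods (ensemble Kalman methods, Bayesian optimization, evolution strategies), but they share this structure: propose parameters, run the model, compare with data, and keep what fits.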
The second session, moderated by Yolanda Gil (University of Southern California), explored some promising first steps toward developing AI for scientific discovery, including immediate challenges that can be realistically tackled to demonstrate progress along the way. Gil suggested that one potential first step would be for AI systems to write a scientific paper, which would require representing a great deal of scientific knowledge in the system.
The panel’s first speaker was Steve Chien (NASA [National Aeronautics and Space Administration] Jet Propulsion Laboratory, California Institute of Technology). He began by describing areas where AI is already having a tremendous effect in space research. For example, for several decades machine learning has been used to classify objects observed in the sky through optical and radio telescopes, which produce so many observations that human scientists can examine and classify only a small fraction of them. Machine learning has also been used to index features, such as craters, holes, dunes, and slopes, seen in images of the surface of Mars and other bodies. In one case, machine learning was used to detect fresh impact craters on Mars, which allows researchers to examine the geology beneath the surface before it is obscured by weathering (Wagstaff et al., 2022). AI is also used extensively for the automated scheduling of space missions, saving money and making it possible for researchers to get more out of each mission.
Turning to next steps, Chien sketched out ways that AI may aid the space program in the future. In a proposed mission to Europa, one of Jupiter’s moons, the lander will dig into the surface, looking for, among other things, signs of potential life. Since it takes several hours to get signals from Earth to Europa, the lander will have to be largely self-directed, and AI offers the best means for the lander to take account of the data it is accumulating when deciding what to do next. Similarly, a proposed mission to Enceladus, one of Saturn’s moons, will have a snake-like robot descending into the moon’s crevasses, and it too will need to be directed by AI. Yet another potential use of AI, Chien suggested, would be to search for ways to connect what is known about Earth’s prebiotic conditions and the origin of life with the last universal common ancestor, helping to fill a major void in the current understanding of the development of life on Earth. He concluded that AI could also be used to fill gaps in the understanding of the evolution of the solar system.
Peter Clark (Allen Institute for AI [AI2]) spoke about the task of evaluating AI discovery systems and how to create a suitable grand challenge to motivate the community. As an illustration of designing and executing a challenging task, Clark talked about AI2’s work developing a system that could reason about science, in pursuit of Paul Allen’s dream of building a “Digital Aristotle.”
To turn that quest into a community-based challenge and encourage the community to get involved, AI2 created the challenge of building a system that could pass a standardized eighth-grade science exam (multiple-choice part). The challenge was launched in 2015 on the Kaggle platform with a $50,000 prize. In some ways it was highly successful: There were more than 600 entries from 170 teams, and the competition served to put “scientific reasoning” on the AI research agenda. However, no challenge entrant was able to score more than 60 percent (a “fail” on the test), reflecting how hard scientific reasoning was at that time. Now in 2023, AI has advanced immeasurably, and this test is largely a solved problem; new challenges of explanation and scientific discovery have replaced multiple-choice question answering.
Finally, Clark floated two ideas for a scientific discovery grand challenge. The first is a scientific literature challenge, in which a large corpus of papers is seeded with synthetic papers containing hidden “discoveries” for a system to find, such as a paper that contradicts the majority sentiment on its topic. The second is a simulated experimental environment called Discovery World, in which an autonomous agent has to solve discovery tasks, for example, identifying why (simulated) researchers in the environment are mysteriously getting sick. Clark concluded with general advice for creating challenges, drawing on AI2’s earlier experience: a good scientific discovery grand challenge should have clear, automated metrics; should have graduated levels of passing; and should be run with both a long lead time and a long competition time so that participants can prepare fully and participate to the maximum. Organizers, he added, should expect to expend a great deal of time and energy.
The panel’s final presenter, Manuela Veloso (JPMorgan Chase), discussed two types of steps toward the future of AI: competitions and simulations. The competition she focused on was the RoboCup challenge, an effort begun several decades ago by Hiroaki Kitano to develop autonomous robots that can play soccer. The RoboCup challenge’s goal is to develop, by 2050, a team of soccer robots that can defeat the World Cup champion human soccer team. In response, many types of soccer-playing robots have been developed, and many of these compete in robot soccer leagues. The difficulty at the heart of this challenge, Veloso said, is getting the robots on a team to cooperate and coordinate with each other effectively enough to score goals by putting the ball in the other team’s net.
Similarly, other researchers have developed teams of rescue robots that can work together to find and rescue people after a disaster. While researchers remain far from the goal of a robot soccer team that could win the World Cup, the process of working toward that goal has been extremely rewarding, Veloso said: it has not only greatly increased understanding and capabilities in this area but also created and strengthened a community of researchers interested in the subject, whose members share their ideas and solutions.
In a completely different area, Veloso described the Agent-Based Interactive Discrete Event Simulation (ABIDES) environment, which is used to support agent-based research on economic markets (Byrd et al., 2020). Researchers can run market simulations with tens of thousands of trading agents acting autonomously while interacting with one another in the virtual market. This gives economists and others interested in markets a tool for carrying out experiments that examine how market behavior changes under different conditions and with different participants.
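A toy example conveys the flavor of such experiments. The sketch below is written from scratch for illustration (it does not use the ABIDES API): autonomous agents submit buy or sell orders each step, a simple linear price-impact rule updates the market price, and changing the mix of agents changes the aggregate behavior one observes:

```python
import random

class NoiseTrader:
    """Buys or sells at random, regardless of price."""
    def order(self, price):
        return random.choice([-1, 1])          # -1 = sell, +1 = buy

class MeanReverter:
    """Sells above its anchor price, buys below it."""
    def __init__(self, anchor=100.0):
        self.anchor = anchor
    def order(self, price):
        return 1 if price < self.anchor else -1

def simulate(agents, steps=1000, price=100.0, impact=0.01, seed=0):
    random.seed(seed)
    prices = [price]
    for _ in range(steps):
        net_demand = sum(agent.order(price) for agent in agents)
        price += impact * net_demand           # linear price impact
        prices.append(price)
    return prices

# Experiment: how does the price path change with the agent mix?
mostly_noise = [NoiseTrader() for _ in range(90)] + [MeanReverter() for _ in range(10)]
mostly_mr = [NoiseTrader() for _ in range(10)] + [MeanReverter() for _ in range(90)]
for label, agents in [("mostly noise", mostly_noise), ("mostly mean-reverting", mostly_mr)]:
    path = simulate(agents)
    print(label, round(min(path), 2), round(max(path), 2))
    # Expect the mean-reverter-heavy market to stay in a much tighter band.
```

ABIDES itself is far richer, with an exchange agent, a realistic order book, and latency modeling, but the experimental pattern is the one shown: specify agent populations, run the simulation, and measure how market behavior changes.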
One potential use of ABIDES is the development of machine learning algorithms for trading. Looking to the future, Veloso suggested that collaborations between humans and AI could be very effective in solving problems. In one scenario, an AI system would look at incoming problems and decide whether it can solve them. If not, a problem would be handed to humans to solve, and the AI would record the solution. Over time, the AI would learn more and more from examining human solutions, enabling it to take on an increasing share of the problems and freeing humans to focus on the more complex problems that AI cannot yet solve.
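A minimal sketch of that routing loop follows; the names, the confidence proxy, and the threshold are all hypothetical, and a real system would use a learned model rather than a lookup over past solutions:

```python
from dataclasses import dataclass

@dataclass
class Problem:
    kind: str
    payload: str

def ai_solve(problem, memory, per_kind_needed=3):
    """Placeholder 'model': it is confident once it has recorded enough
    human solutions for this kind of problem; otherwise it defers."""
    similar = [sol for p, sol in memory if p.kind == problem.kind]
    confident = len(similar) >= per_kind_needed
    return (similar[-1] if similar else None), confident

def collaborate(problems, ask_human):
    memory = []                                  # the AI's growing experience
    n_escalated = 0
    for problem in problems:
        solution, confident = ai_solve(problem, memory)
        if not confident:
            solution = ask_human(problem)        # route to a person
            n_escalated += 1
        memory.append((problem, solution))       # record for future reuse
    return n_escalated

# Demo: after a few human-solved examples per kind, escalations stop.
problems = [Problem(kind=f"type{i % 2}", payload=str(i)) for i in range(20)]
print(collaborate(problems, ask_human=lambda p: f"human answer to {p.payload}"))
# Prints 6: three escalations per kind, after which the AI handles the rest.
```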
Gil noted that in the robot soccer challenge Veloso discussed, it took a significant investment for a team to get started, though she sensed that this cost was mitigated somewhat by the sharing of resources, and she asked Veloso to expand on this. Veloso agreed that the various teams did indeed share resources; in particular, when a team developed a useful algorithm, it was encouraged to share that algorithm. The models tended to be composed of independent components (a vision component, a kicking component, a positioning component, and a communication component), and teams would focus on improving the components of most interest to them. Clark said that for any challenge, not just robot soccer, it is important to provide resources that allow people to get started; that is what his group did with the science test challenge by providing a database to work with. He added that the modular approach Veloso described is also very useful in challenges and can help the community advance more quickly by breaking the problem into pieces.
Chien said that the success metric may be the most important part of a challenge. He suggested that instead of scoring models on a fixed test, a new test could be issued and the challenge won by the model that scores highest within 1 week. That, he said, is closer to what happens in industry, where speed of development is an important metric.
In response to a question from Gil, Clark said that it is difficult to find a grand challenge that will engage scientists from many different fields, but there are some possibilities. One is managing scientific literature, because people in all fields want to be able to explore and mine the literature. General hypothesis formation is another topic that might anchor a grand challenge engaging researchers from all areas of science.
Moderator and workshop chair Bradley Malin (Vanderbilt University) opened the last panel session by introducing the four panelists, who spoke about how different funding organizations can help enable the use of AI in scientific discovery. Questions for the panelists included the following: What activities are they interested in, and what are the next steps in AI for scientific discovery? What are the big movements in AI scientific research that are occurring right now in preparation for the future? What is missing from research that is needed?
Susan Gregurick (National Institutes of Health [NIH]) spoke first. She listed several AI-related challenges that have shaped NIH’s funding. These include aligning datasets and algorithms to use cases; integrating clinical research, health care, and environment-related data; addressing bias in data; creating an inclusive AI workforce; and creating and improving an AI infrastructure.
NIH has had a rapidly growing budget devoted to AI-related projects over the past few years, Gregurick said. Some of the specific programs being funded include Bridge2AI, which is intended to generate new “flagship” datasets and best practices for machine learning analysis; the Artificial Intelligence/Machine Learning Consortium to Advance Health Equity and Researcher Diversity, or AIM-AHEAD, which is meant to increase the participation and representation of researchers and communities currently underrepresented in the development and use of AI and machine learning; the Brain Research through Advancing Innovative Neurotechnologies Initiative, or the BRAIN Initiative; cognitive systems analysis of Alzheimer’s disease; and the 4D Nucleome Program, which is intended to study the three-dimensional organization of the cell nucleus in space and time. There are also collaborations to make data FAIR (findable, accessible, interoperable, and reusable) and ready for use with AI and machine learning.
The next presenter was Andrey Kanaev (Office of Advanced Cyberinfrastructure, U.S. National Science Foundation [NSF]). NSF has supported AI research for several years, Kanaev said, and its flagship program is its collection of 25 national AI research institutes spread across the United States. However, NSF also funds AI through a variety of other programs.
The focus of his talk was the National Artificial Intelligence Research Resource (NAIRR), whose creation was mandated by the National Artificial Intelligence Initiative Act of 2020 (H.R. 6216, 116th Cong.). The NAIRR will provide a shared research infrastructure that facilitates access to computing capabilities, software, datasets, models, and training and user support for researchers and students. The stated objective of the resource is to strengthen and democratize the U.S. AI innovation ecosystem in a way that protects privacy, civil rights, and civil liberties. Its goals are to spur innovation, increase the diversity of talent in AI, improve U.S. capacity for AI research and development, and advance trustworthy AI. Currently, an interagency group is working to plan and launch a pilot for the NAIRR.
The pilot will involve not just AI researchers but also domain scientists applying AI in their research as well as students and educators. Participants will come from academic institutions; nonprofit organizations; federal agencies; federally funded research and development centers; startups and small businesses; and state, local, and tribal agencies.
Patrick Rose (Federal Agency for Disruptive Innovation [SPRIN-D]) spoke next. SPRIN-D is a limited liability corporation funded by the German government on the model of the U.S. Defense Advanced Research Projects Agency (DARPA). Rose explained that SPRIN-D works to empower innovators with “wild and crazy” ideas that have large potential payoffs. It does this by holding competitions and funding the most promising ideas. The agency also works as a venture capital company, providing initial investments in promising companies.
The goal, he said, is to bring innovations to society that will make a difference. SPRIN-D is trying to understand how to invest most effectively in AI, Rose said, and it is exploring a series of AI challenges that address promising competencies of German/European science, business, and civil society. Besides looking for proposals to create AI scientists, the agency is also thinking about pushing for AI engineers, AI lawyers, AI doctors, and others whose purpose will be to provide support to professionals in those fields.
Achieving true AI, he said, will require not only programming and an extensive database but also unconventional computing architectures to train the programs, as well as an open-source platform. A major issue is that the development of machine learning algorithms is starting to outpace the current understanding of the underlying science and technology. SPRIN-D is looking to enable the development of algorithms that can work with little or no data and make cohesive, rational, and explainable decisions. The agency sees the solution as allowing various actors from science, industry, and civil society to participate, thus comprehensively illuminating a field.
Mark Greaves (Schmidt Futures) began his presentation by asking, “What is the role of philanthropy in AI?” Philanthropy will never have the scale of government funding or the budgets of large corporations, he said. Philanthropy thus acts to fill gaps.
In particular, Schmidt Futures, founded by Eric Schmidt, the former chief executive officer of Google, has an interest in AI because of the interests of its founder. It has two types of programs: talent-based programs, which fund postdoctoral training in AI, and project-based programs, which aim to build something. An example of the latter is a program called Future House, which seeks to use AI to build better literature searches in biochemistry and, eventually, to create self-driving labs.
AI2050, which Greaves directs, is a 5-year, $125 million program at Schmidt Futures that seeks to solve problems that might prevent AI from being a beneficial technology by the year 2050. The program has identified 10 categories of hard problems that need to be solved, including issues in technical AI, robustness, alignment with human values, and responsible development. One category asks how AI can be used to address society’s greatest problems, including challenges in health care, sustainability, and basic science. The program also seeks to address more difficult and less well-defined problems, such as the challenges of AI in the workforce, AI in economics, global access and participation, geopolitics, and stability; how to measure the impact of AI; and what it means to be human in the age of AI.
Malin opened the discussion by asking how AI funding fits into overall budgets. If AI funding is going up, for instance, does that mean that funding for other items goes down? Gregurick said that the NIH’s data science budget, which includes additional funding for AI, is expected to keep increasing in the future; because it is separate, it does not affect other budget items, except to the extent that the overall NIH budget is expected to be flat in 2024. Kanaev said that the increase in funding for AI does to some extent affect funding for other areas. Rose said that because SPRIN-D funds leading-edge research and development, the success of its testing and evaluation will inform future funding decisions. Greaves said that many philanthropies that were not previously interested in AI are becoming more interested because of the publicity surrounding the area, and their budgets are reflecting that.
In answering a question from a member of the audience, Gregurick identified AI algorithms and AI computing infrastructure as areas of particular need for funding. Greaves identified AI safety as another area that needs greater attention.