U.S. Department of the Air Force Must Invest in Artificial Intelligence Development, Prioritize Test and Evaluation, Says New Report
News Release
By Josh Blatt
Last update September 7, 2023
WASHINGTON ― To integrate artificial intelligence-enabled capabilities at the necessary speed and scale, the Air and Space Forces of the U.S. Department of the Air Force (DAF) should commit to making immediate and sustained investments in AI governance, workforce, research and development, digital infrastructure, test and evaluation processes, experimentation, and tailored standards and practices, says a new report from the National Academies of Sciences, Engineering, and Medicine.
Artificial intelligence (AI) has pervasive implications for the DAF. The department will need experts throughout its workforce, as well as AI-specific test and evaluation capabilities, infrastructure, methods, policies, and tools, all of which will be necessary to gain even an initial understanding of the reliability and usefulness of a range of AI-enabled systems, the report says. However, to build and maintain the levels of trust and confidence required for the DAF to adopt these technologies safely and responsibly, testing and evaluation must be built in and ongoing.
AI-enabled capabilities in the future could include reconnaissance and pattern detection; target identification and tracking; detection and prevention of cyberattacks; navigation and obstacle avoidance; weather forecasting; real-time situational awareness sensing and alerts for pilots; training and simulation partners; “sidekick” drones for fighter jets; and autonomous drones and weapons systems.
DAF is still in the early stages of incorporating modern AI technology into its systems and operations, and it does not currently have the capacity or digital infrastructure to support AI development, testing, and evaluation, the report says. Though the department is no further behind than most other organizations and agencies in establishing AI test and evaluation processes and procedures, there are significant risks in not making rapid adjustments.
Examples of the needed test and evaluation capabilities include instruments to record and transmit data that machine learning systems can easily consume, both to detect deviations in the performance of AI-based systems and to support training and experimentation; synthetic data engines and “digital twins” ― virtual representations used to simulate real-world behavior and characteristics; and the ability to regularly monitor and rapidly retrain and redeploy AI models.
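To make the idea of detecting performance deviations concrete, the minimal Python sketch below tracks a deployed model's rolling accuracy against the accuracy established during testing. The class name, window size, and tolerance are illustrative assumptions, not capabilities or thresholds specified in the report.

```python
from collections import deque


class DriftMonitor:
    """Tracks a rolling window of correctness flags and reports drift when
    operational accuracy falls a set margin below the test-time baseline."""

    def __init__(self, baseline_accuracy: float, window_size: int = 500,
                 tolerance: float = 0.05):
        self.baseline = baseline_accuracy   # accuracy established during T&E
        self.tolerance = tolerance          # allowed drop before alerting
        self.window = deque(maxlen=window_size)

    def record(self, prediction, ground_truth) -> None:
        # Store 1 for a correct prediction, 0 for an incorrect one.
        self.window.append(1 if prediction == ground_truth else 0)

    def drifted(self) -> bool:
        # Only judge drift once the window is full of operational samples.
        if len(self.window) < self.window.maxlen:
            return False
        current = sum(self.window) / len(self.window)
        return current < self.baseline - self.tolerance


# Example: a model fielded at 92% accuracy would trigger a retrain-and-redeploy
# review if its rolling operational accuracy drops below 87%.
monitor = DriftMonitor(baseline_accuracy=0.92)
```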
“We expect that AI will be embedded throughout the entire DAF over the next decade, requiring significant operational and cultural changes,” said Thomas Longstaff, co-chair of the committee that wrote the report and chief technology officer of the Software Engineering Institute at Carnegie Mellon University. “Dedicated leadership, continuous oversight, individual responsibility and accountability, and judicious investments will help the DAF take on these rapid and evolving transformations.”
The report recommends that the secretary of the Air Force formally designate a general officer or senior civilian executive as an “AI test and evaluation champion” to address the unique challenges of AI, with all requisite authority, responsibilities, and resources to ensure that AI testing and evaluation are fully integrated and appropriately funded. AI’s integration throughout the DAF also means the AI test and evaluation champion will have to establish a clear governance structure across a range of departments and units. This individual should also lead the DAF’s efforts to implement the other recommendations made in the report.
Training AI requires enormous amounts of data, and how data are collected, managed, curated, and maintained is crucial to the system’s sustainability. However, deployed AI models will inevitably encounter operational conditions not represented in the training data, and could behave in unanticipated ways, the report says. Systems that combine different sorts of software — legacy software without AI, new software without AI, current or legacy software to which AI has been added, and new software with baked-in AI capabilities — add layers of complexity.
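To illustrate the training-data coverage issue, the following sketch screens operational inputs against per-feature statistics computed from the training set and flags those that fall far outside it. The feature representation and z-score threshold are hypothetical and not drawn from the report.

```python
import numpy as np


def fit_training_stats(training_features: np.ndarray):
    """Record per-feature mean and standard deviation from the training set."""
    return training_features.mean(axis=0), training_features.std(axis=0) + 1e-9


def out_of_distribution(sample: np.ndarray, mean: np.ndarray,
                        std: np.ndarray, z_threshold: float = 4.0) -> bool:
    """Flag a sample whose features lie far outside the training distribution,
    so the system can fall back to a human operator or a conservative mode."""
    z_scores = np.abs((sample - mean) / std)
    return bool(np.any(z_scores > z_threshold))


# Usage: training statistics are computed once; every operational input is
# screened before the model's output is trusted.
train = np.random.default_rng(0).normal(size=(10_000, 8))
mean, std = fit_training_stats(train)
print(out_of_distribution(train[0], mean, std))         # in-distribution: False
print(out_of_distribution(train[0] + 10.0, mean, std))  # far outside: True
```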
Human-system integration will present another set of challenges. The workload distribution between humans and AI systems will change continuously, with responsibilities potentially shifting as human confidence in the systems grows, or, eventually, depending on the real-time situation. AI-enabled machines may operate at speeds greater than what human operators are accustomed to or comfortable with. The report says these challenges underscore the need for continuous testing and evaluation of AI systems to maintain trust, and for extensive human-machine testing and training processes and standards.
“DAF’s safe operations are a testament to 75 years of comprehensive and rigorous service-wide test and evaluation personnel, policies, processes, and practices,” said May Casterline, principal solutions architect at NVIDIA, and committee co-chair. “But demand for AI-enabled capabilities is going to accelerate substantially over the next few years, presenting new challenges, and it is critical that the DAF adapt so that it can develop and field AI effectively, safely, and responsibly.”
The report lays out other steps that the DAF should take to address the implications of AI, including:
Provide AI education, training, and certifications, as applicable, to all personnel through the Air Force Test Center, and institute career-long tracking and management of personnel with relevant skills.
Standardize AI test and evaluation protocols to assess the impacts of major AI-related risk factors.
Establish red-teaming activities focused on AI-based systems, test against threats uncovered, and coordinate investments to address findings and augment private sector research (a minimal illustrative probe of this kind is sketched after this list).
Invest in developing and testing trustworthy AI-enabled systems, to ensure service branch members have trust and justified confidence in the systems.
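As a hedged illustration of the red-teaming recommendation above, the sketch below estimates how often small random perturbations flip a classifier's decision on a given input. The toy classifier, perturbation bound, and trial count are stand-ins, not a method prescribed by the report.

```python
import numpy as np


def toy_classifier(x: np.ndarray) -> int:
    """Stand-in model: classifies by the sign of the feature sum."""
    return int(x.sum() > 0)


def probe_robustness(predict, sample: np.ndarray, epsilon: float = 0.1,
                     trials: int = 200, seed: int = 0) -> float:
    """Return the fraction of random perturbations (bounded by epsilon)
    that change the model's decision on this sample."""
    rng = np.random.default_rng(seed)
    baseline = predict(sample)
    flips = 0
    for _ in range(trials):
        noise = rng.uniform(-epsilon, epsilon, size=sample.shape)
        if predict(sample + noise) != baseline:
            flips += 1
    return flips / trials


# A high flip rate on operationally relevant inputs would be a finding for the
# red team to report and for follow-on investment to address.
sample = np.array([0.02, -0.01, 0.03])
print(f"decision flip rate: {probe_robustness(toy_classifier, sample):.2%}")
```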
Test and Evaluation Processes
The DAF has a robust set of test and evaluation processes and procedures, used for decades to build sufficient levels of confidence in new aircraft, missiles, and other weapons and support systems, but these do not fully translate to nascent and still-maturing software capabilities, especially given the “black box,” self-learning, adaptive, data-centric nature of AI, the report says.
To build justified confidence in new AI-enabled capabilities, the DAF and its test community need to develop and promulgate AI-specific test and evaluation policies and procedures as soon as possible. Additionally, the DAF will have to continuously maintain AI-enabled systems even after they become operational, integrate test and evaluation processes throughout the entire AI life cycle, and make other significant adjustments to its existing methods.
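One way to picture test and evaluation integrated across the AI life cycle is a release gate that every retrained model must pass before redeployment. The minimal sketch below is an assumption-laden illustration: the metric names and margins are invented for the example, not DAF policy.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    accuracy: float
    false_alarm_rate: float


def passes_release_gate(candidate: EvalResult, fielded: EvalResult,
                        max_accuracy_drop: float = 0.01,
                        max_false_alarm_rise: float = 0.005) -> bool:
    """Approve redeployment only if the candidate does not regress the
    fielded model's accuracy or false-alarm rate beyond the allowed margins."""
    return (candidate.accuracy >= fielded.accuracy - max_accuracy_drop
            and candidate.false_alarm_rate
            <= fielded.false_alarm_rate + max_false_alarm_rise)


# Usage: run after every retraining cycle, before redeployment.
fielded = EvalResult(accuracy=0.91, false_alarm_rate=0.02)
candidate = EvalResult(accuracy=0.93, false_alarm_rate=0.021)
print(passes_release_gate(candidate, fielded))  # True: candidate may ship
```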
Though precise predictions are difficult given the accelerating rate at which AI technology is advancing, the report says that the five areas of AI likely to place the greatest demands on test and evaluation are foundation models, informed machine learning, generative AI, trustworthy AI, and gaming AI for complex decision-making. It recommends that the DAF focus its time and investments on these areas.
The study, undertaken by the Committee on Testing, Evaluating, and Assessing Artificial Intelligence-Enabled Systems under Operational Conditions for the Department of the Air Force, was sponsored by the U.S. Department of Defense.
The National Academies of Sciences, Engineering, and Medicine are private, nonprofit institutions that provide independent, objective analysis and advice to the nation to solve complex problems and inform public policy decisions related to science, engineering, and medicine. They operate under an 1863 congressional charter to the National Academy of Sciences, signed by President Lincoln.
Contact:
Joshua Blatt, Media Relations Officer
Office of News and Public Information
202-334-2138; e-mail news@nas.edu
Featured Publication
Test and Evaluation Challenges in Artificial Intelligence-Enabled Systems for the Department of the Air Force
Consensus · 2023