Automated Research Workflows for Accelerated Discovery: Closing the Knowledge Discovery Loop (2022)

Chapter: 6 Conclusion

Suggested Citation: "6 Conclusion." National Academies of Sciences, Engineering, and Medicine. 2022. Automated Research Workflows for Accelerated Discovery: Closing the Knowledge Discovery Loop. Washington, DC: The National Academies Press. doi: 10.17226/26532.

6

Conclusion

The tools and techniques being developed under the large umbrella of automated research workflows (ARWs) promise to collapse the centuries-old serial method of research investigation into processes where thousands or even millions of simulations or experiments are iterated rapidly in closed loops, with the analysis of data and even the design of experiments or controlled observations being assisted by machine learning (ML) or optimization techniques. Simultaneously, ARWs provide a way to satisfy pressing demands across fields to increase interoperability, reproducibility, replicability, and trustworthiness by better tracking results, recording data, establishing provenance, and creating more consistent metadata than even the most dedicated researchers can provide themselves. The committee’s exploration of ARWs illustrates that the research enterprise stands at an important inflection point. The scientific revolution of the 17th century ushered in an unprecedented era of human progress, leading directly to discoveries and innovations that have transformed tasks requiring the application of muscle or simple technologies into services performed by ever more effective machines. The research enterprise will need to develop new approaches and tools as it enters an era in which core elements of knowledge discovery itself can be automated and accelerated.

In important ways, this emerging process of innovation and adaptation represents a continuation of the long-standing trend of computational power being harnessed to perform a variety of research tasks. Yet new twists will need to be considered and addressed. Concerns about privacy, ethics, and trust arising in many domains of human activity become even more relevant to the entire research enterprise as we increase use of artificial intelligence (AI)-based technologies.

As illustrated in the use cases examined in Chapter 3, different disciplines of research have very different usage patterns relative to ARWs, both in the specific tools and platforms they use and, more generally, in their propensity to incorporate workflows into their processes in the first place. Costs for equipment, software, staffing, and training may vary by discipline, but the broad need for domain researchers to incorporate new methods and approaches holds across the use cases. In addition, specialized expertise in areas such as software engineering, algorithm development, and data science will be required in a number of fields.

Further, several lines of thought that emerged from the March 2020 workshop are germane not just to the task at hand, but more broadly across the scientific enterprise. These themes include the need to break down academic silos, provide incentives for greater collaboration among researchers, ensure greater interoperability across technologies, foster sharing of a broader range of research outputs, and address issues such as striking an appropriate balance between access to and protection of data.

The committee’s findings and recommendations point to promising areas of focus for the research enterprise in facilitating the effective implementation of ARWs. The use cases and supporting literature described in Chapter 3 support all of the recommendations, with Findings A and B and Recommendation 1 in particular flowing directly from examples drawn from a variety of domains. Finding C and Recommendations 2, 3, and 4 are supported in Chapters 4 and 5, which draw on presentations from the March 2020 workshop and other cited literature. Finding D and Recommendation 5 are also supported mainly in Chapter 5, again with points drawn from the use cases.

FINDINGS AND RECOMMENDATIONS

Finding A: Accelerating Discovery

In many disciplines, the emergence of automated research workflows (ARWs), built upon contemporary cyberinfrastructure, is demonstrating the potential to vastly increase the speed and efficiency of a range of research activities. These include designing and conducting experiments, analyzing data, and observing natural phenomena. These improvements can be realized at scale by implementing infrastructure and practices that facilitate the application of artificial intelligence and machine learning and related technologies to research. Realizing the potential of ARWs could accelerate the pace of scientific discovery by orders of magnitude and thereby expand the research enterprise’s contribution to society.

Finding B: Additional Benefits

In addition to increasing the speed and efficiency of research, the effective development and implementation of the technical and human infrastructure for automated research workflows (ARWs) will contribute to strengthening the research process in other ways. For example, the greater transparency and repeatability made possible by automating and capturing specific steps in the research process (advances that underlie the development of ARWs) can foster reproducibility, replicability, and responsibility in research. Adoption of common and interoperable tools and platforms, which could be accelerated by the advance of ARWs but depends on other developments as well, can facilitate international and interdisciplinary research collaboration. Broader access to research workflows and results and the enhanced ability to uncover and correct errors can contribute to greater confidence in research findings and the research enterprise and reduce redundancy among research efforts. To be sure, issues such as dealing with large amounts of streaming data and complex computational approaches will continue to pose technical challenges to the design and implementation of ARWs. In addition, incorporating emerging principles and guidelines for responsible artificial intelligence and machine learning advocated by various organizations, such as building in human review of algorithms, uncovering and addressing bias, and supporting transparency and reproducibility, will also help to secure the benefits of ARWs.

RECOMMENDATION 1: Design Principles

Organizations that fund, perform, and disseminate research, along with scientific societies, should support and enable automated research workflows (ARWs) that embody the following design principles:

ARWs and the systems, tools, and platforms that comprise them should facilitate openness, reproducibility, and transparency.
ARWs should facilitate the effective use of artificial intelligence (AI) and machine learning (ML) as research tools and incorporate principles of responsible AI and ML to mitigate the risks from various human and technological deficiencies, such as confirmation and sampling biases, inappropriate application of statistics, and challenges to interpretability of results and quantification of confidence and uncertainties when drawing inferences from ML analyses.
The associated research objects (data, code, even entire workflows) for ARWs should be FAIR (findable, accessible, interoperable, and reusable), not only by humans but also by machines, to facilitate automated reuse and collaboration.
ARWs should prioritize reuse and sustainability of existing tools and systems when possible and appropriate, reducing costly duplication of effort and facilitating the extension of capabilities through integration or federation of systems and agreement on standards. Designs should allow for specialization into specific domains, but avoid unnecessary rebuilding.
While proprietary services and components can enhance the utility of ARWs, key ARW infrastructure should be controlled by and be accessible to the research community itself, with the community developing standards and practices to facilitate this.

Finding C: Research Enterprise

Realizing the potential of automated research workflows (ARWs) will require modification of the research enterprise, including sustainable funding for the necessary hardware, software, and human resources, educating the scientific workforce, reporting and sharing research results, and structuring researcher rewards and incentives. Multidisciplinary, multirole collaboration is essential to realize the potential of ARWs.

RECOMMENDATION 2: Infrastructure, Code, and Data Sustainability

Research funders, working with other stakeholders such as societies, research institutions, and publishers, should place greater priority on approaches to ensuring the creation and sustainability of key systems, tools, platforms, and data archives for automated research workflows (ARWs). Priorities include

Funding support for efforts by research institutions and societies to link disciplines so they can share and benefit from the expertise in statistics, machine learning, data science, engineering, and computer science that is required to build and maintain sustainable infrastructure for ARWs.
Funders and research communities structuring funding for cyberinfrastructure projects such as large scientific instruments so as to maximize the potential for innovation in ARWs and the reuse of data and other outputs.
Funders and research institutions supporting open data standards and open interfaces for scientific instruments.
Funders and research institutions enabling reuse, reproducibility, and long-term sharing of FAIR data and software resources through support of repositories that make archival and updated versions of these resources available within and across disciplines, and providing approaches to sustain those repositories.
Publishers updating their data-sharing requirements by directly associating articles to data in FAIR repositories.

RECOMMENDATION 3: Human Resources

Research funders, higher education, research institutions, and scientific and professional societies should support the development and implementation of educational programs and career pathways aimed at building the workforce needed to develop and utilize automated research workflows (ARWs), including the creation of career tracks that support ARW capabilities. Examples of what is needed include

Programs that foster integration of domain expertise with data science and software engineering skills.
Programs that inculcate data literacy and computational analytical skills in all areas of research.
Developing the human resources needed to build, maintain, and operate ARW hardware and software, including hardware and software engineers who build, maintain, and operate automated laboratories and the software needed to learn from data and to design experiments.
Fostering collaborative research that aims at developing and using ARWs and that facilitates sharing workflows, code, data, and data products in ways that respect and protect privacy considerations.

RECOMMENDATION 4: Culture and Incentives

Research funders, research institutions, and disciplines should work to create an automated research workflow (ARW)-friendly culture by making changes in incentive and reward structures aimed at encouraging behaviors that are central to realizing the potential of ARWs. These include

Encouraging team science and multidisciplinary teams.
Using funding support and provisions for data management plans to encourage development and curation of FAIR, responsible, and good-quality data resources.
Developing, improving, and sharing software resources.
Reporting reproducible results.
Helping others adopt ARW practices.
Pursuing international collaboration when possible in order to accelerate progress toward implementing the above changes at scale.

Finding D: Legal and Policy Issues

In addition to barriers to progress that exist within the research process itself, there are legal and policy issues that affect implementation of automated research workflows in specific domains that will require international multistakeholder efforts to address.

RECOMMENDATION 5: Preserving Privacy

Research enterprise funders, performers, publishers, and beneficiaries should work with governments, data privacy experts, and other entities to address the legal, policy, and associated technical barriers to implementing automated research workflows in use-inspired applications in specific domains and explore solutions to make the outputs available through privacy-preserving algorithms, federated learning approaches to using data, and other methods.