Publishing in the Age of Open Science: Proceedings of a Workshop—in Brief (2025)

Chapter: Publishing in the Age of Open Science: Proceedings of a Workshop—in Brief

Suggested Citation: "Publishing in the Age of Open Science: Proceedings of a Workshop—in Brief." National Academies of Sciences, Engineering, and Medicine. 2025. Publishing in the Age of Open Science: Proceedings of a Workshop—in Brief. Washington, DC: The National Academies Press. doi: 10.17226/27876.

Convened February 21–22, 2024

Publishing in the Age of Open Science
Proceedings of a Workshop—in Brief


On February 21–22, 2024, the Chemical Sciences Roundtable of the National Academies of Sciences, Engineering, and Medicine hosted a workshop, Publishing in the Age of Open Science, to discuss the various benefits and issues related to open access and FAIR (findable, accessible, interoperable, reusable) data practices for chemistry and chemical engineering research and publications.1 This Proceedings of a Workshop—in Brief summarizes the presentations and discussions that occurred at the workshop.2,3 It follows the workshop’s structure: a keynote address, three sessions with panels of speakers, and a closing reflection session.

KEYNOTE ADDRESS

In the keynote address, Maryam Zaringhalam, the assistant director for public access and research policy at the White House Office of Science and Technology Policy (OSTP), described the U.S. federal government’s open science policies. She first provided the U.S. federal government’s official definition of open science: the principle and practice of making research products and processes available to all, while respecting diverse cultures, maintaining security and privacy, and fostering collaborations, reproducibility, and equity. Advancing open science policies, she said, is critical to achieving many governmental goals, from curbing greenhouse gas emissions to ending cancer, while also driving equitable outcomes for all Americans, bolstering public trust in science, and strengthening the nation’s decision making.

The basic principle that underlies OSTP’s approach to public access, Zaringhalam said, is that research funded by the people should be freely and immediately available to the people. This principle was at the heart of a transformative 2013 OSTP public access memo, known as the Holdren memo,4 which instructed U.S. federal agencies with more than $100 million in annual research and development expenditures to develop plans to increase public access to publications and data. In 2022, OSTP issued an updated public access policy guidance in Dr. Alondra Nelson’s memo,5 which removed the 12-month publication embargo of the 2013 memo and strengthened the guidance on the sharing of data and other scholarly material.

__________________

1 Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18

2 This Proceedings of a Workshop—in Brief is not intended to provide a comprehensive summary of information shared during the workshop. The information summarized here reflects the knowledge and opinions of individual workshop participants and should not be seen as a consensus of the workshop participants, the planning committee, or the National Academies of Sciences, Engineering, and Medicine.

3 Workshop recordings can be viewed here: https://youtube.com/playlist?list=PLi6VVotVxseDLCFFMb-UcAUeLnKNMOTta&si=oyF4vay1KzmHhfDv

4 https://obamawhitehouse.archives.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf (accessed May 3, 2024).

5 https://www.whitehouse.gov/wp-content/uploads/2022/08/08-2022OSTP-Public-Access-Memo.pdf (accessed December 5, 2024)


Any decisions about the openness of data must take into account measures to protect privacy and security, Zaringhalam added, and OSTP’s provisions call for researchers to develop data management and sharing plans at the outset of their projects. In addition, the 2022 public access memo includes provisions for ensuring scientific and research integrity through transparency and accountability. Zaringhalam noted that the public should be able to identify which U.S. federal agencies support given investments in science, the scientists who conduct that research, and the extent to which peer review was conducted. The basic mechanism for facilitating transparency and accountability is the use of persistent identifiers (PIDs), which are standardized unique digital identification tags that can be associated with individuals, organizations, documents, software, data, and publications.

Coordination among U.S. federal agencies is critical for the success of delivering America’s research to the public, Zaringhalam added, and it is the task of the National Science and Technology Council’s Subcommittee on Open Science to ensure such coordination.

Zaringhalam then went into details on the implementation of the OSTP’s public access policy, such as its timeline and its areas of focus. In addition to developing open science policies, OSTP has identified four other priority areas: infrastructure development and enhancements, open science training and capacity development, community engagement for broadening participation in open science, and incentives for promoting open research practices. Important actions have been taken in each of these areas, she said.

For example, two key infrastructure investments have been the National Science Foundation’s (NSF’s) Findable, Accessible, Interoperable, and Reusable Open Science Research Coordination Networks (FAIROS RCN) and the resources that the U.S. Department of Energy (DOE) provides related to PIDs, which are available at osti.gov. In the area of open science skill building, Zaringhalam said, NASA’s Transform to Open Science (TOPS) initiative and the National Institute of Standards and Technology’s (NIST) Research Data Framework are both important projects. To increase community engagement with open science, in the summer of 2023 OSTP hosted a series of four listening sessions on advancing a future of open science with early career researchers. To provide incentives for promoting open research practices, U.S. federal agencies have not only been providing funding for such activities but have also been spotlighting stories of open science success, such as with the recent OSTP Year of Open Science Recognition Challenge.

Following her presentation, Zaringhalam answered a number of questions in a discussion session that was moderated by Robert E. Maleczka, Jr., a professor of chemistry at Michigan State University. She spoke, for instance, on ways to ensure equity among those who contribute to the chemical enterprise, with a specific focus on those at under-resourced institutions. She went into further detail on how quickly different types of research data must be released under the current guidance. She spoke about making it easier to find data. She said that, unlike with data, there is currently no general requirement that researchers provide copies of the software used in their work, although some agencies have moved in that direction. She also addressed the issue of how to guarantee the quality of data deposited in repositories.

PUBLISHING RESEARCH ARTICLES

The workshop’s first session, moderated by Leah McEwen, a chemistry librarian at Cornell University, had four speakers who addressed the general topic of publishing research articles. They offered perspectives from U.S. federal funding agencies like the U.S. Department of Energy (DOE), society publishers like the Royal Society of Chemistry publications division and the American Chemical Society’s Chemical Abstracts Service, and major academic research libraries represented by Cornell University.

Increasing Public Access to U.S. Department of Energy Research Results

The DOE’s Office of Scientific and Technical Information (OSTI) is tasked with fulfilling agency-wide responsibilities to collect, preserve, and disseminate the scientific and technical information emanating from DOE-funded research. Brian Hitson, director of OSTI, spoke about how the DOE’s approach to public access to research has evolved since the release of the Holdren memo in 2013.


He began with some background on DOE, including that it is the largest U.S. funder of research in the physical sciences, that it invests $15 billion a year in research and development, and that its research funding results in more journal articles than any agency other than the National Institutes of Health or the NSF. After the Holdren memo was released, OSTI established the Public Access Gateway for Energy and Science, or DOE PAGES, as the repository for DOE-funded journal articles. It now contains close to 200,000 articles and can be thought of as fulfilling a function similar to that of the National Library of Medicine, the Defense Technical Information Center, the National Agricultural Library, NASA’s Scientific and Technical Information (STI) program, and NSF’s Public Access Repository, which OSTI developed and operates.

After describing the development of PAGES, Hitson then detailed some of its effects. One major result is that 80 to 90 percent of the scientific articles published by researchers at the 17 U.S. national research laboratories are now being made freely available. This in turn has led inventors and small firms to cite these papers in their patent applications much more often—42 percent and 49 percent more often, respectively—than they did before the development of PAGES, an indication that making this research freely available is having a significant commercial and economic impact.

Hitson next described how OSTI is responding to the 2022 Nelson memo. It published its Public Access Plan in June 2023, which it developed in consultation with various agencies across DOE, the Subcommittee on Open Science, and external communities such as professional societies, scientific publishers, and libraries. One issue has been the move from a 12-month embargo on scientific articles, which was allowed by the 2013 memo, to the zero embargo specified in the 2022 memo, which has led to concern among publishers about losing income from subscriptions to their journals.6 One solution has been to allow the authors of manuscripts to pay reasonable open access fees from their research funding. Those fees will be monitored over time to make sure that they remain reasonable, although Hitson did not define “reasonable.”

After describing OSTI’s timeline for implementing the 2022 memo, Hitson finished by offering a vision of open science that OSTI is aiming for. The goal is ultimately to have open publications, open data, and open-source software all linked together seamlessly so that, for example, someone reading a journal article can click on a link to download the data collected in that study and then perhaps download a software program to analyze the data. The key to this working, he said, will be the development of the relevant PIDs, which is what will make it possible to sync resources effectively for researchers.

Efforts by the Royal Society of Chemistry

Emma Wilson, the director of publishing at the Royal Society of Chemistry (RSC), which is based in the United Kingdom, spoke about the RSC’s efforts, in partnership with researchers and library communities, to move toward open science. The RSC is committed to encouraging the move to open science, which, Wilson said, is about making every part of the research cycle accessible to all, not just the research outputs. Because such a move involves changes in the way that science is conducted and communicated, openness also involves a change in research culture, and it is here that scientific societies can play a particularly important role in encouraging this transition. Wilson said scientific societies are well placed to work with a wide range of stakeholders and to work across many different aspects of research culture and open science. In particular, the RSC is focused on research assessment and research culture, open access to publications and data, and peer review.

In Europe as a whole, she said, the shift to open access is currently in progress, but it is not yet evenly distributed. The movement has been driven both by open access mandates from funders operating in Europe and by librarians who have championed open access. The main questions are how the transition will be managed in a fair, equitable, and sustainable way and how quality standards will be upheld.

__________________

6 An embargo might be described as a period when articles are behind a pay wall and can only be accessed through a subscription to the journal.

As the RSC moves to full open access, which it has committed to achieving by 2028, it has a number of current access options. Two examples are the flagship journal, Chemical Science, which is free to read and free to publish in, with costs covered by the society, and ChemRxiv, a preprint server for the chemical sciences that is jointly supported by ACS and RSC. Ultimately, she said, the goal is to have full open access without authors paying for it; the costs could be covered by libraries, governments, and funders.

Wilson closed by speaking briefly about open data. She emphasized the importance of the Findable, Accessible, Interoperable, and Reusable (FAIR) principles for data, and she said that a challenge in achieving open data will be transitioning from only supporting information being open to full data publication and citation. Currently the RSC strongly encourages authors to deposit the data underlying their published results in appropriate repositories.

Efforts by the American Chemical Society

Paralleling Wilson’s presentation, Sarah Tegen, the senior vice president and chief publishing officer for the American Chemical Society (ACS) Publications, described the efforts by the ACS to promote open access and open science.

She began by offering some context. The number of chemistry articles published annually has been growing by 5 percent per year, she said, with nearly all that growth happening outside of the United States and Europe. The growth is mostly in China, which now accounts for more than one-third of the papers published each year. Similarly, China has been increasing its support for basic research and is now spending about five times as much as it did in 2000, she said.

Turning to the question of research data sharing, Tegen said that all ACS journals encourage data sharing. Some of them, primarily organic chemistry journals, now require a data availability statement, which will eventually be required for all journals. Finally, some of the journals require certain types of data, such as crystallographic information files, to be shared publicly.

ACS recognizes that it has a role to play in helping to facilitate the deposition, storage, and retrieval of data in a way that also helps authors obtain credit for their work. To that end it is developing the Chemistry Databank as the site for researchers to deposit digital data relevant to chemistry, including developing tools for finding, accessing, and working with the data.

Concerning open access to publications, Tegen said that ACS Publications has been a supporter of such access for nearly two decades. It now has 17 fully open access journals as well as 70 hybrid journals, supported by transformational agreements or by article publishing fees. Use of open access in ACS journals has increased rapidly in the past 3 years, with a 38 percent increase in the number of open access authors, a 46 percent increase in open access articles, and a 64 percent increase in usage, she said. Currently more than a quarter of papers published by the ACS are open access. Concurrently there has been a rapid increase in read-and-publish agreements,7 with more than 1,000 having been signed in 35 countries.

Finally, Tegen described some of the other efforts ACS has undertaken to promote access to chemistry publications. For example, it pays to provide open access to ACS journals for researchers at primarily undergraduate institutions that are subscribers to ACS Publications. In conjunction with five other chemistry societies, it also funds ChemRxiv to publish preprints at no cost to authors or readers.

In closing, she said that scholarly societies play a unique and critical role within the publishing ecosystem, with goals that are different from those of commercial publishers. They reinvest in their community, help set norms for the discipline, educate the next generation, are committed to protecting the integrity of the literature, and educate scientists on good research practices, including the sharing of data and results.

Research Libraries

__________________

7 Read-and-publish agreements occur when a publisher and an institution have an agreement in which there are fees that cover the open access publishing costs in open access journals and there is also a fee for any paywalled content or subscription content. It is a way of transitioning subscription spending so that it includes open access services as well.

Elaine Westbrooks, the Carl A. Kroch University Librarian and vice provost at Cornell University, began by listing a number of issues facing academic research libraries that may impede successful participation in open science. The scholarly communications system continues to be inequitable, unaffordable, opaque, closed, and unsustainable, she said. She spoke briefly about some of the challenges facing universities in general, pointing in particular to compliance issues relating to various laws, policies and regulations, standards, governance, and transparency. On the other hand, she said, faculty members and administrators seem to be more aware of the issues facing scholarly communications, even if they are not sure how to solve them.

Of these issues, she continued, the most concerning are the budgetary pressures. Budgets are flat, while the costs for scientific journals continue to rise and now account for about 85 percent of her budget, up from 50 percent in the not-so-distant past.

Looking to the future, she said academic research librarians are guided by eight values as they build their collections: access, social responsibility, public good, affordability, openness, transparency, sustainability, and research integrity. In response to these values and the increasing fiscal pressures, academic research librarians are focusing more on spreading efforts among many libraries through such programs as interlibrary loans, collective collections, and controlled digital lending.

Turning to open science, Westbrooks said that this topic is very important to libraries and librarians. She went on to say that Cornell is committed to supporting it in every way and is growing its efforts in open scholarship and open science.8 Cornell also advocates for relevant policy changes both internally and externally and supports authors’ rights, particularly by educating authors on how to make good choices. Cornell also takes part in the Higher Education Leadership Initiative for Open Scholarship (HELIOS), which is a group of colleges and universities working to promote a more transparent, inclusive, and trustworthy research system.9

Discussion

In the discussion session following the presentations, the first question concerned how to guarantee equity10 going forward across research sectors, different-sized institutions, different areas of the world, and further afield. It will take a great deal of work to make that happen, Wilson said, but efforts are already ongoing.

Speaking from the perspective of scientific publishers, Jake Yeston, an editor at Science Magazine, emphasized that in the face of open access, the current funding model for scientific publications is not sustainable and said that a fundamental question is how the enterprise can be kept healthy going forward. Stephen Burley, director of the Protein Data Bank, said that people need to think deeply about whose responsibility it is to pay for preserving data and making results available. Some feel it should be the funders’ responsibility, but there does not seem to be a clear consensus among the funders as to how best to proceed. That led to further discussion on funding models.

Mark Jones, an audience member, retired Dow employee, and former CSR member, asked what an open publishing system might look like if it were to be designed from scratch. Westbrooks said she would like to see a system radically different from the one now and added that the current system has too many incentives to not publish negative results. Brian Hitson, director of the DOE’s Office of Scientific and Technical Information (OSTI), said that a greater emphasis on preprints and grey literature (i.e., technical reports) might help get more negative results reported.

SUSTAINING DATA REPOSITORIES

The workshop’s second session, moderated by Jake Yeston, examined what is necessary to sustain scientific data repositories. Its four speakers came from NIST, the NSF Center for Computer-Assisted Synthesis, the RCSB Protein Data Bank, and nanoHUB, thus offering different but complementary perspectives on the topic.

FAIR Data Repositories

Robert Hanisch, the director of NIST’s Office of Data and Informatics in the Material Measurement Laboratory, spoke about FAIR data repositories. Noting that other speakers had explained the background of FAIR principles, he spoke about units as another data characteristic that often “flies below the radar.” Without standards for the proper annotation of units, he said, it will not be possible to have machines that can compare and combine different datasets.

__________________

8 Open scholarship (sometimes called “open science” or “open research”) is an all-encompassing term with regard to widely sharing scholarly work. Definitions obtained from https://www.heliosopen.org/about (accessed December 20, 2024).

9 https://www.heliosopen.org/about (accessed May 21, 2024).

10 The term equity was defined in this session as access to data, papers, and publications, but also in the ability of individuals to contribute to the chemical sciences, particularly those at under-resourced institutions.

One of the keys to FAIR data repositories, he said, is data that are “born FAIR.” Scientists should not have to bear the burden of annotating their data by hand. Instead, they could use laboratory information management systems or electronic laboratory notebooks that can extract data automatically from a machine or from a computer simulation. Similarly, it is important to have data models, metadata standards, and open, non-proprietary data formats for research data. Finally, FAIR digital objects are an emerging technology in which digital information is wrapped in metadata with appropriate PIDs so that the objects become machine-actionable pieces of information, making data as widely accessible and reusable as possible.
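As a rough illustration of the idea of wrapping digital information in metadata with PIDs, the following Python sketch pairs a data payload with a small set of descriptive fields and checks that none of them is empty. It is a minimal sketch and not a published FAIR digital object specification: the class name, the choice of required fields, and the identifier strings are all assumptions made for this example.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List

# Hypothetical minimum metadata for this sketch; real schemas define their own.
REQUIRED_FIELDS = ("pid", "media_type", "creators", "license")


@dataclass
class DigitalObject:
    """Illustrative wrapper: a payload plus the metadata a machine needs to act on it."""
    pid: str                 # persistent identifier (placeholder, not a real DOI)
    media_type: str          # open, non-proprietary format of the payload
    creators: List[str]      # identifiers for contributors (placeholders)
    license: str             # reuse terms
    payload: Dict[str, Any] = field(default_factory=dict)

    def to_json(self) -> str:
        """Serialize payload and metadata together so they travel as one unit."""
        return json.dumps(asdict(self), sort_keys=True)


def missing_metadata(obj: DigitalObject) -> List[str]:
    """Return the names of required metadata fields that are empty."""
    return [name for name in REQUIRED_FIELDS if not getattr(obj, name)]


record = DigitalObject(
    pid="example-pid:0001",
    media_type="text/csv",
    creators=["example-person-id:0001"],
    license="CC-BY-4.0",
    payload={"temperature_K": [298.15, 310.0]},
)
```

A repository built along these lines could reject a deposit whenever `missing_metadata(record)` is not empty, which is one way data could arrive already annotated instead of being annotated by hand later.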

Hanisch also suggested the Units of Measurement Interoperability Service, a toolkit that allows researchers to encode the units with their data using an established metadata-encoding scheme, and NIST’s Research Data Framework (RDaF), which he described as “a tool to help people assess their capabilities and to improve and to decide where the most important areas are to invest their resources.”
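The point about units can be made concrete with a short Python sketch in which every value carries an explicit unit, and datasets are combined only after conversion to a common unit. This is not the interface of the Units of Measurement Interoperability Service; the `Measurement` class, the `combine` function, and the small conversion table are assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Conversion factors to pascals for a few pressure units (illustrative subset).
TO_PASCAL = {"Pa": 1.0, "kPa": 1_000.0, "bar": 100_000.0, "atm": 101_325.0}


@dataclass(frozen=True)
class Measurement:
    """A value that always travels with its unit annotation."""
    value: float
    unit: str


def to_pascal(m: Measurement) -> float:
    """Convert one measurement to pascals, refusing unknown or missing units."""
    if m.unit not in TO_PASCAL:
        raise ValueError(f"unknown or missing unit: {m.unit!r}")
    return m.value * TO_PASCAL[m.unit]


def combine(*datasets: Iterable[Measurement]) -> List[float]:
    """Merge datasets recorded in different units into one list in pascals."""
    return [to_pascal(m) for dataset in datasets for m in dataset]


lab_a = [Measurement(1.0, "atm")]
lab_b = [Measurement(2.0, "bar")]
merged = combine(lab_a, lab_b)  # both values are now expressed in pascals
```

A value with no recognizable unit raises an error instead of being merged silently, which reflects the argument that machines cannot compare or combine datasets unless units are properly annotated.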

Switching to the topic of funding data repositories, Hanisch said that in 2022, U.S. federal agencies supported $54 billion in university-based research, of which he estimated that about 10 percent goes into assuring that the research results are published by supporting research publication costs, article processing charges, subscriptions, and so forth. Data are an essential component of that research record, and quality data are also important for artificial intelligence (AI). Thus, he argued, funders ought to be compelled to set aside long-term support for data repositories. “I see it as nothing short of a moral obligation,” he said. It is important to ensure not only that the interpretation of the data is recorded in papers, but that the data and tools used to reach those conclusions are also made public and preserved for posterity.

He concluded by suggesting that 2–3 percent of the U.S. federal research budget could be set aside for establishing public data repositories, potentially providing a significant return on the investment. In particular, he suggested a network of interoperable, domain-specific research repositories. “When you have data that is well-curated, well-annotated, well-characterized with metadata, it will be reused and it will be recombined in ways that were not imagined by the people who took the data in the first place,” he said.

Data Challenges in the Age of Artificial Intelligence

Olaf Wiest, the Grace-Rupley Professor of Chemistry and Biochemistry at the University of Notre Dame and the director of the NSF Center for Computer-Assisted Synthesis (C-CAS), spoke about the challenges of collecting large amounts of data in forms that are amenable to AI and other uses, with a particular focus on chemistry data.

He began by describing how C-CAS has been addressing its own data issues. Concerning sustainability, he pointed to the example of the PDB, or Protein Data Bank, and said that the key was developing worthwhile funding models that demonstrate value and need. “If you can demonstrate value,” he said, “people will figure out how to fund it.”

A second data-related issue in the context of AI is data quality, Wiest said. He offered four features that characterize high-quality data: It is trustworthy and transparent, curated, complete and consistent, and has known uncertainties.

Concerning sources of data, Wiest said that C-CAS looks especially to pre-competitive data, that is, data that are of value to many stakeholders but not of the sort of value that makes it useful as intellectual property. It is important to make sure that people understand the value of depositing and using such data and to make depositing the data as easy as possible, perhaps by automating the process where possible.

Finally, it is important to get the “right data,” he said, which involves figuring out which data will be valuable to users and not only including the data that are the easiest to collect.

Another challenge in assembling useful datasets is dealing with what Wiest called “ghosts.”11 Some data of interest are “real, explicit” data, but other data are implicit or even inferred, and these can be difficult to recognize. A second challenge is what he called “trolls,” which he characterized as “ugly, slow-witted, rarely helpful, and even dangerous to humans.” “That is about as good as a definition that I can think of for an electronic lab notebook,” he said, since many of them are nonfunctional (e.g., discontinued or proprietary), and the data they contain may be inconsistent, incomplete, contradictory, or in different languages, units, or formats. Still, these notebooks contain a lot of useful data, so it is important to find ways to collect that data efficiently.

__________________

11 Ghosts were not fully described but were loosely defined as datasets that have an invisible presence, sometimes translucent and sometimes real; in a sense, data that are implicit or inferred.

He then described how C-CAS is dealing with some of these challenges. Its vision, he said, is to transform how chemists discover, optimize, interrogate, and apply new reactions to the synthesis of functional molecules through “data chemistry.” He defined data chemistry as using data streams combined with the representation of molecules and algorithms to do chemistry in a new way—in particular, to do optimizations as well as synthesis and reaction predictions in a novel and much more efficient way. C-CAS is committed to releasing all its resources—its data, its representations, its algorithms and source code—freely. Nothing is proprietary.

An important related project is the Open Reaction Database (ORD), which makes reaction data freely and publicly available for anyone to use. The ORD provides a structured data format for those chemical reaction data, and it provides an interface for easy browsing and downloading of data. There are approximately 3.5 million data points in the database now and the system can accept many types of digitized information, such as Ph.D. dissertations, with tools to identify inferred/implicit data and make them explicit, checking for consistency.

This, Wiest concluded, is where the future is going—using modern technology to deal with various data challenges.

Protein Data Bank

Stephen K. Burley, the director of the Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB), spoke about the PDB’s history and value, focusing on its use in responding to the coronavirus disease 2019 (COVID-19) pandemic as an illustrative example.

The PDB was founded in 1971 as the first open access digital data resource in biology and has served as a research data depository with content deposited by the research community. At the beginning it held just seven x-ray crystallographic structures of proteins. Since that time, it has grown to become a global resource for experimentally determined atomic-level three-dimensional biostructure information, with open access provided for more than 215,000 structures of proteins, nucleic acids, viruses, and macromolecules.

PDB data have proved to be essential for responding to emerging viruses, Burley said, mentioning specifically the severe acute respiratory syndrome coronavirus (SARS-CoV) epidemic of 2002, the Middle East respiratory syndrome coronavirus (MERS-CoV) epidemic of 2012, and the COVID-19 pandemic. Open access to the structures of the SARS-CoV, MERS-CoV, and SARS-CoV-2 viruses held in the PDB, and in particular the structures of the spike proteins in these viruses, was crucial to the design of effective mRNA vaccines targeted against each of the viruses. The COVID-19 vaccines designed with information from the PDB have been credited with saving tens of millions of lives, he said.

Turning to structure-based drug discovery, Burley said that the PDB currently houses more than 750 crystalline structures of the SARS-CoV-2 main protease, Mpro, which is the Achilles’ heel of the virus. Mpro is thought to be a target for stopping an infection, and it is the target of Pfizer’s highly effective drug, Paxlovid, which is a fixed-dose combination of nirmatrelvir, the active ingredient, and ritonavir. Paxlovid received emergency use authorization from the U.S. Food and Drug Administration in December 2021, he said, less than 2 years after public release of the viral genome sequence, an unprecedentedly short time.

Burley concluded with a sobering postscript. Given the wide effectiveness of Paxlovid against coronaviruses, with the right incentives Pfizer could have discovered and produced Paxlovid in the wake of the earlier SARS-CoV and MERS-CoV outbreaks, recognizing that there would likely be future coronavirus outbreaks. An investment in the wake of MERS-CoV might have provided an effective treatment that could have potentially saved many lives worldwide during the COVID-19 pandemic before vaccines were developed.

nanoHUB

Alejandro Strachan, the co-director of nanoHUB and Reilly Professor of Materials Engineering at Purdue University, described what nanoHUB is and how it helps advance open science. It is an open online platform that offers free data and simulation tools to users. “What we seek to do,” Strachan said, “is connect research-grade software and data infrastructure with their end users, who are domain experts but not computational experts.” nanoHUB is available as an app that can be accessed via a web browser. “With a few clicks we have undergrad students running molecular dynamics and learning about materials science, without worrying about the computational intricacies of running the simulation and providing hardware,” he said. Strachan went on to discuss how nanoHUB promotes FAIR data, tools, and workflows. To do this, nanoHUB developers write the tools and workflows in Jupyter and formally declare inputs and outputs in the Jupyter workflow in a way that aligns with the FAIR data principles. These workflows are published on nanoHUB, where one can query for workflows that will carry out a particular function or have certain inputs and outputs.
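The idea of declaring a workflow's inputs and outputs so that it can be found by query can be sketched in a few lines of Python. This is a minimal sketch, not nanoHUB's actual interface: the `Workflow` class, the `find_workflows` function, and the example registry entries are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class Workflow:
    """A published workflow with formally declared inputs and outputs."""
    name: str
    inputs: FrozenSet[str]
    outputs: FrozenSet[str]


def find_workflows(
    registry: Iterable[Workflow],
    available_inputs: Optional[Iterable[str]] = None,
    wanted_outputs: Iterable[str] = (),
) -> List[str]:
    """Return names of workflows that produce every wanted output and,
    if available_inputs is given, need nothing beyond those inputs."""
    wanted = set(wanted_outputs)
    available = None if available_inputs is None else set(available_inputs)
    matches = []
    for w in registry:
        if available is not None and not w.inputs <= available:
            continue  # workflow needs an input the user cannot supply
        if not wanted <= w.outputs:
            continue  # workflow does not produce everything requested
        matches.append(w.name)
    return matches


# Hypothetical registry entries, for illustration only.
REGISTRY = [
    Workflow("md_melting",
             frozenset({"composition", "temperature_range"}),
             frozenset({"melting_point"})),
    Workflow("oxidation_fit",
             frozenset({"oxidation_data"}),
             frozenset({"best_model", "rate_constant"})),
]
```

Because the declarations are plain data, the same registry can answer both "what can I compute from the data I have?" and "which workflow gives me this quantity?" without running any workflow.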

FAIR data and workflows accelerate innovation, Strachan said, describing one situation in which undergraduate students collected oxidation data from the literature and used a simulation tool that automatically analyzed the data against 42 possible models for oxidation and ranked the models by how well they described the data. The data resources at nanoHUB are also valuable for education and are used in core classes at Purdue University to teach fundamental materials science.
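To make the idea of ranking candidate models against collected data concrete, the sketch below fits a handful of one-parameter oxidation rate laws by least squares and sorts them by root-mean-square error. It is an illustrative sketch only, assuming four textbook rate-law forms rather than the 42 models mentioned; the `MODELS` table and `rank_models` function are hypothetical and are not the nanoHUB tool.

```python
import math

# Each candidate model has the form thickness = k * f(time), with one
# fitted constant k. These four forms are assumed for illustration.
MODELS = {
    "linear": lambda t: t,
    "parabolic": lambda t: math.sqrt(t),
    "cubic": lambda t: t ** (1.0 / 3.0),
    "logarithmic": lambda t: math.log(1.0 + t),
}


def rank_models(times, thickness):
    """Fit k for every model by least squares and rank by RMSE.

    Returns a list of (rmse, model_name, k) tuples, best fit first.
    """
    results = []
    for name, f in MODELS.items():
        basis = [f(t) for t in times]
        # Closed-form least-squares solution for a single scale factor k.
        k = sum(x * b for x, b in zip(thickness, basis)) / sum(b * b for b in basis)
        residuals = [x - k * b for x, b in zip(thickness, basis)]
        rmse = math.sqrt(sum(r * r for r in residuals) / len(residuals))
        results.append((rmse, name, k))
    return sorted(results)
```

For example, calling `rank_models([1, 4, 9, 16, 25], [2, 4, 6, 8, 10])` places the parabolic model first, since those thickness values are exactly twice the square root of the time values.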

He closed by noting that nanoHUB is working to make it easier for researchers and others to find journal papers and other items through use of large language models.

Discussion

Jake Yeston, of Science magazine, began by asking the panelists what can be done to improve data science education and, more generally, to motivate departments and students to get more serious about teaching people in various scientific disciplines to be data scientists in addition to their primary area. Strachan answered that the primary goal is to teach research scientists to become expert users of modern tools rather than “mini data scientists.” Wiest said that educational modules can easily be developed to give students the skills they need to work effectively with data in various ways. Hanisch said that the skills associated with data science, such as understanding statistics and understanding uncertainty characterization, are very important, and if those skills are not being taught along with the skills in the basic sciences, “then we are not doing our duty as educators.”

A participant asked about the private-sector database providers who might be driven out of business once all U.S. federally sponsored research is made freely available. Another question concerned developing standards for metadata. There was a brief discussion about the handling of smaller datasets, which can be important in training AI models. Another short discussion centered on the challenges of reproducibility. Some participants focused on efforts to help policy makers and other U.S. government officials understand the urgency for open data. Several participants said it is vital that the community speak out on this topic in order to support science, an effort that could benefit from great communicators who can inspire people with a compelling vision for the future.

SUPPORTING THE RESEARCH COMMUNITY

The workshop’s third session, moderated by Leah McEwen, a chemistry librarian at Cornell University, had four speakers examining different ways to support the research community through collecting, organizing, storing, and sharing research articles, data, and related information.

Lessons from the Sci-Hub Story

Luis Sanchez, an associate professor at Niagara University, illustrated some of the issues related to piracy of scientific papers by discussing Sci-Hub, a website that provides free access to a large collection of research papers, including many that it does not have permission to provide.

As context, Sanchez said that so-called “guerilla open access” is part of the reality of open access: most scientific articles are available to the public for free, albeit through unauthorized means. This is not a secret, he continued; simple internet searches describe how to access the materials. Even though using platforms that infringe on copyright or access restrictions has various legal and ethical implications, he said these platforms are widely used anyway, and over the years there have been many different alternative websites that have facilitated this access.

According to one 2022 study, Sanchez said, more than 50 percent of academics admit using piracy websites to bypass paywalls for research they want to access.12 According to the Sci-Hub website, it had 126,000 users in a single hour on the day prior to the workshop.

This use of pirated material leads to various legal issues for Sci-Hub. It was taken offline briefly by an injunction, for instance, and its founder was ordered to pay $15 million in damages to the scientific publisher Elsevier, but since it operates outside traditional legal jurisdictions, it is difficult to enforce such judgments. Furthermore, he noted, it has supporters, such as the Electronic Frontier Foundation.

While many people use Sci-Hub because their institutions do not have access to the desired journals, there are many others, particularly in the United States, who are at institutions that do have such access, Sanchez said, and they are using Sci-Hub mainly for the convenience of a user-friendly interface.

Although Sci-Hub, which hosts about 90 million scientific articles, is probably the best-known piracy site, there are many others, including Anna’s Archive, which has about 100 million articles. Some people use the hashtag #iCanHazPDF on social media sites to request a particular article, and they are typically successful, he said.

Many people see this not only as acceptable, but morally correct, Sanchez said. For instance, the Guerilla Open Access Manifesto from 2008 argues that students, librarians, and scientists who have the privilege of having access to knowledge also have a duty to share it with the world. Even people not familiar with the manifesto may come to the same conclusion by themselves, Sanchez observed, especially as it becomes easier to store, organize, and share large amounts of data.

A Perspective from Industry

Danielle Schultz, Director of Discovery Process Chemistry at Merck, offered a perspective on open access and FAIR data practices from the point of view of industry, and observed that despite the common perception, overall, the pharmaceutical industry does publish papers on its research. Indeed, Merck sees it as an obligation and responsibility to share as much of its data from drug discovery as possible, she said.

After a brief overview of the drug discovery process—from drug discovery to clinical trials and then regulatory approval and launch—Schultz said that there are certain points along the way where it makes sense for industry to publish. Papers may get published on novel drug design and discovery, including synthesis, on problems that are commonly encountered in drug discovery, and on in vivo and in vitro studies. In the process chemistry space, companies will often publish about green chemistry and manufacturing processes, novel synthetic methods, and safety and engineering controls.

Schultz offered four main reasons for Merck to publish: It helps advance and influence science, publishing gives Merck the freedom to operate and use its processes anywhere in the world, it promotes the company and its employees and thus helps in attracting top talent, and it serves as a stimulus for scientific collaborations.

On the other hand, there are a number of deterrents to publishing. First, publishing is not the company’s top priority, so it is generally something employees do in their free time. Furthermore, intellectual property issues can make publishing challenging. The end result is that only about 20 percent of the science in patents actually ends up in a journal and only about 5 percent of the data generated by pharmaceutical companies is ever published, she said.

Over time, she continued, publications by pharmaceutical companies have been trending downward, with a published analysis finding that the total number of medicinal chemistry articles published in seven leading journals over the past 20 years declined by about 25 percent.13 Focusing just on Merck, she said that the number of the company’s publications in discovery, biology, pharmacology, and translational medicine is in decline, although the number of publications in the process chemistry space has remained mostly steady. About 40 percent of the discovery papers were in open access journals, while about 21 percent of the process chemistry papers were. She mentioned that Merck has, however, some highly cited publications that illustrate the impact industry can have when it makes its research more accessible.

__________________

12 F. Segado-Boj, J. Martin-Quevado, J.-J. Prieto-Guitierrez, “Jumping over the paywall: Strategies and motivations for scholarly piracy and other alternatives.” arXiv:2212.05965 [cs.DL] (accessed December 20, 2024).

13 R.J.D. Hatley et al., “Writing Your Next Medicinal Chemistry Article: Journal Bibliometrics and Guiding Principles for Industrial Authors.” J. Med. Chem. 2020, 63, 14336.

Looking to the future, Schultz said that open access in this area could accelerate drug discovery and improvements in human health, but publishing charges could limit pharmaceutical companies’ willingness to publish results in these journals.

Merck engages in various collaborations, both with other pharmaceutical companies and academic institutions, which inevitably involve the sharing of data and techniques. This is sensible as synthetic chemistry is complex, she said, but sharing will require chemists from industry and academia to communicate and collaborate better, understanding each other’s perspectives.

Translating Policies in Chemistry

Ye Li, the librarian for Chemistry, Chemical Engineering, Materials Science, and Engineering at the Massachusetts Institute of Technology (MIT), described efforts at MIT—and in research libraries more generally—to move to open access and FAIR data practices in chemistry. Reaching these goals will require actions in three areas, she said: the development of designated and reliable funds; the construction of an interoperable, adaptable, and automated infrastructure; and the establishment of guidance for good practices that are specific and actionable.

To illustrate, Li offered some brief examples of these efforts. MIT has put into place a faculty-driven open access policy that helps MIT researchers and students let outside agencies know that their institution requires them to ensure open access to their work. The MIT library is supporting MIT researchers with open access publishing funds and agreements and is investing in open infrastructure and initiatives. Chemistry researchers may still face challenges, Li said, because the provided funds may not be sufficient to cover the full cost of publications in chemistry, which are particularly expensive compared with other fields, and the library’s agreements may not cover all the large publishers in the chemistry domain.

MIT has also developed a framework for publisher contracts, which covers the protection of authors’ rights, value-added services such as computational access and use, and transparent and cost-based pricing models, for example. Of particular importance to chemists, Li said, is what the framework has to say about equitable access, which deals not just with reading access for humans, but also with the ability of computers to access and use the text and data in publications—that is, to conduct text and data mining, enabling machine learning and AI studies.

MIT has established a prize for open data; however, Li noted that extrinsic incentives such as funds and prizes may not be sufficient for chemists to justify the extra time and effort required for curating their data to make it more findable, accessible, interoperable, and reusable. It could help, she said, if the extra effort to curate the data were rewarded by accelerating researchers’ discovery efforts. One way to do that would be to use machine learning/artificial intelligence (ML/AI) models to turn the data into automated design with what is called an “automated research workflow.”

This is already happening in chemistry, Li said, as an MIT group used a robotic system controlled by an ML/AI model to design dye molecules, measure the properties of the molecules, and use those data as feedback to direct the refinement of the molecules.14 But further progress will likely involve more quality experimental data, either curated or generated from FAIR data practices, which in turn will involve greater understanding on the part of researchers concerning the data practices that can make this all possible. Librarians have a role to play here, Li said, in providing guidance and support to researchers in the field.

She closed by describing the community that could be built to make these data practices the standard. Professional societies, standards organizations, foundations, certification organizations, and others in a community of practice could engage in communal efforts to train researchers, data curators, and others in what they need to know to make open science a reality.

Research Data Management and Sharing Support for Chemistry Researchers

Shannon Farrell, the research data services lead and director of the data repository at the University of Minnesota (UM), spoke about how her university supports chemistry researchers on data issues as a potential example for other institutions.

__________________

14 B.A. Koscher et al., “Autonomous, multiproperty-driven molecular discovery: From predictions to measurements and back.” Science 382, eadi1407 (2023). DOI:10.1126/science.adi1407 (accessed September 20, 2024).

UM’s Research Data Services provides campus-wide education and consultation about data management and data sharing and also runs the university’s data repository. It works with various campus partners, such as the Liberal Arts Technology and Innovation Services, and it takes part in various national and international conversations concerning data management and data sharing, including working with U.S. federal agencies.

Farrell went into some detail on the types of data management instruction that her team offers, which includes large-group instruction (webinars, panels, seminars), small-group instruction (labs, research teams, one-on-one), and course-integrated instruction. The course-integrated instruction generally involves teaching general data management concepts or focusing on particular aspects of data management and sharing, such as data on human subjects. They also offer yearly data management bootcamps for graduate students focused on data management basics, such as file and folder organization, risk management, storage and backup, and how to provide good documentation, as well as various current needs, such as information on U.S. federal data sharing mandates and data publishing.

Next Farrell spoke about UM’s data repository. It is free, open access, and available for depositing data to any University of Minnesota affiliate as long as the data fit within the repository guidelines (e.g., limits on dataset size and policies on private and sensitive data). The university repository is often a last resort, since UM urges researchers to put their data into a disciplinary repository if one exists.

UM is also the financial “house” for the Data Curation Network, which is a network of 19 institutions and more than 50 data curators.15 It connects data specialists to knowledge that allows them to support researchers, Farrell explained, and she mentioned its database of primers as an example of what the network offered. The primers, which are all public, help curators understand the common types of data found within different disciplines by describing common file formats, known repositories, recommended open formats, how to convert files, and other topics.

Another of the network’s products is the CURATE(D) model of data curation. CURATE(D) stands for check, understand, request, augment, transform, evaluate, and document, and Farrell said it is the model her team at UM follows when curating datasets.

Farrell closed by offering some lessons learned and identifying future challenges. Among the lessons were that it is possible for researchers to share and disseminate their work or data openly and that doing so gets easier once a workflow is in place and becomes habitual. Among the challenges is dealing with big data (datasets that are hundreds of gigabytes or even terabytes in size), which is not something the UM repository is equipped to do. There are also concerns about long-term storage costs and whether the repository can store data with restricted access. Finally, Farrell said there are concerns about future growth as more datasets are deposited in the repository and more researchers ask for consultations regarding curating their datasets.

REFLECTIONS AND LOOKING FORWARD

To close the workshop, two of the workshop’s organizers, Leah McEwen and Jake Yeston, reflected on what had been said and offered suggestions going forward. Yeston said the most important thing will be for the people in the field to keep communicating with each other, sharing ideas, and working together to figure out how to move forward. McEwen identified two topics that may benefit from additional discussion: industry–academic partnerships and greater emphasis on digital skills development so that researchers are more familiar and comfortable with data science. She also said that the Chemical Sciences Roundtable16 will be focusing on artificial intelligence in a future workshop series, which could build on the discussions of open scholarship and data access from this workshop.

__________________

15 https://datacurationnetwork.org/ (accessed September 20, 2024).

16 https://www.nationalacademies.org/our-work/chemical-sciences-roundtable (accessed September 20, 2024).

DISCLAIMER This Proceedings of a Workshop—in Brief has been prepared by Robert Pool as a factual summary of what occurred at the meeting. The statements made are those of the rapporteurs or individual workshop participants and do not necessarily represent the views of all workshop participants; the planning committee; or the National Academies of Sciences, Engineering, and Medicine.

*The National Academies of Sciences, Engineering, and Medicine’s planning committees are solely responsible for organizing the workshop, identifying topics, and choosing speakers. The responsibility for the published Proceedings of a Workshop—in Brief rests with the institution.

PLANNING COMMITTEE Brian D. Crawford, Brian Crawford and Associates, LLC; Michael Forster, retired; Robert E. Maleczka Jr., Michigan State University; Leah R. McEwen, Cornell University; Fatima Mohammad Ahmad Mustafa, The University of Texas, San Antonio; Jake Yeston, Science Magazine, American Association for the Advancement of Science.

REVIEWERS To ensure that it meets institutional standards for quality and objectivity, this Proceedings of a Workshop—in Brief was reviewed by Leah McEwen, Cornell University; Mark Jones, MJPhD, LLC.

SPONSORS This workshop was supported by the Department of Energy and National Science Foundation.

STAFF Linda Nhon, CSR Director; Michael Janicke, Senior Program Officer; Darlene Gros, Senior Program Assistant; Kayanna Wymbs, Research Assistant.

We also thank staff member Brittany Segundo for reading and providing helpful comments on this manuscript.

For additional information regarding the workshop, visit https://www.nationalacademies.org/event/40380_10-2023_future-implications-of-open-access-fair-data-practices-on-chemistry-and-chemical-engineering-publications-aworkshop (accessed August 8, 2024)

Suggested citation: National Academies of Sciences, Engineering, and Medicine. 2025. Publishing in the Age of Open Science: Proceedings of a Workshop—in Brief. Washington, DC: The National Academies Press. doi: https://doi.org/10.17226/27876.

Division on Earth and Life Studies

Copyright 2025 by the National Academy of Sciences. All rights reserved.

NATIONAL ACADEMIES Sciences Engineering Medicine The National Academies provide independent, trustworthy advice that advances solutions to society’s most complex challenges.