Engaging Scientists in Central Asia on Life Science Data Governance Principles: Proceedings of a Workshop Series (2024)

Suggested Citation: "2 Data Governance Principles for Life Science Research Across the Globe." National Academies of Sciences, Engineering, and Medicine. 2024. Engaging Scientists in Central Asia on Life Science Data Governance Principles: Proceedings of a Workshop Series. Washington, DC: The National Academies Press. doi: 10.17226/27156.

2
Data Governance Principles for Life Science Research Across the Globe

Lusine Poghosyan, Columbia University (United States), and Trisha Tucholski, U.S. National Academies of Sciences, Engineering, and Medicine, introduced the second workshop, which was held on May 11, 2023, and focused on data governance principles for life science research across the globe. Tucholski noted that the rapid increase in life science data presents tremendous opportunities for research insights; these data represent significant financial investments and are sometimes sensitive. She stated that global research collaborations seek to make the best use of life science data to advance science and public health, yet there are challenges in balancing the need for data access and sharing with the imperative to ensure responsible and reliable data protections for those data and the people who contribute them.

To explore these issues in a governance context, participants in the second workshop highlighted global data governance policies, practices, and frameworks; examined international norms and emerging trends in life science data governance; and considered similarities, differences, challenges, and opportunities in existing data governance frameworks.

DATA GOVERNANCE PRINCIPLES FOR LIFE SCIENCE

Yann Joly, McGill University (Canada), introduced global data governance resources and frameworks to set the stage for the workshop discussions. Data governance includes agreed-upon data sharing and interoperability standards that promote open science and global collaboration, accelerate health research, and improve health care, he said. Joly also noted that such governance requires an infrastructure of national and international laws, regulations, and standards that address the ethical, legal, and social issues raised by data collection, such as consent and confidentiality, while ensuring data access.

Joly identified several key organizations and texts that promote open science and facilitate responsible international data sharing. These include:

  • The United Nations Educational, Scientific and Cultural Organization’s Recommendation on Science and Scientific Researchers and Recommendation on Open Science proclaim the right for people to benefit from scientific advancement, which is a foundation of open science (UNESCO, 2017, 2023).
  • The World Health Organization’s Sharing and Reuse of Health-Related Data for Research Purposes is a guide to data sharing and protection, which was adopted in 2022 in part to capture lessons learned from data sharing and sample sharing experiences during the COVID-19 pandemic (WHO, 2022b).
  • The Convention on Biological Diversity’s Nagoya Protocol establishes procedures for international sharing of genetic data; Joly noted, however, that it is generally understood to apply only to nonhuman biological samples and has not been ratified by the United States (Convention on Biological Diversity, n.d.b).
  • The FAIR (Findability, Accessibility, Interoperability, Reusability) Guiding Principles constitute a framework that has become a community norm for scientific data management and stewardship, as computational systems handle increasingly large and complex collections of life science data (GO FAIR, n.d.).
  • The Global Alliance for Genomics and Health has several products to facilitate responsible access to life science data, including tools for finding data and interpreting the consent associated with them, and a framework for responsible data sharing (GA4GH, n.d.).
  • The Public Health Alliance for Genomic Epidemiology promotes ethical data sharing for nonhuman microbe and pathogen specimens to respond to disease outbreaks (PHA4GE, n.d.).
  • The Global Initiative on Sharing All Influenza Data promotes rapid sharing of influenza and coronavirus genomic data, but it has been controversial and faces an uncertain future (Re3data.org, n.d.).
  • The European Union’s General Data Protection Regulation (GDPR) is a legal framework for the protection of individual privacy that applies within Europe as well as to any situation in which data are shared with people or organizations in Europe (European Commission, n.d.).

This list, while not comprehensive, illustrates the breadth of the efforts to guide data sharing at the international level and some of the common elements among them. In general, Joly said that international data sharing practices are guided by broadly formulated, flexible, and easily adopted community norms and standards. While there is broad agreement on the importance of data sharing in science, he noted that challenges can arise when national privacy laws differ from these norms or when scientists struggle to adhere to them. The implementation and enforcement of these norms and standards occur largely through voluntary adoption and through their influence on national laws, rather than through international laws, which tend to be difficult to enforce. Finally, he noted that most efforts in this area have been dominated by organizations in North America and Europe, but that is gradually changing as organizations on other continents have become more active in deliberations and activities around data sharing.

DATA GOVERNANCE ACROSS THE GLOBE

John Ure, Access Partnership (Singapore), introduced a discussion among data governance experts from around the world. Panelists from Germany, India, Taiwan, Uganda, and the United States highlighted national data governance laws, tools, and frameworks in their respective countries and explored emerging trends in global life science data more broadly.

U.S. Federal Policies for Managing and Sharing Research Data

Taunton Paine, U.S. National Institutes of Health (NIH), discussed how NIH and other federal agencies in the United States have approached data sharing. The U.S. government declared 2023 the “Year of Open Science,” drawing attention to the goal of broadening access to life science data. Increased data access and sharing accelerates research by enabling the validation of published research, making high-value datasets more accessible for reuse, and enhancing the rigor and reproducibility of publicly funded research. With appropriate mechanisms, it also increases scientists’ opportunities for citation, recognition, and collaboration. Data sharing also fosters transparency and accountability, demonstrates good stewardship of taxpayer funds, and maximizes the value gained from research participants’ contributions, all of which promote public trust in research (Funk et al., 2019).

Despite the widely recognized benefits of open science, most of the data underlying published research are currently difficult to access (Errington et al., 2021; Gabelica et al., 2022; Tedersoo et al., 2021). In an effort to close this gap, NIH’s primary contribution to the Year of Open Science is its 2023 Data Management and Sharing Policy, a 10-year, multicommunity effort aimed at improving data sharing (NIH, n.d.). Consistent with prior policies, such as the 2014 Genomic Data Sharing Policy, this policy provides supplemental information on sharing human data via a controlled-access framework with appropriate consent, oversight, review, and approval for future use (NIH, n.d.). The policy makes NIH funding contingent on the provision of detailed data management and sharing plans, requiring that researchers address the type of data to be collected and generated; identify the specific tools, software, documentation, or data standards that will be used; provide details about the specific data repository that will be used; outline how and when the data will be findable, protected, and shareable; and delineate future limitations on the use of the data. The policy also prioritizes established data repositories as the preferred method for sharing and requires that data sharing be an early component of research design, not an afterthought, Paine said. While these are not legal requirements, noncompliance can affect funding decisions, which generally exert a strong influence on scientific practice.

Paine also noted that the White House Office of Science and Technology Policy issued a memo requesting that other federal agencies implement policies like NIH’s Data Management and Sharing Policy, suggesting that such requirements and practices are likely to continue to gain steam in the United States in the near future.

Life Science and Biological Data Sharing under the General Data Protection Regulation

Fruzsina Molnár-Gábor, Heidelberg University (Germany), described the data governance framework within the European Union (EU) General Data Protection Regulation. As an EU regulation, the GDPR generally applies in all EU member states without requiring them to take additional legal or regulatory measures to apply it in those countries. She highlighted provisions of the GDPR relevant to the processing of life science and health-related data that have been subject to different interpretations.

The GDPR generally prohibits the processing of sensitive personal data, such as health and genetic data, Molnár-Gábor stated. However, as there is no absolute definition of personal data, and given that rights and interests in data protection must be balanced with the rights and interests in data processing, exemptions are granted depending on the context in which the data are processed. Assessing risks to subjects from data processing, particularly the risk of reidentification, is crucial, she said. The idea of anonymized data is often understood differently in different technical and legal contexts, which can complicate research, especially in the context of metadata processing, publication, and possible reidentification through artificial intelligence. In addition, the GDPR requires that data processing be legally authorized, such as through the data subject’s consent, including for future and retrospective use. The GDPR also permits an exception to the general prohibition on processing sensitive data for research, based on a balancing of research interests against data protections, Molnár-Gábor noted.

Data subjects can consent to the collection and use of their data, and under the GDPR, data subjects have rights that build on the transparency of data processing, such as the right to provide access, delay access, or rectify data. Different roles exist for those who process data, such as controllers, who decide on the purpose and essential means of data processing, and processors, who carry out data processing. In health research, Molnár-Gábor noted, these roles are often shared and exercised jointly, which is acceptable under the GDPR if issues of liability are considered. When research organizations act as data controllers, they are responsible for ensuring the rights of data subjects.

Transferring data outside of the EU presents challenges, said Molnár-Gábor. Under the GDPR, international data transfers require a two-step process that first identifies a legal mechanism for the transfer of data. Next, data exporters must assess the receiving country’s level of data protection and, if it is not equivalent to the level of protection provided by the GDPR, compensate for the lack of equivalence with a transfer mechanism that allows for data exchanges. Improved regulatory interoperability between countries could help address the challenges of secure data transfers, Molnár-Gábor said, although achieving it poses further challenges for regulators.

Finally, Molnár-Gábor said that other developments in EU law, such as the proposed 2023 European Health Data Space regulation (European Commission, n.d.) for the processing of health care and health research data, could simplify the sharing of life sciences data with non-EU countries but also create new compliance challenges. Molnár-Gábor added that while there may be issues with interpreting black-letter laws, the documenting, monitoring, and enforcing of GDPR principles can strengthen public trust by promoting transparency and accountability and empowering research subjects.

Data Governance, Principles, and Structures for Life Science and Medical Research in India

Athira P.S., National University of Advanced Legal Studies (India), discussed steps India is taking to address what she described as the country’s “growing pains” that have accompanied its digital revolution. As many aspects of Indian life have come online, from banking to education to health care, P.S. said that numerous efforts have arisen to provide a vision for governance in line with principles of security, trust, and digital empowerment (India Stack, n.d.). While India currently relies on guidance and regulations, not legislation, to ensure proper data management, she said that as technology advances, there is an increasing need for more inclusive, clear, secure, sustainable, and cohesive structural mandates for data governance that stress personal privacy.

One effort in this space is the proposed federated Data Empowerment and Protection Architecture (DEPA), which P.S. described as a collection of legal protections for personal health data that balances portability and access with security and public trust and represents a departure from the GDPR model of regulation (NITI Aayog, 2020). Under DEPA, she explained, each Indian citizen would be given a unique health identifier, linked through multiple channels, that would be protected and anonymized yet accessible anywhere in the country. Along with the Digital Information Security in Healthcare bill (which also has not yet been enacted), DEPA would bring India’s practices for storing and sharing health data in line with international standards and protect research participants’ fundamental rights to autonomy and privacy. These fundamental rights are granted to Indian citizens by the Puttaswamy decision of the Indian Supreme Court (Puttaswamy, n.d.), a decision that also asserted that personal data should be collected for a specific stated purpose, relevant for that purpose, and limited to what is required to achieve that purpose.

P.S. pointed to another relevant mandate under consideration: India’s Digital Personal Data Protection Bill of 2022,1 which would cover all digitized personal data, including data collected offline that are later digitized. She said that this legislation includes a concept of “deemed consent,” under which personal data can be used without express consent if such use is necessary for national security, public interest, or public order. The legislation also claims extraterritorial application; under this framework, P.S. said, approved countries would be considered safe data transfer locations, while transfers to other countries would be prohibited, making legal compliance and data interoperability important concerns.

Regulatory and Governance Frameworks in Human Biological Materials and Data Sharing in Uganda’s National Biorepository

Hellen Nansumba, Ministry of Health of the Republic of Uganda, described existing frameworks for data sharing in her country, along with some of the challenges involved in strengthening equitable data sharing practices. Genomics is a fast-growing field in Uganda, with the scope of work expanding and genomic surveillance accelerating after the COVID-19 pandemic. Nansumba explained that the country is instituting several frameworks for sharing human biological materials and associated data that include ethical requirements, such as informed consent, confidentiality, and impartiality, as well as governance requirements, such as biorepository management and transfer agreements for materials and data.

Uganda’s National Biorepository, which stores samples and data from several researchers and institutions, requires anyone depositing or accessing data to follow a specific workflow and requirements that are set at the point of funding, Nansumba explained. The data are first uploaded into the National Health Laboratory Information Management System, then undergo quality processing, and are then transferred to the biorepository and stored appropriately. To access data from the biorepository, researchers must request approval through the appropriate national Research Ethics Committee, satisfy the data and material transfer agreements, and upload the resulting findings and data back into the repository. “We expect [a] return of research findings, for example, in genotypes and microbiome data,” said Nansumba.

Nansumba suggested that Uganda and other low- and middle-income countries need stronger infrastructure and technology transfer capabilities to participate in more open, transparent, and equitable global data sharing among researchers, health systems, and patients. She noted that many countries do not have the focused information management systems required to store and transfer large amounts of genomic data. Nansumba suggested that collaborators establish a plan to share the costs of sustaining data over long periods of time, after a project has ended. She also suggested that sharing research findings with patients will be important for improving transparency and strengthening equitable data sharing practices. Finally, Nansumba pointed to a need for building technology transfer capacity to answer questions related to the availability of data for vaccine and pharmaceutical development.

___________________

1 Since the workshop, the Digital Personal Data Protection Act was enacted in August 2023 (see https://www.meity.gov.in/writereaddata/files/Digital%20Personal%20Data%20Protection%20Act%202023.pdf).

Taiwan’s National Health Insurance Research Database

Ya-Hsin Li, Chung-Shan Medical University (Taiwan), discussed governance approaches used for data in Taiwan’s National Health Insurance Research Database. Taiwan’s national health insurance, paid for by the government, has covered most of the Taiwanese population since 1995, Li explained, making the National Health Insurance Research Database a rich resource for studying real-world health data. This database links more than 400 sources of data covering nearly everyone in Taiwan. It includes health data from cancer registries, cause-of-death records, and many other sources; social data, such as information on factors like smoking or aging; and welfare data, such as information on single-parent families or sexual assault. It can also be used to analyze trends for diseases and treatments or conduct cross-national database comparisons on specific topics.

In Taiwan, all the data are encrypted, deidentified, and shared under the control of the Ministry of Health and Welfare, Li said. Data are linked with demographic and clinical variables, such as location, sex, age, diagnoses, prescriptions, and care visit details, but not with laboratory tests or medical notes. Applications to work with national health data must be approved by the Ethical Review Board, and data can only be accessed within the physical Data Science Centers, which are present in each major city. Li said that the requirement to physically visit a data center to gain access to its data is a strength in terms of privacy protection and works well in Taiwan, but noted that such an arrangement might be less feasible in larger countries.

APPROACHES TO DATA GOVERNANCE

Wei Zheng, Vanderbilt University (United States), moderated a discussion following the panelists’ remarks. Panelists were asked to consider similarities and differences among the laws and policies presented and discuss gaps and challenges associated with creating data governance policies. The discussion centered on data governance challenges and consent issues.

Data Governance Challenges

Panelists reflected on some of the unique attributes and main challenges for data governance in different countries. NIH has demonstrated leadership in prioritizing data sharing in controlled-access public repositories, Paine stated, but it still faces challenges in balancing open access and sharing with participant autonomy, informed consent, and privacy. Another challenge is obtaining the resources needed to support data access and sharing. “Sharing data is not free, and it does require resources,” Paine said. To address this, NIH has invested significant resources in infrastructure, such as databases for genotype and phenotype data, and asks researchers to include data management and sharing costs in their grant proposal budgets. Nevertheless, Paine said that cost will remain an important consideration going forward. Finally, Paine stated that providing access to large, complex genomic datasets can pose technical challenges that must be addressed.

Molnár-Gábor identified three key challenges in implementing the GDPR. First, it is not easy to determine exactly which sector-specific data and privacy measures for processing health data are GDPR compliant. Often, this task falls to researchers or data processors, who could benefit from concrete rules, rather than the existing, more general provisions. Second, she noted that it is difficult to translate legal requirements into technical data protection measures. Lastly, many legal mechanisms that allow data transfer outside the EU/European Economic Area require, on the one hand, data exporters to conduct a comprehensive assessment of the level of data protection in the country receiving the data and, on the other hand, data importers to sign up to binding rules that complement their national data protection regulations, making international collaborations more difficult overall, Molnár-Gábor said.

Taiwan’s biggest challenge, Li stated, is ensuring that truly informed patient consent for research purposes has been obtained, as much of its national data comes from routine health visits. Nansumba identified Uganda’s biggest challenge as a lack of transparency around the regulatory requirements for data access, especially for secondary use.

Broad Consent

Joly asked about the practice of gaining broad consent, which can eliminate the need to reestablish consent when new research is conducted using existing data. Paine replied that laws vary in the United States, but for genomic data, NIH’s consent framework does cover future use, with potential limitations depending on what the data are and how consent was structured. In the EU, Molnár-Gábor said, neither a definition of broad consent nor detailed rules for its application are included in the main body of the GDPR, which leaves room for member states to adopt different definitions and rules for its application. In general, she said, transparent compliance with informational obligations is necessary, regardless of whether consent is the legal basis for data processing, to enable data subjects to enforce their data protection rights.

Damira Ashiralieva, National Scientific-Practical Center, Ministry of Health of Kyrgyzstan, noted that, in her country, informed consent is obtained before sampling, a practice that proved helpful when COVID-19 diagnostics were used for further research. However, she said, the country’s national legislation still needs to be fully harmonized with international requirements. Nansumba stated that Uganda’s national biorepository uses broad informed consent because most of its samples come from routine health care visits and are used for future research. Researchers share their findings with interested participants, and clinicians specify the circumstances in which samples will be used in the future, for example, to study hereditary diseases or for public health initiatives.

SUMMARY

In closing, Tucholski underscored the challenges that researchers, organizations, and countries face in balancing open access to data, data interoperability, and individual privacy rights. The data involved in life science research endeavors are often sensitive, and safeguards are likely important to protect them while they are stored, shared, transferred, and accessed, especially across national borders. She noted that subsequent workshops in the series would further explore benefits and risks involved in sharing data, as well as considerations for achieving equitable data sharing.
