Engaging Scientists in Central Asia on Life Science Data Governance Principles: Proceedings of a Workshop Series (2024)

Chapter: 3 Opportunities and Challenges for Life Science Data Sharing

Suggested Citation: "3 Opportunities and Challenges for Life Science Data Sharing." National Academies of Sciences, Engineering, and Medicine. 2024. Engaging Scientists in Central Asia on Life Science Data Governance Principles: Proceedings of a Workshop Series. Washington, DC: The National Academies Press. doi: 10.17226/27156.

3
Opportunities and Challenges for Life Science Data Sharing

The third workshop in the series, held on June 1, 2023, delved into issues of ethics and equity in the context of data governance and stewardship. Trisha Tucholski, U.S. National Academies of Sciences, Engineering, and Medicine, and Lusine Poghosyan, Columbia University (United States), provided opening remarks. Building upon the first two workshops, they reiterated that the increasingly large, digitized datasets generated and analyzed by life science researchers present tremendous opportunities for international collaborations to advance innovation and address global health challenges, but such opportunities also come with the responsibility to share data securely and appropriately to protect individual privacy; government, institutional, and private investments; and national security. The third workshop focused on examining the benefits, risks, and vulnerabilities involved in data sharing; best practices for feasible and equitable data sharing; and ways to navigate policies for authorized access to data while preventing unauthorized access.

INDIGENOUS KNOWLEDGE, BIOLOGICAL STEWARDSHIP, AND COMMUNITY DATA GOVERNANCE

To open the workshop and elicit some of the ethical and equity dimensions of data governance, Krystal Tsosie, Arizona State University (United States), provided a perspective on the complex relationship between Indigenous knowledge systems and Western science in the context of data ownership and stewardship.1 Historically, she said, scientific research has been rooted in the colonial act of “discovering,” collecting, and displaying specimens in museums. However, she characterized these as acts of biopiracy, because while the discoveries are attributed to Western scientists, the knowledge often originated with Indigenous peoples (Das and Lowe, 2018; Davis, n.d.).

Agriculture provides several examples. Indigenous agricultural systems were considered “primitive” by settlers, yet most of today’s cash crops and medicines are derived from colonized land and knowledge. Furthermore, an estimated 30,000 edible plants flourished worldwide during the precolonial period, but the combination of industrial farming and the highly mechanized Green Revolution of the 1970s resulted in a drastic loss of biodiversity and widespread ecosystem degradation, and further entrenched societal inequities. Today, a mere 30 plant species constitute most diets worldwide, and Indigenous agricultural practices are seen as the key to the sustainability efforts needed to rescue degraded ecosystems, Tsosie noted. Heirloom seeds and varietals long stewarded by Indigenous people are being sought to reintroduce biodiversity, yet Tsosie cautioned that new gene editing techniques threaten, once again, to enable Western scientists and companies to benefit unilaterally from Indigenous knowledge.

___________________

1 According to the National Library of Medicine Data Glossary, data ownership refers to the “legal control of and responsibility for data,” whereas data stewardship “involves ensuring effective control and use of data assets and can include creating and managing metadata, applying standards, managing data quality and integrity, and additional data governance activities related to data curation” (see https://www.nnlm.gov/guides/data-glossary/data-ownership and https://www.nnlm.gov/guides/data-glossary/data-stewardship).

Tsosie argued that the concept of “open data,” which purports that data should be freely accessible to advance life science discoveries for the benefit of all, ignores the fact that Indigenous communities have struggled for centuries with unethical research practices, strained relationships with settlers, and trust eroded by their descendants. Rather than generating benefits for all, these communities have learned from experience that research benefits often flow in only one direction: out of their communities. In one example, scientists developed an antimalarial compound based on information they had gained from Indigenous communities in French Guiana, without attributing that knowledge to the community members in the patent the researchers received (Pain, 2016). In response to their history of such experiences, Tsosie said that some Indigenous communities are now restricting data sharing, and in some cases even forbidding genomic research using biodata originating in their communities.

It is important to recognize that patent systems heavily favor Western forms of intellectual property, Tsosie said. Indigenous practices are often based on generations of stewardship, experiential learning, and hypothesis generation, practices that are not typically granted patents. In fact, the very concept of ownership is often in conflict with Indigenous practices. However, even if Indigenous communities attempt to protect the intellectual property they generate, they face significant inequities in accessing the legal resources to do so, she said. Universities and corporations have powerful legal and financial resources, including specific legal frameworks protecting the right to patent inventions that result from federally funded research at U.S. universities, but most Indigenous communities—which may not even be legally recognized by the federal government—do not. Furthermore, she noted that companies and large research organizations are disincentivized from using knowledge and genomic data from Indigenous and other minority populations to specifically benefit those populations because of the structural context in which discoveries are monetized, which runs counter to equity aims. “When you tie innovation to economic activities, then we are always fundamentally going to disenfranchise the few and the minority, and this is not a definition of equity that we need to be advocating for,” Tsosie said.

Increasing attention is now being focused on the importance of equitable research practices to ensure that benefits are shared with those who contributed data, continued Tsosie. The Nagoya Protocol, a legal framework under the Convention on Biological Diversity, requires fair and equitable benefits sharing. However, the Nagoya Protocol does not provide a detailed roadmap on how to achieve this, nor is the United States a signatory to it, meaning that tribal nations within the territory of the United States cannot operationalize its stipulations or engage in global collaborations. What these communities need, Tsosie said, is intrinsically defined rights to exercise autonomy and protect their genomic data, a concept she called Indigenous genomic data sovereignty.

Tsosie said that one outcome of the growing movement to counter Indigenous disenfranchisement in science is the Native BioData Consortium (n.d.), launched with an aim of empowering Indigenous communities with data sovereignty.2 It is a nonprofit, Indigenous-led biological data repository that uses digital tools and machine learning approaches, such as blockchain technology and federated learning, to protect Indigenous people’s control of their genomic data and encourage equitable, beneficial research (Boscarino et al., 2022; Mackey et al., 2022). Another example, shared Tsosie, called Local Contexts (n.d.), is a global initiative that provides Indigenous communities with tools for defining data attribution,3 access, and use rights to support data provenance4 and enable greater transparency and integrity in research.

The status quo in scientific practice, and the growing movement toward open science, privileges researchers’ access to data, but Tsosie cautioned that it fails to adequately attend to the equity implications of data access and sharing. She argued that researchers have a responsibility to advance equity not only through the recruitment of study participants, but also through the inclusion of communities as partners in knowledge generation. This means incorporating benefits and ethics into the full data cycle and including community voices at all stages of research, not just at the end or as a means of starting a study (McCartney et al., 2023). She also asserted that communities who participate in research should see benefits of that participation in the near term and not just as some distant outcome that may or may not emerge after a scientific paper is published.

In contrast to FAIR (Findability, Accessibility, Interoperability, Reusability) data principles, which do not explicitly address the inclusion of community members as a part of the research process, Tsosie suggested focusing on CARE data principles, which emphasize Collective benefit, Authority to control,5 Responsibility, and Ethics (Carroll et al., 2020, p. 3).6 Describing equity as “both a process and an outcome” (NASEM and NAM, 2023, p. 41-42), she emphasized the importance of upholding Indigenous data sovereignty, fighting existing power dynamics, and ensuring full opportunity and access for Indigenous communities as vital to centering equity in scientific decision-making, engagement, and benefits.

___________________

2 As defined by the National Library of Medicine Data Glossary, “data sovereignty refers to a group’s or individual’s right to control and maintain their own data, which includes the collection, storage, and interpretation of data. Indigenous data sovereignty refers to the ability for Indigenous peoples to control their data and includes autonomy regarding a variety of data types such as oral traditions, DNA/genomics, community health data, etc. Within the context of transnational indigenous sovereignty and self-determination movements, indigenous data sovereignty can be a powerful tool for those whom the data represents, which claims the rights of Indigenous peoples to use and interpret the data in a way that is accurate and appropriate given their circumstances, customs, and communal way of life” (see https://www.nnlm.gov/guides/data-glossary/data-sovereignty).

3 Generally, data attribution refers to crediting or ascribing the source of data.

4 As defined by the National Library of Medicine Data Glossary, “data provenance, sometimes called data lineage, refers to a documented trail that accounts for the origin of a piece of data and where it has moved from to where it is presently. The purpose of data provenance is to tell researchers the origin, changes to, and details supporting the confidence or validity of research data. The concept of provenance guarantees that data creators are transparent about their work and where it came from and provides a chain of information where data can be tracked as researchers use other researchers’ data and adapt it for their own purposes” (see https://www.nnlm.gov/guides/data-glossary/data-provenance).

5 In this context, authority to control refers to Indigenous peoples’ authority to control and govern their data. “(The) United Nations Declaration on the Rights of Indigenous Peoples affirms Indigenous Peoples’ rights and interests in their data. Recognition of these rights bolsters Indigenous Peoples’ authority to control and govern such data, further affirming the need for ‘data for governance.’ Indigenous Peoples must have access to data that support Indigenous governance and self-determination. Indigenous nations and communities must be the ones to determine data governance protocols, while being actively involved in stewardship decisions for Indigenous data that are held by other entities” (Carroll et al., 2020, p. 6).

CHALLENGES OF TRADITIONAL KNOWLEDGE PRESERVATION AND PROTECTION IN CENTRAL ASIA

Zhyldyz Tegizbekova, Ala-Too International University (Kyrgyzstan), discussed traditional knowledge in Kyrgyzstan and some of the challenges faced in its preservation and protection. The five Central Asian states are rich in traditional social, cultural, and religious customs, which take many forms, including language, food, medicine, stories, games, arts, crafts, music, dance, and poetry. In reflecting a community’s shared values, Tegizbekova said that these customs, which encompass traditional knowledge, distinguish one community from another, provide spiritual meaning, and promote community continuity.

Digitizing traditional knowledge is valuable for knowledge sharing; however, it is also critical to protect traditional knowledge from exploitative commercialization, misuse, or misappropriation, which not only can be culturally offensive but also can cause economic or spiritual damage to a community, Tegizbekova said. She highlighted the example of corporations that may wish to gain traditional knowledge about plants with medicinal properties. “It’s not about keeping these traditional herbs in secret, but it’s about how benefits will be shared among the medical companies that [learn from] traditional communities, which already use traditional herbs in medicine,” she said. Unfortunately, she continued, the existing intellectual property system is insufficient for protecting Central Asian traditional knowledge or for supporting local communities that wish to commercialize their products. Kyrgyzstan has laws preserving and protecting traditional knowledge, as well as an Intellectual Property Digital Library holding more than 1,000 items, but Tegizbekova said that these laws need to be strengthened and their implementation fully funded for them to be effective. She added that the other four Central Asian countries do not have dedicated traditional knowledge legislation, meaning that issues of lawful access to and use of traditional knowledge, as well as genetic resources, are regulated ineffectively.

From a research perspective, Tegizbekova said, traditional knowledge “needs to be recorded and digitalized, but at the same time we have to keep in mind how to respect the IP [intellectual property] rights of Indigenous communities.” In addition to specialized legislation, she believes that traditional knowledge protection in the region could be enhanced through awareness campaigns; more precise definitions of traditional knowledge and folklore; expert assistance in collection and digitization; the establishment of national centers for the collection, popularization, protection, and preservation of traditional knowledge; and the adoption of regional treaties, laws, and/or programs that support local capacity building for traditional knowledge preservation.

___________________

6 Here, ethics refers to the centering of “Indigenous Peoples’ rights and wellbeing across data ecosystems and throughout data lifecycles in order to minimize harm, maximize benefits, promote justice, and allow for future use. Paramount to ethics in data practices is representation and participation of Indigenous Peoples, who must be the ones to assess benefits, harms, and potential future uses based on community values and ethics” (Carroll et al., 2020, p. 6).

GBIF: KICK-STARTING THE BIODIVERSITY PUBLICATION PROCESS FOR TAJIKISTAN

Samariddin Barotov, Institute of Botany, Plant Physiology, and Genetics of the Tajikistan National Academy of Sciences, discussed how Tajikistan’s partnership with the Global Biodiversity Information Facility (GBIF) has enhanced digitization and data sharing efforts among members of the country’s biodiversity research community. GBIF is a voluntary, intergovernmental network and research infrastructure that provides free and open access to global biodiversity data (GBIF, n.d.a). It houses more than 2 billion species occurrence records and nearly 80,000 datasets. Researchers in more than 60 countries download 23 billion records per month, and almost 8,000 peer-reviewed papers have been published using its data.

GBIF includes observations, digitized specimens, remote-sensing data, environmental DNA, and other types of data that are linked and shared via common data standards, data indexing, and publishing mechanisms, Barotov said. This makes it a rich resource for biodiversity evidence as to where and when species have lived, information that can be used to guide research goals as well as policies for biodiversity protection. GBIF data have been integral to many publications by universities, museums, governmental agencies and ministries, field scientists, citizen scientists, and businesses.

Barotov explained that, to encourage more publications from Armenia, Belarus, Georgia, Kyrgyzstan, Tajikistan, Ukraine, and Uzbekistan, and also to educate these countries’ researchers about open data and data sharing, GBIF Norway launched the BioDATA program in 2018. As part of this effort, the BioDATA Capacity Enhancement Support Program teaches Tajikistani institutions how to best utilize their biological resources, learn about data collection, and publish through GBIF via prescribed steps that include digitization, registration, and data conversion. Through this program, Barotov (2023) and other researchers from Tajikistan published the data of the Herbarium Fund of the Institute of Botany, Plant Physiology, and Genetics of the Tajikistan National Academy of Sciences.
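As an illustration of what the data conversion step can involve, the following is a minimal sketch, not a description of the BioDATA curriculum or of the Herbarium Fund dataset. It assumes that digitized specimen records sit in a simple table with hypothetical local column names (barcode, species, collector, and so on) and maps them to terms from the Darwin Core standard, which GBIF publishing commonly relies on. The institution code and the sample record are invented for illustration.

```python
import csv
import io

# Hypothetical local column names (assumptions) mapped to Darwin Core terms.
LOCAL_TO_DWC = {
    "barcode": "catalogNumber",
    "species": "scientificName",
    "collector": "recordedBy",
    "collected_on": "eventDate",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
}


def to_darwin_core(row, institution_code, country_code):
    """Convert one digitized specimen row into a Darwin Core style record."""
    record = {dwc: row[local].strip() for local, dwc in LOCAL_TO_DWC.items()}
    # Herbarium sheets are preserved specimens in Darwin Core vocabulary.
    record["basisOfRecord"] = "PreservedSpecimen"
    record["institutionCode"] = institution_code
    record["countryCode"] = country_code
    # Build an identifier from the institution code and catalog number.
    record["occurrenceID"] = f"{institution_code}:{record['catalogNumber']}"
    return record


def write_occurrences(rows, institution_code, country_code):
    """Return the converted records as CSV text ("" if there are no rows)."""
    records = [to_darwin_core(r, institution_code, country_code) for r in rows]
    if not records:
        return ""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


# Invented sample record, for illustration only.
sample_rows = [
    {
        "barcode": "TJ-000123",
        "species": "Juniperus seravschanica",
        "collector": "A. Collector",
        "collected_on": "1975-06-14",
        "lat": "38.56",
        "lon": "68.78",
    }
]

occurrence_csv = write_occurrences(sample_rows, "EXAMPLE-HERB", "TJ")
print(occurrence_csv)
```

In practice, a converted table of this kind would still need to be validated and registered before publication; the sketch covers only the renaming of fields into a shared vocabulary.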

BALANCING RISKS AND BENEFITS

Vasiliki Rahimzadeh, Baylor College of Medicine, moderated a discussion on the benefits, risks, and vulnerabilities of sharing life science data; best practices for sharing data feasibly and equitably; and navigating policies for authorized and unauthorized access.

Law, Ethics, and Ownership

Rahimzadeh asked panelists to discuss how they reconciled the tensions between intellectual property law and Indigenous views of knowledge, data, and ownership. Tegizbekova replied that she believes that most Indigenous communities are happy to share certain knowledge, because—unless it is specialized knowledge that is needed to help a community survive or thrive spiritually—the issue is less one of intellectual property than one of knowledge in the public domain. However, she said, there is a growing awareness of how destructive the sale of products produced by a community or derived from their knowledge can be. In 2019, for example, an Indigenous community in Panama accused Nike of using its design on a shoe, and Nike stopped producing the shoes. The World Intellectual Property Organization is developing an internationally accepted definition of traditional knowledge, which may also help resolve some of the tension, she noted.

Rahimzadeh added that the considerations around human data and intellectual property are different from those surrounding plant or animal data.

Tegizbekova said that approaches to balancing legal and ethical requirements may vary depending on the research, the methodologies employed, and the institution. In general, however, exposing confidential information is a major violation, she said. Damira Ashiralieva, National Scientific-Practical Center, Ministry of Health of Kyrgyzstan, suggested that all biological material should be assessed for sensitivity and accessibility. Ensuring equity and transparency is more difficult to achieve, especially for international collaborations with multiple experts, as there may be national security interests at stake. It is important to weigh the benefits against the risks and consequences, which may get even more complicated with the emergence of new technologies, such as artificial intelligence and synthetic biology. Faina Linkov, Duquesne University, agreed that ethical, accessible, and equitable research requires strong regulations, but cautioned that those regulations must be feasible to follow or there is a risk that researchers will simply ignore them.

Equity Considerations

Shalkar Adambekov, Kazakh National University (Kazakhstan), noted that Western research regulations were adopted in response to a long and well-publicized history of unethical experiments, which have contributed to a history and concept of research ethics that Central Asian nations do not necessarily share. He asserted that equity in research is hard to achieve and posited that it should not necessarily be a priority above other priorities in scientific collaborations. He reasoned that international collaborations are inherently unequal because the needs are unequal: Researchers in developing countries need access to resources, such as computing power, and particular areas of expertise, such as biostatistics, while scientists in more-developed countries need data from less-developed countries. In addition, he suggested that Indigenous communities do benefit from these collaborations through the expertise and financial resources that are shared. “Inequality in science might be a good thing, as long as it’s done ethically and benefits science,” he stated.

Adambekov drew a distinction between life science data and traditional knowledge, which is much harder to understand, quantify, own, and share. Globalization and industrialization have not benefited every country equally, and he suggested that developing nations should balance their desire to protect their cultural identities against the benefits that can be gained through access to new practices and resources. Sharing data or knowledge does not necessarily result in significant loss, nor does it lead to immediate equity, he stated.

Tegizbekova noted that for communities whose rights are weakly protected at the national level, an internationally accepted definition of traditional knowledge will strengthen justice and better protect the rights and culture of Indigenous communities from unethical exploitation by multinational corporations, where priorities often do not align with those of Indigenous communities. Expressing hope that justice could be achieved, she said developing countries must work together to promote benefit sharing, protect genetic resources, and protect the rights of Indigenous communities. Rita Guenther, U.S. National Academies, stated that the type of data being shared or collected—whether large demographic datasets, personal medical data, or data used as building blocks—can influence when and to whom those data should be accessible, and can help determine different approaches for balancing scientific advances with privacy protections and security measures.

Research on individuals, research involving traditional knowledge, and research pursued to generate profits may raise different considerations depending on the perspective one takes, but regardless, she said that it is important to follow key principles of informed consent, proper preparation, and compliance with institutional and governmental regulations that are constructed to balance benefits and risks. However, these concepts can quickly become challenging in practice. Reiterating researchers’ responsibility to protect the interests of people and ecosystems, she said that it is important to have honest and nuanced discussions to work through these complex issues and inform better data governance frameworks.

SUMMARY

The third workshop addressed complex issues in the ways different communities contribute to and benefit from life science research and innovation. Speakers highlighted how Indigenous communities in particular have been disenfranchised in data exchanges and often lack the legal and financial resources and expertise to navigate intellectual property and patent law. Participants offered perspectives on how this has played out in the U.S. context and in Central Asian countries, where speakers stressed the importance of access to better resources and expertise, as well as stronger legislation to address traditional knowledge protection and preservation. Speakers suggested that developing a globally shared definition of traditional knowledge and demonstrating a commitment to more equitable research practices, with different benefit and risk considerations for different data types, could help to address some of these issues.
