There is broad recognition that health data are the cornerstone of current and future health research. However, when health data are shared and used, population-level interests in generating new knowledge must be appropriately balanced with the need to protect individuals and groups from risks, such as those that might arise from intrusions of privacy or data misuse.
This statement describes key barriers and solutions for leveraging and sharing health data from the perspective of the research and research oversight leaders working group, including bioethicists, health law experts, and institutional review board (IRB) members (see Box 4-1).
From the perspective of this stakeholder group, the key concerns and existing barriers to advancing data sharing, linkage, and use can be divided into five categories: (1) cultural barriers, (2) ethical barriers, (3) regulatory barriers, (4) financial barriers, and (5) operational barriers.
Different groups—including researchers, clinicians, patients, health care executives, and other stakeholders in the health care system—and members of these groups have different beliefs about whether data should be freely shared and different working definitions of what constitutes “health data.” For example, no standard approach exists regarding how best to involve patients in decision making about the sharing of data—even de-identified data.
These different beliefs arise in part from the shared culture—the set of beliefs, behaviors, and values—of each group. For example, one reason why researchers have been reluctant to share data is that the culture of research rewards scientific productivity and keeping data proprietary rather than making data available as a public good (Kuntz et al., 2019). Even when researchers do engage in initiatives that aim to increase transparency about the ways that health care data are being used, they may encounter resistance from health care executives, patients, clinicians, and other stakeholder groups with differing beliefs.
In the United States there is not a shared vision or an agreed-upon set of ethical principles regarding data ownership, control, and access requirements, in part because of differing ethical convictions about how data should be gathered, maintained, and used for individual or collective good (Haug, 2017). Differences encompass such issues as who should have control over certain types of data, including when and how data are shared as well as the degree of transparency about data sharing and about the linkages among the organizations and individuals responsible for collecting and storing health data. Additional considerations include what rights patients and clinicians have regarding control over data and any financial gain resulting from the sharing and use of data. These issues are especially important with highly sensitive data and with data involving vulnerable populations or communities.
While many data-sharing efforts that aim to improve health and health care will not meet the federal definition of research involving human participants (U.S. Government, 2017), policies and practices regarding data ownership, access, and control have important implications for health care research that leverages electronic health data as well as for human subjects research and privacy protections.
In addition, researchers can encounter concerns among patients and communities regarding potential unintended consequences of data sharing: for example, whether inappropriate use of shared data could lead to care rationing, discrimination, profiteering, or other adverse effects.
While regulations focus on minimizing risk, institutions and states vary in their interpretation of regulations and responsibilities related to health data sharing and use. This has led to variability in data-sharing practices, IRB requirements, privacy offices and privacy officer practices, institutional approaches to disclosing information about data sharing to patients, and the structure of data use agreements, among other things. This variability can create significant impediments to multi-institutional research, which is now more common than when the research regulations were introduced. In addition, the absence of clarity about data ownership leads to variation in legal interpretations about data as intellectual property.
To minimize risk, efforts are made to render data non-identifiable (Emam et al., 2015). However, given the maturation of tools and algorithms to compile, match, and re-identify previously non-identifiable data, it is essential to develop more robust anonymization procedures and create effective and enforceable measures that ensure proper stewardship, access, and use (Na et al., 2018; Rocher et al., 2019). Future uses of data will continue to evolve and will be affected by technological advances, especially as personal data are increasingly generated and shared through new social platforms, online patient communities, mobile and wearable devices, and other means. Data from these platforms could be “cross-walked” with other data sources, imperiling privacy and creating unanticipated discoveries (Parasidis et al., 2019). In addition, the uncertainty about the future uses of data has made it difficult to evaluate potential risks beyond re-identification and to determine what should be disclosed to individuals when informed consent is required (Dove, 2015).
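The re-identification risk described above can be illustrated with a minimal sketch. It is not a method drawn from the cited studies. All records, names, and field choices below are hypothetical, and the two functions are illustrative helpers, not part of any standard tool. The sketch assumes a "de-identified" extract that still carries ZIP code, birth year, and sex, and a separate public list that carries names alongside the same fields.

```python
from collections import Counter, defaultdict

# Hypothetical "de-identified" clinical extract: names removed, but
# quasi-identifiers (ZIP code, birth year, sex) retained.
CLINICAL = [
    {"zip": "02138", "birth_year": 1954, "sex": "F", "diagnosis": "asthma"},
    {"zip": "02138", "birth_year": 1954, "sex": "F", "diagnosis": "diabetes"},
    {"zip": "02139", "birth_year": 1971, "sex": "M", "diagnosis": "sleep apnea"},
]

# Hypothetical public list that pairs names with the same fields.
PUBLIC = [
    {"name": "Person A", "zip": "02139", "birth_year": 1971, "sex": "M"},
    {"name": "Person B", "zip": "02138", "birth_year": 1954, "sex": "F"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def key(record, fields=QUASI_IDENTIFIERS):
    """Tuple of quasi-identifier values used to group or match records."""
    return tuple(record[f] for f in fields)


def k_anonymity(records, fields=QUASI_IDENTIFIERS):
    """Size of the smallest group of records sharing quasi-identifier values."""
    counts = Counter(key(r, fields) for r in records)
    return min(counts.values())


def unique_links(clinical, public, fields=QUASI_IDENTIFIERS):
    """Map each public name to a diagnosis when exactly one clinical record matches."""
    by_key = defaultdict(list)
    for r in clinical:
        by_key[key(r, fields)].append(r)
    links = {}
    for p in public:
        matches = by_key.get(key(p, fields), [])
        if len(matches) == 1:  # an unambiguous match re-identifies the record
            links[p["name"]] = matches[0]["diagnosis"]
    return links
```

In this toy dataset, `k_anonymity(CLINICAL)` is 1 because one record is unique on its quasi-identifiers, and `unique_links(CLINICAL, PUBLIC)` links that record to a name. The two records that share identical quasi-identifiers cannot be linked unambiguously, which is the intuition behind generalizing or suppressing such fields before data are shared.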
Data have value and can be used for financial gain. For example, technology companies and biopharmaceutical companies are increasingly seeking access to electronic health data for commercial development (Cassel and Bindman, 2019) and are not typically bound by the same covenants as health care professionals to keep data confidential, which increases the risk of data misuse (Parasidis et al., 2019). In general, many kinds of companies see commercial potential in health data, particularly when blended with social media and geolocation data.
Another financial consideration is that the cost of purchasing data for research can be prohibitive. This cost can vary by data holder. For example, the Centers for Medicare & Medicaid Services (CMS) has created a tiered pricing schema based on the size of the dataset (ResDAC, 2018), whereas other privately held data vendors deploy different pricing strategies. Completeness and quality of the data may vary by source. Newer companies are creating a consumer-driven model for individuals to determine the value of their data and make them available to industry or other purchasers. Underlying all of this is the fact that the value of data depends on the intended use, and no simple formula exists to help data creators or data purchasers navigate this arena.
As data sharing becomes more widespread, challenges related to data governance, provenance, and quality will intensify. For example, the tremendous variability in the quality of electronic health data creates onerous burdens for validating the data prior to research use (Platt and Lieu, 2018).
The time required to prepare and validate data for research often creates a lag from data generation to availability for research that can diminish the utility of the data for some research. An additional barrier is that in the absence of a national systematic catalog of available electronic health data, individual researchers are obligated to conduct a search each time they seek a given data resource.
From the concerns and barriers described above, the research and research oversight group identified five high-priority issues that are critical to facilitating appropriate and widespread data sharing and use for improving health and well-being. When prioritizing the barriers, the working group members considered which barriers represented the most significant issues preventing widespread data sharing and linkage to improve patient care and which could be either wholly or partially addressed in the next 2 to 3 years. Some of these priority barriers combine several of the themes from the previous section.
1. Heterogeneity in beliefs among patients, clinicians, and researchers about whether data should be freely shared
The differing beliefs among stakeholders about how health data should be used have slowed progress. The greatest need, which could be met within 2 to 3 years, is to understand more about what these beliefs are and where they coincide and conflict. A literature review and national survey could identify the beliefs of different stakeholders and knowledge gaps. It would also be useful to understand the heterogeneity that exists within specific patient populations. A research organization could conduct such a study, with funding from the National Institutes of Health (NIH) or the Patient-Centered Outcomes Research Institute (PCORI).
Consistency in policies and clearer understanding of the range of beliefs and attitudes would motivate health systems, health insurers, and researchers to share data. A commentary in a high-profile, peer-reviewed journal could describe a model policy to establish common ground that incorporates a commitment to openness and sharing. Existing policies could then be compared with the model policy to improve consistency.
Funders have taken major steps toward embracing open science and data-sharing policies, which require that researchers make their results openly available (Hrynaszkiewicz and Altman, 2009). However, they have struggled with implementing new standards and with monitoring and enforcing requirements for data sharing. Promising approaches to promoting data sharing include demonstrating and implementing best practices to incentivize desired behaviors and building infrastructure that makes it easier to conduct and track open science (Bierer et al., 2016).
Health systems and health insurers are significant generators and holders of health data, and these data hold tremendous value in the research context. However, apart from a few vanguard health systems and health plans, the motivation to provide these data for use in scientific research is lacking. In particular, open science is conceptually incompatible with the business model for health care providers, who benefit from internalizing benefits gleaned from their patients’ information rather than sharing it.
An important first step would be to create incentives to share data and to learn more about how to encourage stakeholders to take advantage of these incentives. To lay the groundwork for the development of a set of open science standards, existing policies should be assessed to determine and improve their provisions for openness. Key organizations to engage in this effort include the Department of Health and Human Services (HHS), in particular NIH, CMS, and the Office of the National Coordinator for Health Information Technology (ONC), as well as relevant trade organizations. One useful approach would be to work with the communities and organizations that have made considerable progress on data sharing, such as the pediatric hospital community, the cardiology community, and the Center for Open Science (2020).
Promotion and tenure policies that recognize and reward open science and collaboration would increase the impetus for data sharing (Kuntz et al., 2019; Pierce et al., 2019). In addition, the articulation of ethical principles for data sharing by a multi-stakeholder convening could advance this cause. Organizations to involve in this effort include PCORI, the Office of Science and Technology Policy (OSTP), NIH’s Office of Science Policy, and bioethics groups. This would also need significant buy-in from academia and journal publishers. Once such ethical principles are established, training could be developed and provided through professional channels, such as NIH or the Collaborative Institutional Training Initiative.
2. Lack of shared principles regarding data ownership, stewardship, governance, rights, and responsibilities
Without shared principles, organizations and stakeholders work at cross-purposes and collaboration is difficult. Addressing the lack of shared principles will influence many of the other barriers and action steps identified below.
A critical first step would be the convening of a task force to create a consensus statement—with signatories—that would publicly affirm a set of principles and commitments on the collective benefits of data as a public good. A neutral organization such as the National Academy of Medicine could convene such a group. Stakeholders that should participate in the convening include patients and patient advocacy groups, IRB members, users of health data, medical societies, federal agencies (including NIH and its National Library of Medicine [NLM]), health information technology developers, other technology companies that both produce and use data, and health journalists. As part of this work, current perspectives on data ownership should be identified and compared.
A related action step would be to establish a multi-stakeholder commission to craft a code of conduct for data holders (Sim, 2019) and an accompanying “patient health data bill of rights” to ensure that both the generators and the users of data have clear understandings and expectations of how data will be held and shared (see, for example, Knoppers et al., 2011). This code of conduct should (1) describe appropriate data stewardship models; (2) establish a fiduciary role for data holders and identify criteria for data sharing and use for those data holders; and (3) define the scope of what needs governance (e.g., the permitted uses of data or the review of data use), who should govern, and the pros and cons of different data governance models. The major standard-setting bodies, such as Health Level Seven International, will need to be engaged to identify optimal governance approaches. The governance system for the All of Us Research Program may be a valuable model. Lessons learned from the implementation of the General Data Protection Regulation in the European Union, which governs data protection, privacy, and portability and applies to all companies processing the personal data of people residing in the European Union, could provide additional guidance (EU, 2020).
A complementary action step is to establish a federal commission that could seek agreement on issues of data control and data protections. Federal organizations that should be involved in this effort include HHS, ONC, OSTP, the Food and Drug Administration, the Office for Human Research Protections (OHRP), the Office for Civil Rights (OCR), and the Centers for Disease Control and Prevention (CDC). It may also be worth including the Federal Trade Commission and the Federal Communications Commission. These two Commissions should be clear about the link between clinical and claims data and other data types/systems. This is a bipartisan issue on which rapid progress could be made, especially with leadership from the White House.
Federal agencies should create clearer guidance concerning data policies, including any legal restrictions. The newly finalized ONC and CMS regulations on data blocking and interoperability will shape this effort. (See Chapter 5 for a description of these rules.)
These action steps could be part of a broader effort to develop a statement or identify existing statements that articulate reasons to share data, such as altruism, community-mindedness, future benefit, and solidarity. As a longer-term opportunity, stakeholders in the health care system could collaborate to establish a national honest broker akin to Medicare to house and share data, to allow merging of data, and to establish uniform coding that would allow data to be anonymized (Boyd et al., 2007; Dhir et al., 2008). As an initial step, it would be useful to speak with CMS’s Virtual Research Data Center and Research Data Assistance Center to assess the time and resources required to establish these capabilities (ResDAC, 2020b).
3. Uncertainty about potential uses of data and accompanying concerns about consequences arising from inappropriate or unauthorized use
Many uncertainties surround potential future uses of data and the ramifications of those uses. Controversial cases include the use of data for competitive advantage or commercial gain, rationing care, and discrimination as more data become available (e.g., genomic data) (Parasidis et al., 2019). A recent example of a controversy involves the transmission of data from machines that people use to treat sleep apnea to insurance companies and to the companies that manufacture and distribute the devices (Marshall, 2018).
The potential value of health data is drawing considerable commercial interest. These data are useful in, for instance, research, product development, and advertising, the last of which raises concerns (Thielman, 2017). Research on the valuation of health data, on the effects of uncertainty on this value, and on other commercial issues associated with health data will help resolve questions that arise. This research should be informed by discussions about data ownership, stewardship, governance, rights, and responsibilities, as described under the second priority barrier above. Specific questions include the following:
This work likely will require different sources of funding for different questions. For example, the Robert Wood Johnson Foundation, The Donaghue Foundation, or The Greenwall Foundation might support work addressing the question of compensation described above. Improving approaches for communicating with patients in plain language about how their data are collected and used is also essential. Potential approaches include
Disclosures about data use require a substantial amount of context (Paasche-Orlow et al., 2005); CDC has done work on clear communications that could be leveraged (CDC, 2019).
The regulations and associated penalties for unauthorized data sharing and re-identification need to be clarified, as do protections for organizations that experience data breaches despite following existing regulations. In addition, the challenge of reconciling federal laws with state or other laws sometimes results in a no-win situation for organizations that abide by some laws but violate others because there are direct conflicts among them. Further analysis could identify where additional guidance or regulations are needed.
Other countries and industries could provide valuable lessons in how to address data-sharing issues. Examples include social media companies, the aviation industry, the banking industry, and the European Union’s experience with the General Data Protection Regulation.
As a specific proposal, lawmakers could pass a law like the Genetic Information Nondiscrimination Act to enhance governance of the sharing and use of health data, prohibit discrimination and other harms, and increase transparency about the regulations that exist to protect patients and others (Equal Employment Opportunity Commission, 2008). As a first step, several philosophically aligned advocacy organizations, such as Research!America, FasterCures, AcademyHealth, the Electronic Frontier Foundation, and Genetic Alliance, could work with legislators and advise on a course of action. A longer-term opportunity will be to require transparency about the commercial uses of health data.
4. Variability across institutions and states in their interpretation of regulations and responsibilities
Variability in the interpretation of regulations and responsibilities aligns with the second priority barrier (the lack of shared principles), in that shared principles must be developed and used to interpret regulations and responsibilities in a consistent way.
As action steps achievable within 2 to 3 years, the National Governors Association (NGA), along with other organizations, could request clearer guidance from the federal government (e.g., OCR and OHRP) about data policies similar to recommendations put forth in NGA’s report Getting the Right Information to the Right Health Care Providers at the Right Time (NGA, 2018).
In addition, HHS—especially ONC—could develop use cases with associated legal and regulatory considerations akin to ONC’s resources for PCORI data (HealthIT.gov, 2018).
A “help line” to federal agencies (e.g., HHS, OCR, OHRP) or a real-time appeals process could clarify different interpretations of federal policies. Alternatively, the identification of an honest broker (e.g., CMS’s Virtual Research Data Center [ResDAC, 2020b]) could help clarify different interpretations of federal oversight.
Another potential way to address this barrier, building on previous efforts by the Association of American Medical Colleges to reduce variability in the interpretation of regulations (NIH, 2006), could be to convene IRB chairs, privacy officers, regulatory officials, and thought leaders to draft guidance for IRBs and compliance offices.
5. Operational challenges, including uneven data quality, the cost to procure data, and the lag time between when data are collected and when they are available for use by researchers
Even with shared principles in place, operational challenges will need to be overcome. Further work on data ownership, access, and control will inform future efforts regarding this barrier—for example, by pointing toward incentives for institutions to share data.
In the meantime, and as a starting point, academic institutions and health systems should identify and implement incentives that encourage data sharing by researchers and other data holders.
In addition, ONC could take the lead in developing data standards in partnership with clinicians and patients. Specific approaches include
Technical experts could provide possible solutions to data capture issues in ways that improve quality overall.
Continued demonstrations of poor or incomplete EHR data will help convince health care executives of the need to improve EHR quality. New policies and funding could allow federal data holders to provide data more cost effectively and more quickly. Finally, an enhanced technical infrastructure could enable patients to collect and report health data as well as help to educate them about why it is important to provide those data (Califf et al., 2016).
In addition to the major stakeholder groups of clinicians, patients, and researchers, many other groups will be involved in implementing the action steps described above, including