Suggested Citation: "5 Conclusion." National Academies of Sciences, Engineering, and Medicine. 2024. Toward a 21st Century National Data Infrastructure: Managing Privacy and Confidentiality Risks with Blended Data. Washington, DC: The National Academies Press. doi: 10.17226/27335.

5

Conclusion

In closing, we emphasize key themes from the previous chapters. First, agencies, policymakers, data users, and data subjects need to recognize that any blended (or nonblended) data release that offers nontrivial usefulness introduces disclosure risks; it is neither productive nor correct to think of disclosure risks as a “yes or no” feature. Second, once this fact is acknowledged, it is apparent that data-release strategies need to balance disclosure risks with data usefulness. When usefulness is high, stakeholders may be willing to accept greater risks to realize the benefits. Agencies can use various disclosure-protection methods, such as tiered access approaches, for differing data-analysis objectives. Third, successful risk management strategies are likely to involve both technical and policy approaches. Some existing approaches can be gainfully applied with blended data, but others are less effective given the magnified disclosure risks in blended data. Fourth, disclosure risk management approaches need to be dynamic, involve stakeholder input, and rely on best practices. These characteristics can help determine desirable balance points in disclosure risk/usefulness trade-offs. Finally, agencies can be (and should be, in the panel’s opinion) intentional in examinations of risks at all stages of the blended data lifecycle.

RESEARCH NEEDS FOR TECHNICAL AND POLICY APPROACHES

Throughout this report, we identify aspects of technical and policy approaches for managing disclosure risks in blended data that could benefit from additional research. Among these, the panel identified three aspects as particularly worthy of timely investigation.

Accounting for Effects of Composition in Practice

Given current realities, holders of ingredient data files will likely make decisions about data releases with only minimal, if any, coordination with others. This lack of coordination creates additional disclosure risks. For example, data-release policies can change, and if/when they do, policies may not account for the risk-management procedures in previous blending activities. Moreover, burdening data holders with anticipating and managing these situations may disincentivize data sharing. In practice, managing the risks from compositions may necessitate research on new technical approaches (e.g., new considerations or techniques for protecting confidentiality and maintaining usefulness) and on new policy approaches (e.g., ways to coordinate across agencies providing ingredient data).

Obtaining Stakeholder Feedback

Many agencies are not set up to easily obtain stakeholder feedback to inform decisions about disclosure risk/usefulness trade-offs. Indeed, in some cases obtaining feedback might be politically fraught. Case studies and best practices for obtaining stakeholder feedback when blended data are desired would provide roadmaps for doing so.

Communicating Disclosure Risk/Usefulness Trade-offs to the Public

It is difficult for data users and data subjects, as well as policymakers, to understand disclosure risk/usefulness trade-offs. The notion that privacy and confidentiality are “all or nothing,” sometimes implied in privacy- and confidentiality-related laws, is misleading. Agencies need better ways to communicate disclosure risk/usefulness trade-offs to their stakeholders.

TWO AREAS DESERVING IN-DEPTH STUDY

In its work, the panel did not cover two important topics in detail, namely the need for a robust research computing and data (RCD) workforce, and informed consent for blended data. It is the panel’s opinion that both topics deserve dedicated, in-depth study.

Informed Consent

Public concerns often exist regarding the use of personal information (Auxier et al., 2019). Informed consent is intended to ensure data subject autonomy, but, in reality, it may not always provide satisfactory agency for individuals or organizations. Communicating risks and benefits can be difficult and complex. For example, Couper et al. (2010) found that

“[…] specific information about the risk of identity or attribute disclosure influences neither respondents’ expressed willingness to participate in a hypothetical survey nor their actual participation in a real survey. However, when the possible harm resulting from disclosure is made explicit, the effect on response becomes significant” (p. 287).

Informed consent issues are amplified in blended data (National Academies of Sciences, Engineering, and Medicine, 2023a). In the panel’s view, customary informed consent language may insufficiently describe disclosure risks or potential usefulness inherent in blended data. Disclosure risk/usefulness trade-offs are difficult to communicate to respondents at the time of collection (particularly collection of ingredient data) because both attributes can change over time.

Challenges are compounded by differing policies among federal statistical agencies regarding informed consent (e.g., Basic HHS Policy for Protection of Human Research Subjects, 2018; Privacy Act, 1974). Presumed access provisions (Foundations for Evidence-Based Policymaking Act of 2018, 2019) also permit access to previously collected data that are to be subsequently used for statistical purposes. A recently proposed rule describes the responsibility of federal statistical agencies to provide informed consent regarding future purposes and uses of data to be collected, to explain how data will be protected, and to ensure that legal requirements are in place to permit data acquisition (Office of Management and Budget, 2023). However, federal guidance has not yet been issued detailing the informed consent language and standard language for memoranda of understanding for data acquisition across agencies (and the private sector) necessary to implement this provision. Communicating the intended uses of data and determining subsequent acceptable disclosure risks need to take into account the needs and concerns of respondents, while also permitting a practicable approach to the management of blended data.

In the panel’s view, future work is needed in the area of informed consent to improve communication about intended use, disclosure risk/usefulness trade-offs, and potential harm (Kelty, 2020; National Academies of Sciences, Engineering, and Medicine, 2023a). Relevant topics could include (a) ways to communicate (future) data use to data subjects (including blending of private-sector data), (b) processes by which persons or establishments can decide which data to share for a (future) purpose, (c) the effects of such decisions on management of disclosure risk/usefulness trade-offs for blended data, (d) the effects of release of personal data on confidentiality of data collected from the data subject’s community, and (e) ways to account for differing privacy preferences.

Research Computing and Data Workforce

Researchers in statistical agencies, government, academia, and beyond increasingly depend on the professional skills of the RCD workforce to facilitate the use of vast and ever-evolving technical resources (Schmitz, 2021). RCD professionals work at the intersection of cyberinfrastructure, research, and data. Data blending, and especially privacy and confidentiality protections as part of the blending lifecycle, clearly depends on an adequate RCD workforce.

To fully meet current and future RCD needs in a new data infrastructure, organizations engaged in data blending need to address issues hampering the full development of a stable, competent RCD profession and workforce. Meeting these RCD needs is complex.

On the one hand, agencies need to understand the role of RCD and recognize the growing need for RCD professionals given the volume of data, the rapid evolution of computing resources, and researchers’ general lack of experience or skills necessary to make full use of emerging tools and techniques (Schmitz, 2021, p. 19; Towns, 2023). As examples of issues that agencies need to address, the RCD profession lacks standardized job titles, has poorly defined job descriptions, and typically disperses work across multiple units within resource organizations (Maimone et al., 2022, p. 1). Traditional information technology may not naturally accommodate RCD roles and responsibilities, which can make communicating emerging program staffing needs to human resources departments difficult. Additionally, recruitment, retention, and development of RCD professionals are challenging, in part because clear career paths are not evident and in part because certificate and degree programs and scalable training are lacking (Towns, 2023).

On the other hand, agencies producing government data face several challenges. There is strong competition between the governmental and the private sectors for skilled staff, and starting salary disparities are significant. It is important to identify ways to make government employers more competitive when hiring RCD specialists.

In the panel’s view, cultivating an RCD workforce within government agencies is of major importance for statistical agencies and the contractors who support them. This area deserves extensive, dedicated study. New research on privacy-enhancing technologies makes evident the need for developing specialized techniques, tools, education, workforce training, free software libraries, and applications. Work in this direction has begun. For example, in 2021, the National Science Foundation funded a demonstration pilot to develop an RCD Resource and Career Center that would expand and disseminate resources for RCD professionalization, including support for adoption of an RCD Job Family Matrix to properly classify professional RCD roles (Brunson et al., 2021). Nonetheless, identifying ways to improve the competitiveness of government employers to attract and retain this workforce is also critical.
