Toward a 21st Century National Data Infrastructure: Managing Privacy and Confidentiality Risks with Blended Data (2024)

Chapter: 3 Policy Approaches to Managing Risks When Sharing Blended Data

Suggested Citation: "3 Policy Approaches to Managing Risks When Sharing Blended Data." National Academies of Sciences, Engineering, and Medicine. 2024. Toward a 21st Century National Data Infrastructure: Managing Privacy and Confidentiality Risks with Blended Data. Washington, DC: The National Academies Press. doi: 10.17226/27335.

In this chapter, we discuss policy approaches for managing disclosure risk/usefulness trade-offs with blended data. We focus primarily on legislative and regulatory issues, as these are arguably the most consequential for facilitating blended data. As such, we do not discuss the details of other policy approaches, such as policies for secure data enclaves or incentives to ensure blended data are used responsibly and in accordance with laws. Our focus is not intended to understate the importance of such policy approaches; rather, the panel concurs with previous work on these issues (Advisory Committee on Data for Evidence Building, 2021, 2022; National Research Council, 2005). Instead, we focus on remaining gaps in laws and policies that can affect data blending.

Over the course of several decades, many of the levers to address and manage risk for using and sharing blended data have been written into U.S. policy and law. In the panel’s view, several parts of this policy framework are quite strong. Many tools and assets are currently available to help the public, researchers, and policymakers realistically assess the usefulness and potential harms of both current and future data blending. Yet policy gaps remain, particularly regarding the determination of acceptable risk. To address these gaps, the panel believes the data-blending process must be organized, described, and reviewed within a meaningful framework that creates transparency and thereby allows informed determinations of acceptable risk.

In the panel’s view, transparency of the processes used to prepare and analyze blended data can improve understanding of the disclosure risk/usefulness trade-offs for any specific analysis. As noted by others (Commission on Evidence-Based Policymaking, 2017), ingredient data files and outputs of an analysis are often described with little discussion of how data were handled, secured, and managed (including destruction), and how (and how often) organizations or individuals are vetted to access data. Absent clear guidelines and reporting requirements, these decisions may be left to the interacting individuals or organizations. It is especially important to thoroughly document all decision rules used in disclosure risk assessment. Improving the transparency of data processes and access to blended data may bolster stakeholder engagement and enhance confidentiality protections.

CURRENT POLICY FRAMEWORKS FOR BLENDED DATA

The legal and policy framework for managing federal data is extensive (see Commission on Evidence-Based Policymaking, 2017; National Academies of Sciences, Engineering, and Medicine, 2021a for comprehensive reviews). For the purposes of this report, we focus our discussion on the elements of the Foundations for Evidence-Based Policymaking Act of 2018 (hereafter, Evidence Act), Federal Data Strategy, and the Office of Management and Budget’s (OMB’s) proposed “Trust Regulation” most relevant to blended data. This law and subsequent policies provide direction to federal agencies regarding governance and use of data in a range of means. Broadly, this direction includes listing data assets, permitting future data uses, describing coordination mechanisms, determining acceptable risks, and managing access requirements.

Listing Data Assets

Title II of the Evidence Act, known as the OPEN Government Data Act, provided new expectations that federal agencies would publish data inventories, provide metadata, and publish open data assets with increased frequency (Foundations for Evidence-Based Policymaking Act of 2018, 2019). This law codified portions of a preexisting OMB memo on Open Data Policy (M-13-13; Office of Management and Budget, 2013), which required federal data inventories and public listing of federal data assets that could be made publicly available and aggregated on data.gov. M-13-13 also encouraged interoperability across agencies through machine-readable formats, common and extensible metadata, common data standards, data citations, and open licenses, all of which facilitate blended data. These provisions would be partially implemented in the recently proposed “Trust Regulation,” which would require agencies to describe the data product, methods, and procedures used, including coverage and limitations, to data users so that users can evaluate the suitability of the data for a particular purpose (Office of Management and Budget, 2023).1 In the panel’s view, given the volume and breadth of federal data assets involved and the prior implementation experience of M-13-13, coordination will be essential to effective implementation of the OPEN Government Data Act.

Among other provisions, Title III of the Evidence Act, the Confidential Information Protection and Statistical Efficiency Act, also known as CIPSEA 2018, required OMB to establish a standard application process (SAP) “that will be adopted by statistical agencies and units through which agencies, the Congressional Budget Office, State, local, Tribal, and territorial governments, researchers, and other individuals, as appropriate, may apply to access confidential data assets accessed or acquired under CIPSEA by a statistical agency or unit for purposes of developing evidence” (Office of Management and Budget, 2022, p. 1). Guidance for management of the SAP is provided in a 2022 memorandum, M-23-04 (Office of Management and Budget, 2022). Data catalogs describing federal data assets are key elements of the SAP. Per the Evidence Act, these data catalogs are required to “provide core metadata that is standardized across agencies, identify the agency that curates the data, list requirements necessary to be granted access, and include links to the location of complete data documentation about the dataset” (Office of Management and Budget, 2022, p. 9). These elements are essential to identify suitable ingredient files and linking possibilities for blended data. Consistent metadata are necessary to make ingredient data more discoverable, accessible, and linkable for blended data products (Waters, 2023). Given similar efforts in the past (e.g., Inter-University Consortium for Political and Social Research, 2023), maintaining these metadata will require commitment from data holders to ensure accuracy, skill acquisition, and staff training, likely in partnership with third-party facilitators that have resources to manage the extensive and varied files (National Academies, 2022b).
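The four catalog elements quoted above can be pictured as a small record with a completeness check. The following is a minimal sketch in Python, not a schema prescribed by OMB or the SAP: the class name, field names, and example values are all hypothetical and chosen only to mirror the quoted list (core metadata, curating agency, access requirements, and a link to documentation).

```python
from dataclasses import dataclass, field, asdict

# Hypothetical field names: the quoted guidance names four kinds of
# information but does not prescribe this schema.
REQUIRED_FIELDS = (
    "core_metadata",
    "curating_agency",
    "access_requirements",
    "documentation_url",
)


@dataclass
class CatalogEntry:
    title: str
    core_metadata: dict = field(default_factory=dict)
    curating_agency: str = ""
    access_requirements: list = field(default_factory=list)
    documentation_url: str = ""


def missing_elements(entry: CatalogEntry) -> list:
    """Return the names of required elements that are empty for this entry."""
    values = asdict(entry)
    return [name for name in REQUIRED_FIELDS if not values[name]]


# Invented example entry with no documentation link.
entry = CatalogEntry(
    title="Example survey file",
    core_metadata={"unit": "household", "years": "2019-2021"},
    curating_agency="Example Statistical Agency",
    access_requirements=["approved project proposal"],
)
print(missing_elements(entry))  # ['documentation_url']
```

A check of this kind only flags empty fields; it says nothing about whether the metadata are accurate or consistent across agencies, which is the harder maintenance task the paragraph describes.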

Permitting Future Data Uses

One specific provision of the Evidence Act with the potential for major implications for access and use of blended data is the Presumption of Accessibility in Part D of CIPSEA 2018 (Foundations for Evidence-Based Policymaking Act of 2018, 2019). This provision authorizes federal statistical agencies to use federal program data for statistical uses only, unless directly prohibited by law. The authorization is neither a right nor a guarantee; rather, the provision “to the extent practicable by law” allows for legal and administrative discretion (Foundations for Evidence-Based Policymaking Act of 2018, 2019). If effectively implemented in collaboration with program offices, administrative records incorporated into the statistical system will be readily accessible for blending, but blending will shift the nature of risk for both the administrative records and the originating administrative programs. To date, OMB has not issued guidance to implement Part D.

___________________

1 See § 1321.6(a)(2).

Other provisions of the Evidence Act also have implications for blended data. Under the OPEN Government Data Act, data assets, which could include primary and secondary uses, were explicitly placed under the coordination of a new function in government called a chief data officer—an acknowledgment that the majority of data collected by the federal government are collected outside the federal statistical system. These data are valuable for statistical activities, particularly when blended with surveys or other data assets.

Implementation regulations for these provisions of the Evidence Act are pending. In the interim, guidance on permitting future uses points to the Federal Data Strategy (also known as M-19-18). M-19-18 provides direction to federal agencies regarding broad management activities for navigating future data uses, including principles and practices across the data lifecycle (Office of Management and Budget, 2019). M-19-18 urges federal agencies to “…plan for secondary data uses from the outset through reidentification risk assessments, stakeholder engagement, and sufficient information to assess fitness for use” (Office of Management and Budget, 2019, p. 6). OMB’s recently proposed “Trust Regulation” emphasizes the role of M-19-18 in guiding federal data governance and urges planning for secondary uses (Office of Management and Budget, 2023). As noted in this report’s Introduction, accessing ingredient files for purposes other than specified at the time of collection—whether those purposes were specified formally through informed consent, or through common understanding of administrative record use (such as to receive federal benefits)—can change public expectations about control and use of data. As described in detail later in this chapter, the anticipated presumed access regulation may address some of these concerns, but it is likely that a fuller examination of the issues involved will be needed to maintain and strengthen public trust in the use of reclaimed data.

Coordinating Government Functions

Supporting coordination among data holders is essential for acquiring ingredient data files and producing blended data. Among other provisions, CIPSEA 2018 recognizes the authority of the U.S. Chief Statistician to coordinate statistical data policy across the federal government, including through the establishment of a SAP (Office of Management and Budget, 2022). M-19-18 also provides a framework to connect data functions across agencies by encouraging practices that facilitate cross-government coordination. Such practices include encouraging agency cultures in which data are valued as a key tool for public policymaking. M-19-18 also explicitly encourages federal agencies to share data with and between state, local, and tribal governments, “particularly for programs that are federally funded and locally administered” (Fahey, 2021; Office of Management and Budget, 2019, p. 6).

Risk Assessment

The OPEN Government Data Act affirmed the determination of acceptable risk as a policy choice in the U.S. national legal framework, by directing agencies to consider the risk of disclosure when publishing open data assets (Foundations for Evidence-Based Policymaking Act of 2018, 2019).2 Furthermore, the OPEN Government Data Act directed agencies to consider these risks when developing data catalogs, so that data sensitivity could be consistently and routinely considered, and so information would be transparent to data users. CIPSEA 2018 affirmed that the determination and management of acceptable risk in federal statistical products is the responsibility of federal statistical agencies (Foundations for Evidence-Based Policymaking Act of 2018, 2019).3

The anticipated Presumption of Accessibility rule also has implications for risk assessment. As described in the sections above, the rule would increase federal statistical agencies’ access to administrative data for statistical purposes and statistical activities, subject to administrative feasibility. This provision expands access to data-blending capabilities across the federal government. The rule is also a potential strategy for managing risk in federal statistical products under CIPSEA 2018 policy protections, since it requires federal statistical agencies to conduct risk assessments for data assets acquired or accessed prior to disclosure (Foundations for Evidence-Based Policymaking Act of 2018, 2019).4 As discussed in Chapter 2, it is the panel’s view that such assessments should consider not only reidentification risks but also risks of attribute disclosures and subsequent harms.

Managing Access

CIPSEA 2018 also addresses accessibility of confidential data, assuming protections and safeguards are met.5 These protections pair access to data by trusted intermediaries with articulated principles of confidential access put forth in Statistical Policy Directive 1 (Office of Management and Budget, 2014), also known as the Trust Directive. OMB’s recently proposed “Trust Regulation” would implement the Trust Directive and includes guidance on managing confidential data access, including clarification of roles among chief evaluation officers, chief data officers, and statistical agency heads; planning and compliance requirements; and the roles of parent agencies in supporting their statistical agencies.

___________________

2 See 132 Stat. 5536.

3 See 132 Stat. 5554-5.

4 See 132 Stat. 5554-5.

5 As discussed further in the subsequent section, accountability is essential to ensure the adequacy of protections.

The CHIPS Act (2022)6 provides resources to build technical and logistical capacity to coordinate wider access to—and data-blending opportunities for—policy-relevant data, in accordance with federal confidentiality and privacy restrictions. The CHIPS Act directed the National Science Foundation to establish “a demonstration project to develop, refine, and test models to inform the full implementation of the Commission on Evidence-Based Policymaking recommendation for a governmentwide data linkage and access infrastructure for statistical activities conducted for statistical purposes” (National Center for Science and Engineering Statistics, 2023). The intended focus of the demonstration projects is on “[…] novel research collaborations, data linkage methodologies, and privacy preserving technologies and techniques” (National Center for Science and Engineering Statistics, 2023). If successful, the demonstration project will inform whether a National Secure Data Service (NSDS) will be established to provide technical and logistical support for wider data access, including for blended data. To date, 13 demonstration projects have been launched through this mechanism, including projects exploring linking approaches for blended data.

In sum, policy approaches have long provided a foundation for communicating expectations and responsibilities among data holders, data users, and policymakers regarding the management of disclosure risks. Recent legislative and regulatory mandates have facilitated blended data. The relationship of trust that policy approaches provide, which is fundamental to federal data collection, is particularly essential in the acquisition, development, and sharing of blended data. Policy can facilitate the critical coordination among data holders necessary to realize the full potential of blended data.

POLICY MODELS TO MANAGE ACCEPTABLE RISK

The Evidence Act places the responsibility for determining risk with policymakers. In their role as public servants, policymakers are charged with making decisions that consider the interests and needs of a wide range of stakeholders. The means to decide whether disclosure risks (once quantified) are acceptable, given anticipated usefulness and potential harms, are not specified. Potential paths forward include establishing principles and standards for confidentiality protection and data access, defining thresholds of acceptable practice in law, and establishing conceptual paradigms like The Five Safes. While these are useful tools, in the panel’s view, no existing model adequately considers either the higher disclosure risks or the greater potential usefulness of blended data.

___________________

6 See § 10375.

Principles and Standards

As mentioned in Chapter 1, for decades, several principles and standards have guided decision making related to managing disclosure risk/usefulness trade-offs associated with federal information collections and statistical products.7 As a decision-making tool, however, principles are limited. They provide aspirational goals but are generally silent about how to achieve goals that appear to be mutually conflicting, such as producing data products with no disclosure risks and high data usefulness. Standards are similarly limited: they describe parameters to assign levels of disclosure risks and steps to be taken to manage those risks, but to date they do not account for the anticipated usefulness of data given a specified level of disclosure risks (i.e., the disclosure risk/usefulness trade-offs summarized in Chapters 1 and 2). Standards are also slow to change in response to changing social conditions. They generally lag behind best practices in the calculation of disclosure risks, statistical disclosure-limitation methodologies, new potential usefulness, or the ways that analysts work with statistical products, including blended data.

Thresholds

Some laws establish acceptable practice explicitly. For example, the Health Insurance Portability and Accountability Act (HIPAA) defines deidentification protocols (the Safe Harbor method) that were set in policy in the mid-1990s to protect health data.8 These protocols are based mostly on suppressing certain data fields and aggregating others. Underlying the Safe Harbor rules is the notion that resulting data are too coarse to allow adversaries to reidentify individuals and their health data from data products created with the deidentification protocols.

___________________

7 See, for example, the Fair Information Practice Principles (Teufel III, 2008); The Five Safes (Desai et al., 2016; Ritchie & Green, 2020); Statistical Policy Directive 1 (Office of Management and Budget, 2014); De-Identification of Personal Information and De-Identifying Government Datasets (Garfinkel, 2015; Garfinkel et al., 2023); Circular A-130 (Office of Management and Budget, 2016, p. 2); Data Ethics Tenets (General Services Administration, 2020); Principles and Practices for Federal Statistical Agencies (National Academies, 2021a); and Information security, cybersecurity and privacy protection – Privacy enhancing data deidentification framework (International Organization for Standardization, 2022).

8 See 45 CFR § 164.514.
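The general pattern described above, suppressing some fields and aggregating (coarsening) others, can be sketched in a few lines of Python. This is an illustration of the pattern only, not the regulatory text of 45 CFR § 164.514: the field names, the set of suppressed fields, and the specific coarsening rules are assumptions made for the example, and the record is invented.

```python
# Illustrative only: the fields and rules below are assumptions for this
# sketch, not a complete or authoritative statement of the Safe Harbor method.
SUPPRESS = {"name", "ssn", "phone"}


def coarsen_record(record: dict) -> dict:
    """Drop direct identifiers and coarsen selected quasi-identifiers."""
    out = {}
    for key, value in record.items():
        if key in SUPPRESS:
            continue  # suppress the field entirely
        if key == "zip":
            out[key] = str(value)[:3] + "XX"  # keep only a broad geographic area
        elif key == "birth_date":
            out["birth_year"] = str(value)[:4]  # expects YYYY-MM-DD; keep year only
        elif key == "age":
            out[key] = "90+" if value >= 90 else value  # top-code very high ages
        else:
            out[key] = value
    return out


# Invented record for demonstration.
record = {
    "name": "A. Person",
    "ssn": "000-00-0000",
    "zip": "20001",
    "birth_date": "1950-04-17",
    "diagnosis": "asthma",
}
print(coarsen_record(record))
# {'zip': '200XX', 'birth_year': '1950', 'diagnosis': 'asthma'}
```

The sketch makes the underlying assumption visible: protection rests entirely on the output being coarse enough, with no reference to what other data an adversary might hold.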

Threshold models are compelling in that they clearly define acceptable practice. However, threshold models have several limitations in terms of managing risk, especially for blended data. When applying thresholds, it can be difficult to account for various forms of disclosure risks, such as composition effects or attribute disclosures. Thresholds established to prohibit disclosure risks entirely, such as those in Title 13 (see footnote 9) and Title 26 (see footnote 10), may be impossible to achieve in the strictest interpretation of the law. Thresholds often rely on vague definitions of what constitutes identifiable information, as evident in the definition in OMB’s recently proposed “Trust Regulation” (Office of Management and Budget, 2023)11 or in the OPEN Government Data Act (Foundations for Evidence-Based Policymaking Act of 2018, 2019).12 The assumptions underpinning these definitions can fail, especially with advances in technology and data availability. In the case of HIPAA, for example, researchers have demonstrated that assumptions regarding the effectiveness of deidentification protocols on coarsened data may no longer be true (Janmey & Elkin, 2018; Ohm, 2009; Sweeney et al., 2017). Finally, thresholds do not differentiate among degrees of harm (from mild annoyance to life threatening) that could result from disclosure.
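One limitation noted above is that a rule applied to each file separately can miss risks that arise only in combination. The Python sketch below illustrates this with a hypothetical minimum-group-size rule (every combination of released attributes must be shared by at least two records). The rule, the attribute names, and the four synthetic records are assumptions made for the example; they are not drawn from any statute or agency standard.

```python
from collections import Counter


def smallest_group(records, keys):
    """Size of the smallest group of records sharing the same values on `keys`."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return min(counts.values())


# Synthetic people; every value is invented for the example.
file_a = [  # e.g., a survey extract
    {"id": 1, "age_band": "30-39", "county": "X"},
    {"id": 2, "age_band": "30-39", "county": "X"},
    {"id": 3, "age_band": "40-49", "county": "X"},
    {"id": 4, "age_band": "40-49", "county": "X"},
]
file_b = [  # e.g., a program extract
    {"id": 1, "occupation": "nurse"},
    {"id": 2, "occupation": "teacher"},
    {"id": 3, "occupation": "teacher"},
    {"id": 4, "occupation": "nurse"},
]

# Blend the two files on the shared identifier.
b_by_id = {r["id"]: r for r in file_b}
blended = [{**a, **b_by_id[a["id"]]} for a in file_a]

k_a = smallest_group(file_a, ["age_band", "county"])
k_b = smallest_group(file_b, ["occupation"])
k_blend = smallest_group(blended, ["age_band", "county", "occupation"])
print(k_a, k_b, k_blend)  # 2 2 1
```

Each ingredient file satisfies the hypothetical rule on its own, yet every record in the blended file is unique on the combined attributes. A threshold checked file by file would not detect this.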

Thresholds also fail to reflect the dynamic nature of disclosure risk/usefulness trade-offs. As stated in Chapter 2, no disclosure-protection method offering nontrivial data usefulness ensures zero disclosure risks; some risks need to be permitted to allow even the most responsible sharing. Thresholds that always tip the scale toward privacy and confidentiality protection could potentially sacrifice the anticipated benefits of a blended data product, especially if higher levels of risk would have been acceptable to stakeholders. Thresholds that are too low may not provide adequate privacy and confidentiality protection and, if disclosures or harms result, can erode stakeholders’ trust in agencies.

___________________

9 USC Title 13 § 8(b) and 9 govern the confidentiality of information provided to the Census Bureau and prohibit “any publication whereby the data furnished by any particular establishment or individual under this title can be identified” (U.S. Census Bureau, 2003).

10 USC Title 26 § 6103 governs the confidentiality of information provided to the Internal Revenue Service. With very few exceptions “[r]eturns and return information shall be confidential, and […] no officer or employee […] shall disclose any return or return information obtained by him in any manner in connection with his service as such an officer or an employee” (Internal Revenue Code, 1986).

11 See 1321.2(i): “the term identifiable form means any representation of information that permits the identity of the individual or entity to whom the information applies to be reasonably inferred by either direct or indirect means” (Office of Management and Budget, 2023).

12 See 131 Stat. 5526: “(A) risks and restrictions related to the disclosure of personally identifiable information, including the risk that an individual data asset in isolation does not pose a privacy or confidentiality risk but when combined with other available information may pose such a risk” (Foundations for Evidence-Based Policymaking Act of 2018, 2019).

Even if not intended as such, threshold definitions can impede data blending. For example, the Federal Emergency Management Agency (FEMA) maintains the registry for the Individuals and Households Program (IHP), which provides financial assistance and direct services to eligible individuals and households that have uninsured or underinsured necessary expenses and serious needs. To fulfill its mission to protect and reduce harm to the public from disasters, FEMA asked the Census Bureau to prepare a data file that blends administrative microdata with IHP eligibility criteria data. The IHP eligibility data file provides more timely information, which is needed to respond quickly to disasters, than the IHP registry data file. Because ingredient data files include administrative microdata maintained by the Census Bureau and subject to Title 13, the blending activity is performed by Census Bureau staff. If the program is successful, FEMA could leverage the resulting blended data to communicate with emergency management partners (i.e., federal agencies, partners, nongovernmental organizations, the private sector, the public) to inform decisions. Nonetheless, sharing should balance the need for timely and accurate data with acceptable disclosure risks. FEMA has expended considerable effort to develop appropriate workarounds to meet the Title 13 requirement (in this case, arranging the blending activity to be conducted by the Census Bureau and ensuring appropriate levels of data aggregation) when providing fit-for-use data in emergency situations (Waters, 2023). Ultimately, FEMA limited its first-year project with the Census Bureau due to the challenges associated with handling blended data. As a result, FEMA shared subject matter expertise on IHP criteria but no FEMA data.13

Conceptual Paradigms

The Five Safes is an example of a conceptual paradigm to manage (but not eliminate) disclosure risks through policy approaches when sharing and using blended data (Desai et al., 2016; Ritchie & Green, 2020). Although The Five Safes was developed comparatively recently, concepts of the paradigm have been embodied in law and policy in the United States over the past 20 years, including, for example, enactment of CIPSEA 2002, CIPSEA 2018, statistical policy directives, the Federal Committee on Statistical Methodology’s (FCSM’s) Statistical Policy Working Paper #22 (Federal Committee on Statistical Methodology, 2005) and its reframing as the Data Protection Toolkit (Federal Committee on Statistical Methodology, 2022), and agency-specific practices and laws. The Five Safes expects agencies to guard against risks associated with data, people, projects, settings, and outputs. In each of these contexts, agencies in the federal statistical system have some policies and procedures already in place. For example, statistical disclosure-limitation practices are used for safe data and outputs; administrative approaches (such as the Census Bureau’s Special Sworn Status) focus on safe people and projects; and technological or physical approaches can generate safe settings. Secure data enclaves like the Federal Statistical Research Data Centers (FSRDCs), mentioned in Chapter 2, are one way to address three of the five safes in practice. Data enclaves also impact costs, access, convenience, flexibility, and possible uses of data for the evidence-building community.

___________________

13 Personal communication with Julie Waters, November 20, 2023.
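The pairing of controls with the five dimensions described above can be written down as a simple coverage check. The Python sketch below is a hypothetical bookkeeping aid, not part of The Five Safes itself: the control labels are shorthand for the examples given in the text, and the function only reports which dimensions have no listed control.

```python
FIVE_SAFES = ("data", "people", "projects", "settings", "outputs")

# Mapping taken from the examples in the text above; the labels are shorthand
# chosen for this sketch.
CONTROLS = {
    "statistical disclosure limitation": {"data", "outputs"},
    "administrative approaches (e.g., Special Sworn Status)": {"people", "projects"},
    "technological or physical approaches": {"settings"},
}


def uncovered(controls_in_place):
    """List the dimensions not addressed by any of the named controls."""
    covered = set()
    for name in controls_in_place:
        covered |= CONTROLS[name]
    return [safe for safe in FIVE_SAFES if safe not in covered]


print(uncovered(["statistical disclosure limitation"]))
# ['people', 'projects', 'settings']
print(uncovered(list(CONTROLS)))
# []
```

A check like this shows whether each dimension has some control attached, but it does not measure how much disclosure risk remains or how useful the resulting data are.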

In the panel’s view, The Five Safes approach should incorporate best practices in disclosure management policy. Routine project reviews or procedures that publicly share active projects, including sharing blended data through public-access portals, provide public accountability and increase usefulness. Protections established through relationships with data holders, such as data-sharing agreements, limited access, and penalties for misuse, reduce incentives for bad behavior; managing individuals who have access to confidential data is the first line of defense against breaches and other misuse. Active vetting of individuals who access data, along with strict access approaches, offers transparency and thus supports auditing by those concerned with data security. Nonetheless, some researchers have noted concerns, as we describe below.

The Five Safes is compelling in its use of plain language to describe data-protection strategies at a high level. In particular, it has been found useful as a way to broaden disclosure-avoidance practice beyond technical solutions alone (e.g., incorporating training; Lane & Schur, 2010; National Research Council, 2014). However, conceptual paradigms are insufficient to manage risk in blended data. For example, researchers have noted that The Five Safes paradigm does not instruct agencies on specific metrics to quantify disclosure risks, including risks from repeated data releases, making it more of a risk-management system than a paradigm that guarantees confidentiality protection (Culnane & Rubinstein, 2023; Culnane et al., 2020). Nor does the paradigm account for quantification of anticipated usefulness and potential harm. As noted in Conclusion 2-1, when considering technical and policy approaches for blended data products, it is important to balance the usefulness of providing the products with the magnified disclosure risks inherent in blended data.

In sum, several federal statutes, regulations, and agency-specific policies and practices describe how agencies are required to determine and manage acceptable disclosure risks. However, these requirements are generally broad and/or vague at the statutory level and may differ in terminology or interpretation at the agency level.14 These differences are due in part to changes in public understanding and preferences related to privacy and data confidentiality over time. Nonetheless, the lack of a common understanding of requirements across agencies is a major impediment to preparing and using blended data, and one that is not easily or quickly solvable. Furthermore, policy tools require strong accountability to ensure that disclosure risk/usefulness trade-offs are managed in publicly acceptable ways. Stakeholders have a key role in accountability, but this role has not been formally defined at the federal level.

Below, we describe elements of a framework for managing disclosure risks in blended data. In the panel’s view, this framework would complement anticipated regulatory efforts and aid the development of toolkits to sort out existing (and long-standing) differences in disclosure risk policy across agencies in two ways: first, by promoting transparency and thus improving communication across data holders, data subjects, and data users; and second, by adjusting processes as techniques and policies change to reflect best practice and public preferences.

KEY ELEMENTS OF A MODEL TO MANAGE RISKS IN BLENDED DATA

The Evidence Act assigns federal agencies the responsibility to determine acceptable risks. As discussed in Chapter 2, this determination ideally considers the interests of a wide range of stakeholders. As previously noted, existing policy models to manage risks can guide decision making but are limited by their aspirational and general nature. These models generally do not reflect current knowledge about disclosure risks; the inevitable compromises involved in disclosure risk/usefulness trade-offs; and, particularly, the data sharing needed to facilitate blending data. These limitations are especially pertinent because potential disclosure risk/usefulness trade-offs can be complex to communicate to the public and data users, especially for blended data.

In the panel’s view, a model different from those described above is needed for making decisions about acceptable disclosure risks given anticipated usefulness and potential harms of blended data. Important characteristics of this model include transparency in the methods used throughout the blended data lifecycle (see Chapter 2), flexibility to allow for changes in policy and technical approaches (including the anticipated addition of implementation guidance to the Evidence Act), and means for active collaboration with stakeholders.

___________________

14 As described in Chapter 1, examples include the Fair Information Practice Principles, Privacy Impact Assessments, and A-130 as federal regulations that describe the goals and methods of protecting personally identifiable data from unauthorized disclosure but do not describe how acceptable risks should be defined.

Adapts to Changes in Policy Approaches

Gaps in managing disclosure risks posed by blended data have long existed in the legal and technical infrastructure. Most policy approaches do not integrate purposefully with technical approaches. For example, the Privacy Act (1974)15 as well as Title II (OPEN Government Data Act)16 and Title III (CIPSEA) of the Evidence Act (Foundations for Evidence-Based Policymaking Act of 2018, 2019)17 require federal agencies to assess and protect against the risk of unauthorized disclosure of personally identifiable information. OMB’s 2007 guidance for implementing CIPSEA 2002 states that “for CIPSEA protected information, the agency as well as any agent accessing the information shall ensure that any dissemination of information based on confidential information is done in a manner that preserves the confidentiality of the information” (Office of Management and Budget, 2007, p. 3371). In addition, as described in the preceding section, several agency-specific statutes use thresholds to describe acceptable disclosure risks, such as Title 13, Title 26, the Elementary and Secondary Education Act, and the Family Educational Rights and Privacy Act. In these examples, the overall goal and the general process are specified, giving wide latitude to agencies to manage disclosure risks. However, these laws do not acknowledge that any method offering nontrivial data usefulness has nonzero disclosure risks; that is, agencies need to evaluate disclosure risk/usefulness trade-offs when deciding disclosure-protection practices.

Recent changes to the legal infrastructure for statistical products have generated new opportunities for blended data but also new gaps to be filled. The Evidence Act mandated sweeping changes in the ways that federal data could be shared and used to inform policymaking. As noted by the first report in this series, “[M]any laws and regulations do prohibit federal statistical agencies from using existing data for statistical purposes. [… I]t is assumed that the legislative and regulatory recommendations stemming from the Evidence Act will be initiated, but more work is needed to bolster data safeguards and broaden data access” (National Academies, 2023c, pp. 6–7). As that report described, notable progress has been made in implementing this mandate. Nonetheless, the panel notes that the key provisions of the OPEN Government Data Act and CIPSEA 2018 remain to be implemented, and these relate to opportunities for and management of blended data.

___________________

15 5 USC 552a(b) “No agency shall disclose any record which is contained in a system of records by any means of communication to any person, or to another agency, except pursuant to a written request by, or with the prior written consent of, the individual to whom the record pertains…” with limited exceptions such as pursuant to conducting a census under Title 13, or law enforcement purposes.

16 44 USC 3504 (b)(6) requires guidance to be issued for agencies to implement the provisions of open government data access in a manner that takes into account “(A) risks and restrictions related to the disclosure of personally identifiable information, including the risk that an individual data asset in isolation does not pose a privacy or confidentiality risk but when combined with other available information may pose such a risk.”

17 44 USC 3582 (b)(3) requires standards to be issued “for each statistical agency or unit to conduct a comprehensive risk assessment of any data asset acquired or accessed under this subchapter prior to any public release of such asset, including standards for such comprehensive risk assessment and criteria for making a determination of whether to release the data.”

As noted earlier in this chapter, implementation regulations for the Presumption of Accessibility in Part D of CIPSEA 2018 will likely have major implications for access and use of blended data (Foundations for Evidence-Based Policymaking Act of 2018, 2019). For example, this regulation could provide guidance to agencies explicitly acknowledging disclosure risk/usefulness trade-offs when managing data products and, by reference to the OPEN Government Data Act, authorize agencies to set risk parameters that take this trade-off into account. Implementation regulations for Part D could also address concerns regarding informed consent that arise when data collected for a particular purpose are subsequently used for another purpose not previously described to the data subject. In the case of blended data, effectively addressing informed consent is particularly challenging because, in addition to use for expanded purposes, blended data often carry increased disclosure risks.18 To date, OMB has not issued guidance to implement Part D.

As this discussion highlights, legal requirements can and do change.19 Models for decision making need to allow agencies to adapt their data-access management practices to current requirements, while permitting agencies to utilize the best possible technical and policy approaches to accomplish the blending purpose.

Responds to Stakeholders

Like laws and regulations, priorities and values change over time in a democratic society. In particular, the social perception and acceptability of disclosure risks as well as societal ideas about usefulness change over time and across contexts. Agencies and data users may be aware of past, current, and future potential risks, usefulness considerations, and harms; however, the way these concepts are prioritized may depend on a range of factors.

___________________

18 The panel identified informed consent in the context of blended data as an area requiring significant further work. See Chapter 5.

19 See Ferencz & Buki (2022) for a review of data access and privacy management methods used by German, French, and Finnish national statistical offices in the European legal context.


Although a responsibility of statistical agencies, determining acceptable disclosure risk/usefulness trade-offs ideally considers the concerns, interests, and needs of stakeholders.

Responds to Data Subject and Data Holder Concerns

Engaging and maintaining cooperation with data subjects and data holders throughout the data lifecycle is fundamental for information collections, especially for blended data given increased disclosure risks and (often post-collection) decisions for data reuse.20 As discussed in the sections above, responsibility for determining acceptable risk is assigned to federal agencies in their roles as data holders. Nonetheless, meaningful and effective community and stakeholder engagement can support dialogues about acceptable risks and anticipated benefits and can potentially enhance data equity. Practices related to engagement as recommended in M-19-18, culturally responsive evaluation techniques, and other inclusive engagement methods may be especially useful, including seeking feedback from marginalized or underrepresented communities and populations with differential risks or unique perceptions of usefulness and harms (Bowen & Snoke, 2023).

In 2017, writing about the evolving risks of reidentification and deidentification of blended data, the Commission on Evidence-Based Policymaking (CEP) appealed for government to “be open and honest with the American public. The public’s trust can be earned only through transparency about risks” (Commission on Evidence-Based Policymaking, 2017, p. 19). Given the availability of data sources for blending and anticipated changes in access, CEP understood that the expected changes in measured and real risks over time would need to be communicated. CIPSEA 2018 recognized this need by requiring the risk-assessment standards and policies used by federal statistical agencies to be transparent, easy to understand, and publicly available (Foundations for Evidence-Based Policymaking Act of 2018, 2019).21 OMB’s proposed “Trust Regulation” requires this content be provided on federal statistical agency websites.

Responds to Data Users’ Needs

Agencies and data users (some of whom may be data subjects) will likely have different priorities in setting acceptable disclosure risk/usefulness trade-offs. A single strategy may not satisfy all parties. Nonetheless, agencies that seek to understand users’ desiderata for accuracy and manner of access have a better chance of releasing blended data products that realize the intended purpose of the blending. Indeed, in its 2022 report, the Advisory Committee on Data for Evidence Building (ACDEB) unanimously issued multiple recommendations to support effective engagement in data sharing. Of these, Recommendation 3.11 explicitly emphasizes the need for OMB to implement enhanced engagement strategies and for the Interagency Council on Statistical Policy (ICSP) to support data-sharing and data-blending activities through the NSDS (Advisory Committee on Data for Evidence Building, 2022). The “Trust Regulation,” proposed in 2023, recognizes the necessity of engaging with stakeholders to maintain and improve the relevance of statistical data, as a core statistical responsibility (Office of Management and Budget, 2023).22

___________________

20 The process of blending itself raises complex issues of informed consent that need to be addressed (Beatty & Scott, 2023). Although not explored in this report, in the panel’s judgment, a future report on this topic would be valuable to the public, policymakers, and practitioners (see Chapter 5).

21 See 132 Stat. 5555.

It bears noting that engagement activities are traditionally difficult and costly to implement on a recurring basis. Notably, many of the existing law and policy requirements for engagement apply to certain activities related to open data, program evaluation, and evidence-building activities, but statutory requirements for engagement in statistical data are limited.23

In the panel’s view, engagement with key stakeholders is not only a matter of responsiveness to the public but also an essential matter for identifying, planning, and managing risks with blended data; engagement should be supported by agencies whenever feasible. Engagement also needs to reflect a range of views among stakeholders, including data subjects and data users, as well as the agencies responsible for complying with applicable laws and regulations.

Conclusion 3-1: The effectiveness of a framework for making decisions about acceptable disclosure risks given expected usefulness of data depends on whether that framework is dynamic. A dynamic framework allows for changing policy needs and data availability over time, in a way that accounts for the interests of data subjects, data holders, and data users.

Reflects Differing Levels of Disclosure Risk/Usefulness Trade-offs

The purpose and potential impact of blended data analyses can vary significantly, as can the disclosure risks. Thus, differing levels and modes of data access can be employed to address differing uses of blended data. One policy approach for meeting these varying needs is tiered access, an approach specifically encouraged by CEP and ACDEB. Under tiered access approaches, the agency provides secondary data users access to confidential microdata under certain restrictions, such as the requirement to use a secure data enclave that includes auditing of compliance. In tiered access, the agency also provides forms of the confidential data that have been treated using statistical disclosure-limitation methods or other technical approaches suitable for the purposes of those forms. One perspective is that any data analysis that potentially sheds light on questions of major societal importance (e.g., potentially leading to significantly improved or more equitable economic, educational, environmental, or health policies) should be performed using high-quality confidential data24 rather than a redacted version. So, too, should decisions with large public financial implications. In contrast, exploratory analyses or data-analysis training exercises may be adequately performed with redacted data (e.g., synthetic data) especially when accompanied by verification/validation services as described in Chapter 2. Various tiers of access can form an integrated system. For example, for the blended survey and administrative data from the Survey of Income and Program Participation, users start with and develop code on a synthetic dataset that has low barriers for access, submit requests to the Census Bureau for their code to be run on the confidential data (a validation service), and receive disclosure-protected results. If users wish to perform additional analyses, they apply to use the confidential data directly in FSRDC enclaves.

___________________

22 See § 1321.5(b).

23 See https://www.datafoundation.org/requirements
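The integrated tiers in the Survey of Income and Program Participation example can be pictured as a simple routing rule. The following is a minimal sketch, not a description of any agency's actual system: the `Tier` names, the `Request` fields, and the `route` function are all hypothetical and were introduced only for illustration. It assumes that a request is always served by the least-privileged tier that can satisfy it.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    # Hypothetical tier labels mirroring the three steps in the example above.
    SYNTHETIC = "synthetic dataset with low barriers for access"
    VALIDATION = "agency runs user code on confidential data"
    ENCLAVE = "direct use of confidential data in a secure enclave"


@dataclass
class Request:
    purpose: str                    # free-text label, e.g. "training exercise"
    code_tested_on_synthetic: bool  # has the user developed code on synthetic data?
    needs_direct_access: bool       # does the analysis go beyond the validation service?
    approved_for_enclave: bool      # has an enclave application been approved?


def route(req: Request) -> Tier:
    """Return the least-privileged tier that can serve the request (assumed policy)."""
    if not req.code_tested_on_synthetic:
        # Users start on the synthetic file before touching confidential data.
        return Tier.SYNTHETIC
    if req.needs_direct_access and req.approved_for_enclave:
        return Tier.ENCLAVE
    # Default: code is run by the agency and only protected results are returned.
    return Tier.VALIDATION
```

Under these assumptions, a request for direct access without an approved enclave application falls back to the validation tier rather than being escalated, which reflects the idea that higher tiers carry additional obligations for data users.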

All principal statistical agencies manage some type of tiered access system in practice. Minimally, they manage a system by which some data are restricted, and some are open and publicly accessible. The open data designation is established under individual statutory frameworks in the United States, and data are specified as open by default under the OPEN Government Data Act for nonconfidential, nonrestricted data in federal agencies, as long as those data are otherwise disclosable under the Freedom of Information Act.

Additional tiers of data access can be established with varying bases in law, regulation, and policy. Altman et al. (2015) proposed tiered access based on the level of harm caused by disclosure. In this view, high-risk data require additional constraints and obligations on data users. These mechanisms can be established in policy and process. Similarly, Sweeney et al. (2015) proposed the data tag system, which allows data to be accessed and handled with a range of security and credential measures. The system contains multiple levels of control and access, accompanied by varying levels of security (National Academies, 2023c). Similar tiering systems involving a mix of technical and policy approaches have been recommended (National Academies, 2023b).

___________________

24 As described in Chapter 2, even when analyses are done on confidential data, agencies still need to consider disclosure risk/usefulness trade-offs when releasing the outputs. Agencies may need to apply statistical disclosure-limitation techniques to outputs before sharing them with the public.

CEP also endorsed tiered access as an approach for data minimization—that is, sharing only the data necessary for the data-blending purpose—and specifically for reducing “the risk of unauthorized use and unintended harm to individuals” (Commission on Evidence-Based Policymaking, 2017, p. 38). CEP recognized that multiple agencies were already using some type of tiered access, citing the National Center for Health Statistics as a model. CEP went so far as to provide a model of sensitivity levels that could be implemented by federal agencies (Commission on Evidence-Based Policymaking, 2017, p. 40). This model was integrated into CEP’s recommendation that researchers applying for access to government data apply through a common access point, which would help integrate the framework for applying common sensitivity analysis (Commission on Evidence-Based Policymaking, 2017). That recommendation was enacted as part of the Evidence Act and has begun to be implemented, in part through the SAP (Office of Management and Budget, 2022, pp. 18–19).

CEP also called for increased publication of metadata on government data assets, to support analysis of data sensitivity in context. This provision was also included in the Evidence Act. M-23-04 issues guidance to federal agencies on minimum metadata requirements (Office of Management and Budget, 2022), but more work is needed to implement the provision consistent with a data-sensitivity perspective.

A dynamic framework can specifically incorporate tiered access models to help agencies manage the variety of disclosure risk/usefulness trade-offs posed by blended data. Various access approaches can be applied to the same underlying data based on specific needs, project usefulness, and potential risks. Importantly, data sensitivity and risks may change over time, so it is the panel’s opinion that tiers and risk assessments should not be static but instead need to be periodically reviewed. Tiered access models can be strengthened with periodic independent audits of data users to assess compliance with confidentiality-protecting procedures, rules, laws, and pledges. The panel further advises that stakeholder feedback be sought on how best to achieve tiered access within a dynamic disclosure risk management framework to better ensure that determinations of acceptable risk are informed by public interests.

Conclusion 3-2: Tiered access for data users and agencies is a key component of a dynamic disclosure risk/usefulness trade-off framework, to reflect differences in acceptable risks given policy priorities.


Facilitates Coordination Among Decision Makers

A model to manage disclosure risk/usefulness trade-offs in blended data ideally supports coordination and collaboration across agencies. Careful coordination can enhance data usefulness—for example, by making it easier to share identifiers for record linkage or to pool records from multiple data sources—thereby potentially making disclosure risks worthwhile. Undercoordination, however, can increase risks when deploying disclosure-limitation techniques and conducting risk assessments, including inappropriate assessment of disclosure risk/usefulness trade-offs.

Although there are several well-established coordination mechanisms for federal statistical policy, such as those of ICSP and FCSM (including FCSM’s interest group, the Confidentiality and Data Access Committee), in practice each federal statistical agency establishes its own policy for acceptable risk as well as the methods to manage that risk. The updating and reframing of FCSM’s Statistical Policy Working Paper #22 (a cornerstone guidance on disclosure avoidance for federal statistical agencies) as a “living” Data Protection Toolkit reflects the rapid technical and policy changes occurring in disclosure avoidance (Federal Committee on Statistical Methodology, 2022). The landscape is further complicated by the Evidence Act, which added 14 agencies to ICSP and another approximately 100 federal agencies that have some statistical program activities (Office of Management and Budget, 2023).

Agencies’ views are not homogeneous or static. Across and within agencies, determination of acceptable data blending and sharing is often approached after considering multiple (and differing) legal requirements and protections. This is evident from the Data Protection Toolkit, which currently functions as a limited repository of federal statistical agency–specific disclosure-avoidance plans25 and training materials rather than as a coordinated approach reflecting commonly held terminology. In addition, these legal authorities change over time. Furthermore, challenges inherent in coordination of the data sharing and management necessary for blended data are not limited solely to federal agencies. In fact, many federal data products rely on data flows originating from (and subsequently aggregated across) multiple states, localities, and tribes.26 These examples underscore the essential nature of strong relationships among multiple data holders for the production of robust federal statistical products.

___________________

25 Disclosure-avoidance plans from the Census Bureau, Statistics of Income, and the National Center for Education Statistics are posted on the Data Protection Toolkit site at the time of this writing.

26 Over decades, the federal statistical system built successful partnerships with these entities. For example, the vital records infrastructure maintained in partnership with the National Center for Health Statistics and the Census Bureau’s Longitudinal Employer-Household Dynamics program are examples of voluntary partnerships between federal statistical agencies and others. Additional examples involving educational data are discussed in Chapter 4.

Coordinates Data Sharing Among Federal Agencies

As described earlier in this chapter, through the Evidence Act, Congress provided authority to blend data under the presumption of accessibility. The presumed-access provisions in Part D of CIPSEA 2018 allow even greater opportunities for data sharing among federal agencies, which is necessary to produce blended data statistical products. However, coordination can present a substantial barrier for data sharing, perhaps even greater than technical approaches or legal authority (Committee on Economic Statistics, 2022). Implementing this mandate requires OMB to coordinate across agencies to manage (at times, differing) technical understanding of disclosure risks; determine acceptable risks among relevant agencies jointly; communicate acceptable risk agreements; and manage complexities of blended data risks, which may include investments in both technical infrastructure and workforce. Clear guidance for these coordination activities, documented by agencies in “[…] Memoranda of Understanding (MOUs) [needs to be developed], approved, and maintained to permit data sharing across federal agencies and with nonfederal entities” (Advisory Committee on Data for Evidence Building, 2022, p. 26).

Among other tasks, Title I of the Evidence Act charged ACDEB with evaluating and providing recommendations on ways to facilitate data sharing, which include reviewing the coordination of data sharing across the federal government. The 2021 and 2022 ACDEB reports called for streamlining the data-sharing process by “[…] developing standard MOUs; clarifying the legal framework for sharing data across states and agencies; expanding the use of existing templates; and [establishing] best practices, including encouraging multi-year agreements that anticipate recurring needs” (Advisory Committee on Data for Evidence Building, 2022, p. 26). ACDEB advised that this work both identify legal and policy impediments to data sharing and develop a consistent data-sharing approach, and that findings from these efforts could form the basis of recommendations for changes in laws and regulations (Advisory Committee on Data for Evidence Building, 2022).

As noted in the first report of this series, coordination could also be strengthened to support data sharing across federal, state, local, and tribal governments (National Academies, 2023c, p. 47). Consider the example of the Longitudinal Employer-Household Dynamics (LEHD) program. Many states seek to link their data to federal LEHD data. Yet, to the panel’s understanding, each state is required to pursue a negotiated agreement that is specific to its proposed primary use case according to legal infrastructure that may also vary across states. Model agreements, such as those used through the Coleridge Initiative (2023), can help streamline data access, but these alone are insufficient to address demand. Absent further federal regulatory and policy action27 applicable to CIPSEA, Title 13, and Title 26 confidentiality laws, developing streamlined data-sharing processes will continue to rely upon significant engagement and coordination among state, local, and tribal data holders. In its 2022 report, ACDEB acknowledged that differences in funding and power dynamics (originating in statute or other authorities) could affect federal, state, local, and tribal data-sharing partnerships, and thus opportunities for and management of blended data. Accordingly, ACDEB emphasized the need for routine engagement with user communities and partner groups, including but not limited to discussions of the value of data uses and evolving needs of data holders (Advisory Committee on Data for Evidence Building, 2022, pp. 54–55).

Coordinates Data Access for Researchers

Coordination is also necessary for future efforts to streamline and improve access to blended data products for researchers. The SAP (Office of Management and Budget, 2022) required by the Evidence Act is intended to coordinate across statistical agencies to create a uniform process for applying for access to confidential datasets. Consolidating application processes benefits risk management by centralizing information about points of access and specific uses, making it easier to incorporate this information into risk assessments and communicate it to decision makers and the public. As of this writing, only some statistical data assets are included in the SAP, which limits its applicability for blended data across government data assets. In 2022, the CHIPS Act directed the National Science Foundation’s National Center for Science and Engineering Statistics to establish a pilot project for the NSDS (CHIPS Act, 2022). These two new means for accessing confidential information for statistical purposes are intended to streamline authorized access processes.

Coordinates Risk Management

Managing disclosure risks is particularly challenging for blended data, given the coordination necessary to understand disclosure risks and plan for disclosure protections applied to ingredient data. As noted in Chapter 2, risk assessments by a single agency, without knowledge of actions taken by another agency, may affect the comprehensiveness and accuracy of each individual agency’s assessment. The coordination challenge also has implications for characterizing the usefulness dimension of disclosure risk/usefulness trade-offs. For example, consider an agency that applies disclosure-limitation techniques to one of its datasets that will also be used by another agency in a blended data product. The disclosure treatments to the ingredient file might degrade the accuracy of a tabulation or statistical analysis done with blended data, which could affect the quality of analyses or the representativeness of blended data (Long, 2020, p. 3).

___________________

27 Such as the anticipated Presumed Access regulation, additional guidance to SAP, and fuller implementation of the NSDS described in the sections above.
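The ingredient-file example above can be made concrete with a small numerical sketch. Everything in it is hypothetical: the two toy files, the choice of top-coding as the disclosure-limitation technique, and the threshold were invented for illustration and do not describe any agency's data or practice. The sketch shows how a treatment applied by one data holder can shift a tabulation that a second agency computes after linking.

```python
from statistics import mean

# Hypothetical ingredient file held by agency A: person id -> income.
incomes = {1: 20_000, 2: 35_000, 3: 50_000, 4: 150_000, 5: 400_000}

# Hypothetical file held by agency B: person id -> program participant?
participation = {1: True, 2: True, 3: False, 4: False, 5: False}

TOP_CODE = 100_000  # illustrative threshold chosen by agency A


def top_code(values, cap):
    """Apply top-coding: replace every value above the cap with the cap."""
    return {pid: min(value, cap) for pid, value in values.items()}


def mean_income_by_group(income_file, group_file):
    """Link the two files on person id and tabulate mean income per group."""
    groups = {}
    for pid, income in income_file.items():
        if pid in group_file:
            groups.setdefault(group_file[pid], []).append(income)
    return {group: mean(vals) for group, vals in groups.items()}


# Tabulation on the untreated file versus the top-coded file.
true_means = mean_income_by_group(incomes, participation)
treated_means = mean_income_by_group(top_code(incomes, TOP_CODE), participation)
```

In this toy case, the participant group's mean is unchanged because no participant's income exceeds the threshold, while the nonparticipant mean falls from 200,000 to roughly 83,333. An agency assessing usefulness of the blended product without knowing that the ingredient file was top-coded would have no way to anticipate that distortion.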

In summary, implementation of the Evidence Act’s provisions provides opportunities for participating agencies to collaborate throughout the blended data lifecycle to anticipate, produce, and reflect on potential blended data needs and opportunities, and to manage associated risks. New opportunities and innovations, such as standard MOUs for data sharing, may emerge from additional coordination and new perspectives.

Provides a Common Lexicon for Effective Collaborations

As part of collaborations to identify best practices, the panel believes that a common lexicon for defining privacy and confidentiality would facilitate disclosure risk management for blended data. In financial regulation and other regulatory environments, the notion of a “standard of care” is explicitly outlined to set objectives and ensure accountability; a similar approach may be applicable to privacy and confidentiality.

Developing a clear and standardized language that accounts for degrees of privacy and confidentiality could provide specific guidance on the standards required for various purposes. Differentiation by situation might not only enhance policy approaches but also facilitate understanding and compliance among stakeholders. A differentiated approach with standardized language could promote transparency, effectiveness, and accountability while acknowledging contextual variations in the need for privacy protection. A common lexicon incorporated into staff training on disclosure avoidance would aid cross-agency discussions, particularly those led by the office of the U.S. Chief Statistician. Promoting a common lexicon does not mean that privacy requirements should become inflexible or mechanized (Nissim, 2023). Rather, standardized language would facilitate integration of privacy policy with technical approaches.

To advance this common lexicon, it may be helpful to assess the specific terms used in federal confidentiality and privacy laws governing federal statistical agencies, possibly by using a matrix of sorts. This assessment could be developed and shared through the FCSM Data Protection Toolkit, which has been revised to serve as a repository to exchange best practices (Federal Committee on Statistical Methodology, 2022).

Conclusion 3-3: A common, cross-disciplinary language and lexicon describing privacy and confidentiality risks and harms, as well as data usefulness, is needed. Interpretable and measurable terms can promote meaningful discussions among stakeholders, including data subjects and decision makers.

Documents Calculations, Assumptions, and Decisions

At its heart, a model to manage risk in blended data ideally supports transparency. Documenting the process of preparing and analyzing blended data facilitates policymakers’, users’, and the public’s understanding of the potential benefits and limitations of blended data products. As noted by others (Commission on Evidence-Based Policymaking, 2017), ingredient data files and outputs of an analysis often come with little discussion of how data were handled, secured, and managed (including destruction), or of how organizations or individuals are vetted to access data. Documenting assessments of disclosure risks and anticipated usefulness, assumptions of potential harms, and decisions regarding acceptable risks provides an opportunity to describe the processes used and to consider how anticipated policy approaches can be implemented in the future. When documented, such processes can serve as tools to communicate with data subjects and data users, and as decision-making mechanisms that can incorporate feedback and address concerns. By clearly delineating assessments of disclosure risk/usefulness trade-offs as well as assumptions of potential harm, a documented process can justify tiered access, including revisions to access if risks, usefulness, or assumptions change over time. Documenting the decision-making process supports communication across agencies and can build trust, particularly when concerns are identified, acknowledged, and (ideally) resolved.
