
Consensus Study Report
NATIONAL ACADEMIES PRESS 500 Fifth Street, NW Washington, DC 20001
This activity was supported by a grant between the National Academy of Sciences and the National Science Foundation (SES-2114583). Support for the work of the Committee on National Statistics is provided by a consortium of federal agencies through a grant from the National Science Foundation, a National Agricultural Statistics Service cooperative agreement, and several individual contracts. Any opinions, findings, conclusions, or recommendations expressed in this publication do not necessarily reflect the views of any organization or agency that provided support for the project.
International Standard Book Number-13: 978-0-309-71238-5
International Standard Book Number-10: 0-309-71238-6
Digital Object Identifier: https://doi.org/10.17226/27335
Library of Congress Control Number: 20249325
This publication is available from the National Academies Press, 500 Fifth Street, NW, Keck 360, Washington, DC 20001; (800) 624-6242 or (202) 334-3313; http://www.nap.edu.
Copyright 2024 by the National Academy of Sciences. National Academies of Sciences, Engineering, and Medicine and National Academies Press and the graphical logos for each are all trademarks of the National Academy of Sciences. All rights reserved.
Printed in the United States of America.
Suggested citation: National Academies of Sciences, Engineering, and Medicine. 2024. Toward a 21st Century National Data Infrastructure: Managing Privacy and Confidentiality Risks with Blended Data. Washington, DC: The National Academies Press. https://doi.org/10.17226/27335.
The National Academy of Sciences was established in 1863 by an Act of Congress, signed by President Lincoln, as a private, nongovernmental institution to advise the nation on issues related to science and technology. Members are elected by their peers for outstanding contributions to research. Dr. Marcia McNutt is president.
The National Academy of Engineering was established in 1964 under the charter of the National Academy of Sciences to bring the practices of engineering to advising the nation. Members are elected by their peers for extraordinary contributions to engineering. Dr. John L. Anderson is president.
The National Academy of Medicine (formerly the Institute of Medicine) was established in 1970 under the charter of the National Academy of Sciences to advise the nation on medical and health issues. Members are elected by their peers for distinguished contributions to medicine and health. Dr. Victor J. Dzau is president.
The three Academies work together as the National Academies of Sciences, Engineering, and Medicine to provide independent, objective analysis and advice to the nation and conduct other activities to solve complex problems and inform public policy decisions. The National Academies also encourage education and research, recognize outstanding contributions to knowledge, and increase public understanding in matters of science, engineering, and medicine.
Learn more about the National Academies of Sciences, Engineering, and Medicine at www.nationalacademies.org.
Consensus Study Reports published by the National Academies of Sciences, Engineering, and Medicine document the evidence-based consensus on the study’s statement of task by an authoring committee of experts. Reports typically include findings, conclusions, and recommendations based on information gathered by the committee and the committee’s deliberations. Each report has been subjected to a rigorous and independent peer-review process and it represents the position of the National Academies on the statement of task.
Proceedings published by the National Academies of Sciences, Engineering, and Medicine chronicle the presentations and discussions at a workshop, symposium, or other event convened by the National Academies. The statements and opinions contained in proceedings are those of the participants and are not endorsed by other participants, the planning committee, or the National Academies.
Rapid Expert Consultations published by the National Academies of Sciences, Engineering, and Medicine are authored by subject-matter experts on narrowly focused topics that can be supported by a body of evidence. The discussions contained in rapid expert consultations are considered those of the authors and do not contain policy recommendations. Rapid expert consultations are reviewed by the institution before release.
For information about other products and activities of the National Academies, please visit www.nationalacademies.org/about/whatwedo.
JEROME P. REITER (Chair), Professor of Statistical Science, Duke University
CLAIRE MCKAY BOWEN, Senior Fellow and Statistical Methods Group Lead, the Urban Institute
ALONI COHEN, Assistant Professor, Department of Computer Science and Data Science, University of Chicago
DIANA FARRELL, Independent Director and Trustee (at various institutions, including the Urban Institute and the National Bureau of Economic Research)
ROBERT M. GOERGE, Senior Research Fellow, Chapin Hall at the University of Chicago
NICHOLAS HART, President and Chief Executive Officer, Data Foundation
HOSAGRAHAR V. JAGADISH, Edgar F. Codd Distinguished University Professor and Bernard A. Galler Collegiate Professor of Electrical Engineering and Computer Science, University of Michigan
DANIEL KIFER, Professor of Computer Science, The Pennsylvania State University
KAREN LEVY, Associate Professor of Information Science, Cornell University
SALOMÉ VILJOEN, Assistant Professor of Law, University of Michigan
MARK WATSON, Vice President and Director of the Center for the Advancement of Data and Research in Economics at the Federal Reserve Bank of Kansas City (formerly)
JENNIFER PARK, Study Director
BRADFORD CHANEY, Senior Program Officer
ANTHONY S. MANN, Program Coordinator
KEVONA JONES, Senior Program Assistant
KATHARINE G. ABRAHAM (Chair), Distinguished University Professor, University of Maryland, College Park
MICK P. COUPER, Research Professor, University of Michigan
DIANA FARRELL, Independent Director and Trustee (at various institutions, including the Urban Institute and the National Bureau of Economic Research)
ROBERT M. GOERGE, Senior Research Fellow, Chapin Hall at the University of Chicago
ERICA L. GROSHEN, Senior Economics Advisor, Cornell University
DANIEL E. HO, William Benjamin Scott and Luna M. Scott Professor of Law, Stanford University
HILARY HOYNES, Professor of Economics and Public Policy, University of California, Berkeley
DANIEL KIFER, Professor of Computer Science, The Pennsylvania State University
SHARON LOHR, Professor, Arizona State University (emerita)
NELA RICHARDSON, Senior Vice President and Chief Economist, ADP Research Institute
C. MATTHEW SNIPP, Burnet C. and Mildred Finley Wohlford Professor of Humanities and Sciences, Stanford University
ELIZABETH A. STUART, Chair of the Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health
MELISSA CHIU, Director
CONSTANCE F. CITRO, Senior Scholar
BRIAN HARRIS-KOJETIN, Senior Scholar
This Consensus Study Report was reviewed in draft form by individuals chosen for their diverse perspectives and technical expertise. The purpose of this independent review is to provide candid and critical comments that will assist the National Academies of Sciences, Engineering, and Medicine in making each published report as sound as possible and to ensure that it meets the institutional standards for quality, objectivity, evidence, and responsiveness to the study charge. The review comments and draft manuscript remain confidential to protect the integrity of the deliberative process. We thank the following individuals for their review of this report:
Although the reviewers listed above provided many constructive comments and suggestions, they were not asked to endorse the conclusions or recommendations of this report, nor did they see the final draft before its release. The review of this report was overseen by CYNTHIA CLARK, independent consultant, McLean, Virginia, and KATHLEEN MULLAN HARRIS, Department of Sociology, University of North Carolina. They were responsible for making certain that an independent examination of this report was carried out in accordance with the standards of the National Academies and that all review comments were carefully considered. Responsibility for the final content rests entirely with the authoring panel and the National Academies.
This report, Toward a 21st Century National Data Infrastructure: Managing Privacy and Confidentiality Risks with Blended Data, is the result of contributions from many colleagues, whom we thank for sharing their time and expertise. The panel was funded by the National Science Foundation, and we are especially indebted to Cheryl Eavey.
The panel benefited greatly from the presentations provided during the virtual public workshop held on May 22, 23, and 25, 2023. We thank Dr. Eavey for framing the opening of the public workshop with her reflections on the contribution of the Data Visioning series to the overall program of the National Science Foundation. We also thank the former chair of the Committee on National Statistics, Robert M. Groves, for his leadership and vision in enabling the report series and for sharing his reflections at the launch of the public workshop.
We also thank the presenters and audience members for their informative discussion.1 (See Appendix A for the workshop agenda and Appendix B for biographies of the workshop presenters.)
The panel could not have conducted its work without the capable staff at the National Academies of Sciences, Engineering, and Medicine. Melissa Chiu, Director of the Committee on National Statistics, and Brian Harris-Kojetin (Senior Scholar) provided invaluable support throughout the panel’s activities, and their insightful comments improved the workshop and report.
___________________
1 A video and transcript of the virtual public workshop event can be found at https://www.nationalacademies.org/event/05-22-2023/approaches-to-sharing-blended-data-in-a-21st-century-data-infrastructure-public-workshop-3-days-virtual
Anthony Mann and Kevona Jones ensured the smooth operation of the workshop and other panel activities and assisted with report production. We thank Theresa Patten (Mirzayan Science and Technology Policy Fellow) for her participation. Kirsten Sampson-Snyder and Bea Porter organized the review process, and Susan Debad’s thorough editing improved the readability and accessibility of the report. We are grateful to all of them for their contributions and assistance.
Spark Street Digital supported the technological aspects of the virtual workshop and produced the video of the event. Finally, we thank the members of the Panel on Approaches to Sharing Blended Data in a 21st Century Data Infrastructure, listed on page v. The panel members generously volunteered their broad and outstanding expertise to guide the workshop, gather evidence, and develop the report. (See biographies in Appendix C.) The final report reflects the commitment and expertise of all panel members.
Jerome P. Reiter, Chair
Jennifer Park, Study Director
Panel on Approaches to Sharing Blended Data in a 21st Century Data Infrastructure
PRIVACY AND CONFIDENTIALITY IN DATA USED FOR EVIDENCE-BASED POLICYMAKING
Disclosure Risks and Disclosure Harms
Technical and Policy Strategies for Reducing Disclosure Risks and Harms
Characterizing Usefulness of Blended Data Can Assist Decision Making
EXISTING FEDERAL REGULATIONS AND GUIDELINES
2 Technical Approaches to Managing Risk When Sharing Blended Data
DISCLOSURE RISKS CAN BE MAGNIFIED WITH BLENDED DATA
Naïve Applications of Classical Statistical Disclosure Limitation
Disclosure Limitation Using Formal Privacy
QUANTIFYING AND MANAGING DISCLOSURE RISK/USEFULNESS TRADE-OFFS
ENGENDERING TRUST THROUGH TRANSPARENCY AND COMMUNICATION
End-to-End Data-Quality Studies
Early and Continual Engagement with Stakeholders and the Public
External Analysis of Work in Progress
3 Policy Approaches to Managing Risks When Sharing Blended Data
CURRENT POLICY FRAMEWORKS FOR BLENDED DATA
Coordinating Government Functions
POLICY MODELS TO MANAGE ACCEPTABLE RISK
KEY ELEMENTS OF A MODEL TO MANAGE RISKS IN BLENDED DATA
Adapts to Changes in Policy Approaches
Reflects Differing Levels of Disclosure Risk/Usefulness Trade-offs
Facilitates Coordination Among Decision Makers
Provides a Common Lexicon for Effective Collaborations
Documents Calculations, Assumptions, and Decisions
4 A Model Framework for Decision Making When Sharing Blended Data
1. Determine Auspice and Purpose of the Blended Data Project
2. Determine Ingredient Data Files
3. Obtain Access to Ingredient Data Files
4. Blend Ingredient Data Files
5. Select Approaches That Meet the End Objective of Data Blending
6. Develop and Execute a Maintenance Plan
Case Study A: Blending Federal Data
Case Study B: Blending Federal and State Data
Case Study C: Blending State Data
RESEARCH NEEDS FOR TECHNICAL AND POLICY APPROACHES
Accounting for Effects of Composition in Practice
Obtaining Stakeholder Feedback
Communicating Disclosure Risk/Usefulness Trade-offs to the Public
TWO AREAS DESERVING IN-DEPTH STUDY
Research Computing and Data Workforce
Appendix A Workshop Event Agendas
Appendix B Biographical Sketches of Workshop Event Presenters
| Acronym | Definition |
| --- | --- |
| ACDEB | Advisory Committee on Data for Evidence Building |
| ADRF | Administrative Data Research Facility |
| CDAC | Confidentiality and Data Access Committee (of the Federal Committee on Statistical Methodology) |
| CEP | Commission on Evidence-Based Policymaking |
| CIPSEA | Confidential Information Protection and Statistical Efficiency Act |
| CNSTAT | Committee on National Statistics |
| ED | Department of Education |
| FCSM | Federal Committee on Statistical Methodology |
| FEMA | Federal Emergency Management Agency |
| FERPA | Family Educational Rights and Privacy Act |
| FIPPs | Fair Information Practice Principles |
| FSRDC | Federal Statistical Research Data Center |
| HIPAA | Health Insurance Portability and Accountability Act |
| ICSP | Interagency Council on Statistical Policy |
| IHP | Individuals and Households Program |
| IRC | Internal Revenue Code |
| IRS | Internal Revenue Service |
| KLDS | Kentucky Longitudinal Data System |
| LBD | Longitudinal Business Database |
| LEHD | Longitudinal Employer-Household Dynamics |
| MOU | Memorandum of Understanding |
| NASEM | National Academies of Sciences, Engineering, and Medicine |
| NSDS | National Secure Data Service |
| OLDA | Ohio Longitudinal Data Archive |
| OMB | Office of Management and Budget |
| PIA | Privacy Impact Assessment |
| PII | Personally Identifiable Information |
| PIK | Protected Identification Key |
| PPRL | Privacy-Preserving Record Linkage |
| PSEO | Post-Secondary Employment Outcomes dataset (of the Census Bureau) |
| PSI | Private Set Intersection |
| RCD | Research Computing and Data |
| SAP | Standard Application Process |
| SMC | Secure Multiparty Computation |
| SOI | Statistics of Income Program (of IRS) |
| SSA | Social Security Administration |