Concerns about privacy have changed over time, and the Census Bureau has changed its approach in response. The earliest (1790) census takers were required to post census lists in the town square for people to check and revise.1 Until 1929, census records could be bought and shared. Until 1976, the Census Bureau’s director had the discretion to grant disclosure exemptions.
The Census Bureau first began making promises of confidentiality to businesses in 1840, as a way of addressing poor response rates, and the 1850 Census was the first in which responses were not posted publicly. Still, promises were not always kept. During World War I, census records were used to support the military draft, despite President Taft’s earlier promises of confidentiality. Protections added in 1940 were overridden by the Second War Powers Act of 1942 (the act expired in 1947). To establish the right to confidentiality more securely, Congress passed Title 13 in 1954 (amended in 1962 and 1990), which both places requirements on the Census Bureau and protects the data from others: that is, “it is against the law to publish any private information that identifies an individual or business such as names, addresses (including GPS coordinates), Social Security Numbers, and telephone numbers.”2
___________________
1 https://www.census.gov/library/visualizations/2019/comm/history-privacy-protection.html
2 https://www.census.gov/history/www/reference/privacy_confidentiality/title_13_us_code.html#:~:text=It%20is%20against%20the%20law,Security%20Numbers%2C%20and%20telephone%20numbers
Over time, the Census Bureau has also changed its statistical procedures for protecting data, moving from the suppression or compression of potentially disclosive data, first used in 1920, to the progressive adoption of techniques such as data swapping, whole-table suppression, rounding, top-coding, and bottom-coding. Most recently, for the 2020 decennial census, the Census Bureau adopted differential privacy (Abowd et al., 2022), a change that has provoked debate.
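Each of these traditional techniques is a simple, mechanical transformation of the data. The following Python sketch is a minimal illustration only: the caps, floors, and noise parameters are arbitrary values chosen for the example, not taken from any Census Bureau procedure, and the Laplace step shows just the basic mechanism underlying many differentially private releases:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def bottom_code(values, floor):
    """Replace values below `floor` with `floor` itself."""
    return np.maximum(values, floor)

def top_code(values, cap):
    """Replace values above `cap` with `cap` (protects extreme,
    highly identifiable values such as very large incomes)."""
    return np.minimum(values, cap)

def round_to(values, base):
    """Coarsen values to the nearest multiple of `base`."""
    return base * np.round(values / base)

def noisy_total(values, sensitivity, epsilon):
    """Release a sum with Laplace noise of scale sensitivity/epsilon,
    the basic mechanism behind many differentially private releases."""
    return values.sum() + rng.laplace(scale=sensitivity / epsilon)

incomes = np.array([12_000, 48_500, 61_250, 250_000, 3_400_000])
protected = round_to(top_code(bottom_code(incomes, 1_000), 300_000), 500)
print(protected)  # coarsened, capped values for a public file
print(noisy_total(top_code(incomes, 300_000), sensitivity=300_000, epsilon=1.0))
```

Note that top-coding first bounds each record’s contribution to the total, which is what makes the Laplace noise scale (sensitivity divided by epsilon) well defined in the last step.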
American society also has changed, with growth in the ways data are collected and used, including the combination of multiple databases to create more complete records. Nissenbaum (2010) notes some of the important changes:
The rapid growth of artificial intelligence has the potential to further raise disclosure risks by expanding the capacity to search across multiple data sources for information that identifies a respondent.
These changes place the Census Bureau’s Survey of Income and Program Participation (SIPP) in an increasingly difficult position. The survey collects data that are highly personal and confidential, including data on income, financial assets and liabilities, and household structure. Such data would be of interest to many, including data aggregators, lenders, advertisers, and identity thieves. Under Title 13, the Census Bureau is legally and ethically bound to protect the confidentiality of SIPP survey respondents. As a practical matter, the promise of confidentiality is also a valuable tool for reassuring survey respondents that they may safely provide accurate information, which affects both survey response rates and data quality.
Yet what is needed to protect confidentiality? The simplest and most traditional approach has been to strip clearly identifying information, such as names and addresses, from the data. Beyond that, the Census Bureau has used tools such as top-coding, bottom-coding, data coarsening, and perturbation to help protect the privacy of the data. Even with these measures, a problem remains, because identification may come not from any single piece of data but from clusters of information taken together: “it is well known that a small set of attributes can single out an individual in a population” (Privacy Preserving Techniques Task Team, 2023, p. 9). Nissenbaum (2010) writes the following (a brief computational illustration appears after the quotation):
Data subjects and third-party harvesters alike are keenly aware of qualitative shifts that can occur when bits of data are combined into collages. This is, surely, one of the most alluring transformations yielded by information sciences and technologies. It is anything but the case that an assemblage of bland bits yields a bland assemblage. The isolated bits may not be particularly revealing, but the assemblages may expose people quite profoundly. (p. 123)
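A toy computation makes the quoted point concrete. In the hedged Python sketch below (invented records, not SIPP data, with columns chosen purely for illustration), the share of records singled out by their combination of attributes rises rapidly as attributes accumulate:

```python
import pandas as pd

# Toy stand-in microdata; the columns are illustrative, not actual SIPP variables.
df = pd.DataFrame({
    "state": ["ID", "ID", "FL", "FL", "FL", "TX"],
    "sex": ["F", "F", "M", "F", "M", "F"],
    "race": ["Black", "White", "Black", "Asian", "White", "Black"],
    "birth_year": [1980, 1980, 1946, 1941, 1968, 1980],
})

# As attributes accumulate, more records are singled out by their combination.
for k in range(1, len(df.columns) + 1):
    cols = list(df.columns[:k])
    combo_counts = df.value_counts(subset=cols)   # frequency of each combination
    share_unique = (combo_counts == 1).sum() / len(df)
    print(cols, f"-> {share_unique:.0%} of records unique")
```

In this toy file, state alone singles out one record in six, state and sex together single out a third, and three attributes suffice to make every record unique; real files with dozens of attributes reach the same point even faster.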
Through re-identification studies of the 2010 Census, the Census Bureau found that one in six individuals in the U.S. population could be re-identified using publicly available data in combination with the Census Bureau’s Protected Identification Key, census block, sex, and age (Bowen, 2022, p. 37):
Our simulated reconstruction-abetted reidentification attack demonstrated that the tabular summaries from the 2010 Census can be converted into a 100% microdata file with geographic precision to the census block-level. Our simulated attack demonstrated that, depending on the quality of the external data used, between 52 and 179 million respondents to the 2010 Census can be correctly re-identified from the reconstructed microdata. (Hawes, 2021)
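The mechanics behind such a reconstruction can be shown at toy scale. The Python sketch below uses invented numbers for a hypothetical three-person block, not real census tabulations: it brute-forces every combination of age and sex consistent with a handful of published statistics. Even four statistics eliminate all but a few dozen of roughly 1.4 million candidate blocks:

```python
from itertools import combinations_with_replacement, product

AGES = range(101)       # ages 0 through 100
SEXES = ("F", "M")

# Hypothetical published statistics for one three-person block
# (invented numbers; real blocks have far more published tables).
published = {"n": 3, "n_female": 2, "mean_age": 30, "median_age": 24}

def consistent(block):
    """Does a candidate block match every published statistic?"""
    ages = sorted(age for age, _ in block)
    n_female = sum(1 for _, sex in block if sex == "F")
    return (
        n_female == published["n_female"]
        and sum(ages) == published["mean_age"] * published["n"]
        and ages[1] == published["median_age"]   # median of three
    )

# Brute-force every unordered combination of three (age, sex) records.
candidates = combinations_with_replacement(product(AGES, SEXES), published["n"])
solutions = [b for b in candidates if consistent(b)]
print(f"{len(solutions)} candidate blocks consistent with the published tables")
```

Real attacks replace brute force with integer programming, but the logic is the same: every published statistic is a constraint on the unknown microdata, and enough constraints can leave a unique solution, which is what the Census Bureau’s simulated attack exploited at national scale.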
SIPP differs from the decennial census in many ways, but one core distinction is that it is based on statistical sampling. In a census, it is readily apparent which combinations of characteristics uniquely identify individuals; in a sample survey such as SIPP, a small set of attributes can uniquely identify an individual among the survey respondents, but it is less clear whether the same combination is unique within the entire population. The use of sampling does not change whether a particular combination of characteristics is unique; it changes whether one can know that the combination uniquely identifies an individual in the population. There are, however, tools for estimating the likely success of an attempted re-identification: one study, using generative models, estimated that “99.98 percent of Americans would be correctly re-identified in any dataset using 15 demographic attributes” (Rocher et al., 2019). Furthermore, if external data are available for the entire population, the potential for determining that an individual is uniquely identified is greatly increased.
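This distinction can be made precise under a simple model. If every member of the population is sampled independently with probability f, then a combination of attributes shared by F people in the population appears exactly once in the sample with probability F x f x (1 - f)^(F-1). The sketch below applies this formula with an illustrative sampling fraction; independent Bernoulli sampling is a deliberate simplification of SIPP’s actual multistage design:

```python
def prob_sample_unique(pop_count, f):
    """Probability that a combination held by `pop_count` people in the
    population shows up exactly once in the sample, assuming each person
    is sampled independently with probability f (a simplification of
    SIPP's actual multistage design)."""
    return pop_count * f * (1 - f) ** (pop_count - 1)

f = 1 / 3000  # illustrative sampling fraction, not SIPP's official rate
for pop_count in (1, 10, 100, 1000):
    p = prob_sample_unique(pop_count, f)
    print(f"population count {pop_count:>4}: P(appears once in sample) = {p:.4%}")
```

Because larger population cells are more likely to contribute exactly one sampled record, a record that is unique in the sample is usually not unique in the population; estimating population uniqueness requires a model of the attribute distribution, which is the role the generative models of Rocher et al. (2019) play.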
Two practical examples illustrate the issues that SIPP presents with respect to protecting confidentiality. In SIPP’s 2020 data, there was only one respondent with the following characteristics: a Black female in Idaho. However, given that 0.9 percent of people in Idaho in 2020 were Black,3 it seems highly unlikely that such a person is unique in the entire population of Idaho. On the other hand, SIPP data also show a household in Florida with a Black male born in 1946 married to, and sharing the household with, an Asian female born in 1941, with a child born in 1968 also in the household; given the multiracial nature of the household and the combination of ages, such a combination might very well be unique in Florida. Moreover, many other characteristics, such as occupation, educational level, and homeowner/renter status, could be added from SIPP to shrink the set of potential matches for a respondent. Thus, even a household that is not uniquely identified by a few characteristics in the sample becomes increasingly likely to be identifiable as additional characteristics are included. This is one way in which SIPP may be more disclosive than the decennial census, despite being based on a statistical sample: it contains a substantial amount of highly detailed information, including changes from one year to the next. The potential ability to precisely identify unique households is good reason to examine disclosure avoidance protections carefully.
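The Idaho example amounts to an expected-count calculation. Combining the 0.9 percent figure cited above with two rough assumptions that are not in the text (Idaho’s 2020 population of about 1.84 million, and an even sex split) gives the following back-of-the-envelope sketch:

```python
idaho_pop = 1_840_000  # Idaho's approximate 2020 census population (assumption)
share_black = 0.009    # 0.9 percent, the figure cited in the text
share_female = 0.5     # rough assumption of an even sex split

expected = idaho_pop * share_black * share_female
print(f"expected number of Black females in Idaho: ~{expected:,.0f}")
# prints roughly 8,300, so the sample-unique SIPP record is
# almost certainly not unique in Idaho's population
```

For the Florida household, the analogous calculation multiplies several much smaller shares jointly across three co-residents (race, sex, and exact birth year for each person), driving the expected count toward or below one and making population uniqueness plausible.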
Though SIPP faces challenges with regard to protecting confidentiality, these challenges can be met. The Commission on Evidence-Based Policymaking (2017, p. 1) wrote the following:
Traditionally, increasing access to confidential data presumed significantly increasing privacy risk. The Commission rejects that idea. The Commission believes there are steps that can be taken to improve data security and privacy protections beyond what exists today, while increasing the production of evidence. Modern technology and statistical methods, combined with transparency and a strong legal framework, create the opportunity to use data for evidence-building in ways that were not possible in the past.
To determine the most appropriate disclosure avoidance procedures for SIPP, the Census Bureau asked the National Academies of Sciences, Engineering, and Medicine to convene a panel of experts in statistics, survey methodology, economics, computer and data science, policy evaluation, and sociology.
The following charge (Box 1-1) was given to the expert panel. Numbering has been added to make it easier to refer to different parts of the statement of task and is not meant to imply either a sequential process or prioritization of the different elements.
___________________
3 https://www.census.gov/library/stories/state-by-state/idaho-population-change-between-census-decade.html#:~:text=Race%20and%20ethnicity%20(White%20alone,%25%2C%20up%20from%2054.9%25)
The panel’s first step was to examine the statement of task to determine what information would be needed. The panel met with the Census Bureau to discuss its view of the task statement. After internal discussions about the statement, the panel decided there were four key questions to answer:
The panel next examined what information it needed to address the statement of task. Collectively, the panel members brought experience and expertise in working with SIPP data and in disclosure avoidance approaches (including traditional techniques, controlled virtual access, synthetic data, and differential privacy) with different panel members having different specialties. In part, the panel engaged in internal, mutual instruction, so that all panel members would share a common level
of understanding; this included briefings and discussions on differential privacy, controlled virtual data access, and balancing privacy and usability.
The panel also required extensive information about SIPP and how it is used. The Census Bureau provided briefings giving the panel an overview of SIPP and describing key topics that would be relevant. These covered how SIPP works, what its major issues are, what decisions about SIPP and data protection had already been made, and SIPP’s content, products, and concerns. They also covered SIPP’s uses of administrative data during production processing; background on SIPP synthetic data and the numbers and types of users; SIPP’s current and desired level of security; new developments within the Census Bureau on disclosure limitation; SIPP small area estimation, key estimates, and data quality; and a recent re-identification study conducted by the Census Bureau relating to SIPP.
The panel conducted multiple literature searches to identify the uses of SIPP, focusing on four search methods: (1) a bibliography provided by the Census Bureau on its website of research based on SIPP; (2) a search of publications within the past five years citing SIPP and focused on the topics addressed and the methods used; (3) a search of articles within the past three years on disclosure avoidance, as well as articles on the impact that promising privacy has on survey response rates; and (4) a Scopus citation search of articles listing SIPP in the title or abstract, looking particularly at the 50 publications with the most citations.
The panel downloaded considerable material about SIPP from the Census Bureau website, including the history and design of SIPP, documentation about SIPP’s public-use file, and SIPP’s 2020 public-use file itself. Selected tabulations were produced from the downloaded data file and then reviewed for accuracy by analysts at the Census Bureau.
Finally, the panel issued a call for information, asking SIPP data users to complete a short online questionnaire about how they used the data and what problems they experienced (see Appendix E). These data from 65 SIPP data users cannot be considered a nationally representative sample, but they provide greater detail about how SIPP’s data were used than would otherwise be available. All statistical findings were confirmed by a second National Academies staff member.
Table 1-1 provides a list of all the briefings provided to the panel by both outside and internal experts.
In some areas the panel relied on its own internal expertise where judgment calls were required or the information, by its nature, could not be clearly documented. For example, this report states that no software for creating synthetic data files is currently ready for handling a file with the size and complexity of SIPP. Such judgments are noted in the text where they occur.
TABLE 1-1 List of Briefings Provided to the Panel
| Date | Topic | Presenter(s) |
|---|---|---|
| 6/6/22 | Introduction to Survey of Income and Program Participation (SIPP) and Expectations of the Panel | Jason Fields—Census Bureau |
| 6/30/22 | The SIPP Synthetic Beta | Rachel Shattuck—Census Bureau |
| 6/30/22 | Model-based Imputation and Administrative Records in SIPP Processing | Benjamin Gurrentz—Census Bureau |
| 6/30/22 | An Introduction to the SIPP Content, Products, and Concerns | Adriana Hernández-Viver, Robert Munk, Yerís H. Mayol-García—Census Bureau |
| 6/30/22 | Small Area Estimates for the SIPP | Benjamin Gurrentz, Sam Szelepka—Census Bureau |
| 6/30/22 | SIPP Key Estimates and Data Quality | Ashley Westra—Census Bureau |
| 6/30/22 | SIPP’s Current Level of Security | Holly Fee—Census Bureau |
| 6/30/22 | Protecting Respondent Confidentiality in the SIPP | Gary Benedetto, Rolando Rodriguez—Census Bureau |
| 9/7/22 | Balancing Data Privacy and Usability in the Federal Statistical System | V. Joseph Hotz—Duke University, Robert A. Moffitt—Johns Hopkins University |
| 9/7/22 | A Penny Synthesized is a Penny Earned? An Analysis of Synthetic Earnings Using Survey Responses and Administrative Records | Jordan Stanley, Evan Totty—Census Bureau |
| 9/7/22 | Synthetic Data and Census Bureau Directions for Privacy Protection | Jerry Reiter—Duke University |
| 10/3/22 | Statistics and Privacy | danah boyd—Microsoft Research and Georgetown University |
| 10/3/22 | SIPP User Experiences | Bradford Chaney—National Academies |
| 11/14/22 | A Modern Container-based Approach for Development of and Access to Confidential Data | Lars Vilhuber—Cornell University |
| 11/14/22 | Differential Privacy | Salil Vadhan—Harvard University |
| 12/12/22 | Restricted Data Access at Inter-university Consortium of Political and Social Research | Amy Pienta—University of Michigan |
| 3/21/23 | SIPP 2014 Panel Re-identification (Re-id) Study Findings and Recommendations | Aref Dajani, Steve Clark, Phyllis Singer—Census Bureau |
The panel divided into three teams to consider what the report’s content should be and draft the report itself. Each panel member served on two teams. The following topics delineated the teams:
Each team reviewed and assessed the report outline to ensure that all important topics were addressed and that the report was well organized. Each team approached its topic by conducting literature reviews to identify the most up-to-date research, considering its nuances, and assessing the published research against analyses of SIPP data. Each team met three times in separate, closed, remote meetings, with each meeting followed by a closed meeting of the full panel to review and summarize progress.
In preparation for the final closed hybrid panel meeting, the teams met a fourth time to plan the drafting of their sections of the report and to prepare presentations for the panel meeting. At the panel meeting, the draft report was reviewed chapter by chapter, with each team presenting the parts it had prepared. Content was discussed and revisions were agreed to. The documents were then revised and updated to produce the first draft of the report.
The panel met again in closed session to finalize the conclusions and recommendations. After further edits to the text, the report was sent out for review by six independent experts with expertise in SIPP, disclosure avoidance approaches, demography, and small area estimation. The panel met one final time in closed session to discuss the comments received from the outside review.
This report is organized in the following manner. Chapter 2 provides an overall summary of SIPP, describing how the survey is conducted, what data are collected, how the data are used, and what disclosure avoidance protections are currently in place. Chapter 3 examines how disclosure risks arise and how disclosure risk is, or should be, measured. Chapters 4 through 8 collectively discuss the disclosure avoidance approaches that are available. Chapter 4 provides a general overview of disclosure limitation approaches, discussing what approaches are available and how they might be combined as a package of different approaches for different situations. Chapter 5 begins a more detailed examination of individual disclosure approaches, looking at secure online data access, while Chapter 6 discusses partially synthetic datasets, Chapter 7 discusses a table generator
and remote analysis platform, and Chapter 8 examines the special challenges presented by geographic variables and how they might be addressed.
Differential privacy is both a metric used to guide disclosure avoidance and a framework for developing tools that limit disclosure with respect to this metric. Thus, it is discussed primarily in Chapters 3 (as a measurement tool) and 4 (in the overview of disclosure avoidance approaches), along with references in other chapters as appropriate, and again in Appendix C. Chapter 9 examines ways to create a balance between promoting usability of SIPP data and preserving confidentiality. It also describes the experiences of SIPP users, based on 65 responses from those users who completed a questionnaire about their experiences. Finally, Chapter 10 provides a summary of the panel’s conclusions and recommendations.
Seven appendices provide supplementary and generally technical information, particularly including formulas where appropriate. Appendix A provides information on measuring disclosure risk, complementing Chapter 3. Appendix B provides information on making inferences based on synthetic data, as referenced in Chapter 6. Appendix C provides technical information on differential privacy in table generators, as referenced in Chapter 7. Appendix D concerns geography variables, as discussed in Chapter 8. Appendix E provides a description of how data were collected from SIPP users who responded to the call for information and provides more detailed statistical results than are contained in the main text. Appendix F provides a list of references for the literature review that is discussed in Chapter 9 and summarized in Figures 9-1 and 9-2. Appendix G provides biographical sketches of the panel members.
This report does not address how future malicious actions might be anticipated and prevented by identifying suspicious behavior that seems directed toward identifying respondents. It is difficult to identify actions that might be taken with regard to public-use files, since the downloading of multiple files by itself is not a suspicious action; also, the Census Bureau does not require people to register to download a public-use file, and thus does not have records of data users. When operating within a user agreement, such as through a Federal Statistical Research Data Center, there are mechanisms in place to prevent the release of confidential information, though those mechanisms could potentially be expanded, such as through the use of artificial intelligence.
The current study is one of several conducted by the National Academies over the past 34 years. Following is a brief description of the earlier studies, along with the conclusions and recommendations from them that are most relevant to the current study.
Five years after the initiation of SIPP, the Census Bureau and the U.S. Office of Management and Budget asked the Committee on National Statistics at the National Academies to perform an independent study of SIPP. In an initial interim report, the committee examined the goals of SIPP, how well the survey was meeting the goals, and the quality and utility of SIPP data products. The committee found that “SIPP is making a vital contribution to understanding the characteristics and dynamics of the population at economic risk, and the ways in which federal programs meet—or fail to meet—economic needs” and that SIPP “provides data not elsewhere available that is integral to policy analysis of income maintenance programs” (National Research Council, 1989, p. ix). As part of the study, the committee conducted interviews with selected federal agencies, finding that six of them made major use of SIPP: Food and Nutrition Service, U.S. Department of Agriculture (USDA); Census Bureau, U.S. Department of Commerce; Assistant Secretary for Planning and Evaluation, U.S. Department of Health and Human Services (DHHS); Social Security Administration, DHHS; Congressional Budget Office; and Congressional Research Service. Eight other agencies made occasional use of SIPP: Economic Research Service, USDA; National Center for Health Services Research, DHHS; Family Support Administration, DHHS; U.S. Department of Education; U.S. Department of Housing and Urban Development; Bureau of Labor Statistics, U.S. Department of Labor; U.S. Office of Management and Budget; and U.S. Commission on Civil Rights. Finally, two agencies were found to have a potential use of SIPP: Bureau of Economic Analysis, U.S. Department of Commerce; and the U.S. Department of the Treasury.
Following are the recommendations from the 1989 report that are most relevant to the current study:
This 1993 study was part of a reassessment and redesign effort conducted by the Census Bureau after roughly nine years of SIPP’s operation (National Research Council, 1993). The panel reviewed the survey’s goals, content, and relationship to other data collections; survey and sample design; data collection and processing; publications and other data products; analytical methods for using the complex longitudinal data; methodological research; and management and oversight of the SIPP program. Much of the report concerned the content and methodology of SIPP, which are outside of the charge to this panel, but a few of the report’s recommendations are particularly relevant:
This study was part of a re-engineering of SIPP started in 2006 by the Census Bureau, with the panel focusing on the linking of administrative records and SIPP data (National Research Council, 2009). Following are some of the most relevant conclusions and recommendations coming from that study.
Following the 2014 redesign of SIPP, this study was designed as an independent evaluation of the new design relative to the old one: comparing key estimates, evaluating the content, evaluating the impact on respondent burden, and considering content changes for future improvement of SIPP (National Academies of Sciences, Engineering, and Medicine, 2018). Some relevant recommendations follow.