Intergenerational mobility is challenging to measure for at least six reasons: (a) Mobility is a multidimensional process that plays out across a wide range of interrelated outcomes (e.g., earnings, income, wealth, occupation, class, prestige). Although this challenge is typically skirted by focusing on just one type of mobility, doing so ignores trade-offs and other ways in which different types of mobility are related to one another. If one instead seeks to analyze mobility holistically, the data demands quickly become complex. (b) Mobility is affected by continuous changes across the life course, but measurements are taken at discrete moments in the life course (e.g., mobility between “parent’s occupation at age 45” and the child’s “first occupation after they complete schooling”). If one wants to examine multigenerational mobility (e.g., mobility between grandparents, parents, and grandchildren), these continuous changes become yet more complicated to characterize. (c) Familial resource-sharing dynamics complicate measurement; such dynamics include rapid changes in nonmarital cohabitation and childbearing, new patterns of divorce, and evolving norms of resource-sharing across multiple partnerships and extended families. Because these changes imply fluidity in resource-sharing arrangements, they complicate the measurement of socioeconomic status. (d) Mobility is highly variable and differs across neighborhoods, gender, race and ethnicity, nativity, and other subpopulations. Because the effects of these different dimensions on the mobility process may interact, a very large sample is often needed to properly examine cross-group differences in mobility. (e) Mobility has a wide range of effects on many dimensions of human well-being (e.g., health, happiness, family formation) that are themselves challenging to measure. (f) Mobility has varied causes, both individual-level and institutional, that interact with one another in complicated ways. For all of these reasons, large samples that cover long periods are typically needed to measure mobility and the wide range of mechanisms that potentially affect mobility.
This chapter begins by summarizing how the current data infrastructure meets these many challenges. It first discusses some of the most frequently used survey data as well as new administrative datasets that will make it increasingly possible to meet these challenges. Next, it reviews some of the ongoing data infrastructure initiatives that will soon bear fruit. Although these initiatives open new opportunities for research, much additional data collection and data-sharing are still required, including the ability to combine ongoing survey data with administrative data. The concluding section provides recommendations for improving accessibility and dissemination. The chapter text is followed by tables outlining surveys and datasets pertinent to economic and social mobility, as well as a glossary of relevant terms.
Annex Table 6-1 lists some of the survey data that have been used to study intergenerational mobility in the United States. This section offers brief comments on their relevance to mobility research.
The Panel Study of Income Dynamics (PSID) and the National Longitudinal Surveys of Youth (NLSY) have long been relied upon because they have evolved to include long panels on the adult children of the original survey respondents. In addition, the PSID and NLSY contain child and youth supplements that can be used to further assess mobility, and the NLSY is currently undergoing planning for a new cohort in 2027. The Wisconsin Longitudinal Study (WLS) also offers a very long panel, with rich and extensive follow-ups as the cohort moved through early adulthood, middle age, and old age. Whereas the WLS follows a birth cohort that is now very old, the Future of Families and Child Wellbeing Study (FFCWS) and National Longitudinal Study of Adolescent to Adult Health (Add Health) follow more recent birth cohorts and have thus become increasingly attractive for contemporary mobility analysis (see James et al., 2021, for FFCWS, and Harris et al., 2019, for Add Health). Although the FFCWS was fielded to study the effects of family structure and Add Health to monitor health outcomes, both provide exhaustive coverage of life experiences. The Survey of Income and Program Participation does not have an explicitly intergenerational design in recent waves, but its matches to administrative earnings records facilitate its use for intergenerational analyses. The Health and Retirement Study (HRS) covers an extensive range of birth cohorts, includes retrospective data on childhood family life, and can be linked to earnings records and other administrative data. Because respondents do not age into the HRS until they are approximately 50 years old, recent trends in mobility are not immediately detectable in the HRS.
Whereas most of the preceding longitudinal household surveys lack adequate samples of immigrants (e.g., PSID, Add Health, FFCWS, NLSY97), two influential longitudinal studies, the New Immigrant Survey (NIS)1 (whose sample consists of new permanent residents) and the Children of Immigrants Longitudinal Study (CILS)2, are designed to study the intragenerational mobility (NIS) and intergenerational mobility (CILS) of immigrants. These surveys have generated much important research on immigrant mobility; unfortunately, the last wave of NIS data was collected in 2007–2009 and the last wave of CILS data was collected in 2001–2003. The available sample sizes in these surveys are also quite small (see Annex Table 6-1).
The decennial censuses and large population surveys (e.g., Current Population Survey [CPS], American Community Survey [ACS]), by contrast, have sizeable samples of immigrants but lack the longitudinal information required to assess inter- or intragenerational mobility (but see the discussion on opportunities for linking Census and administrative data). Both the CPS and ACS contain birthplace, citizenship, and year of entry to the United States; however, only the CPS (March supplement) includes information about birthplace of parents, which most studies use to derive estimates of intergenerational mobility (see National Academies of Sciences, Engineering, and Medicine [National Academies], 2015). The ACS, which provides timely information about inter-Census changes in population characteristics, lacks this information. The committee concurs with Recommendation 10-1 in a report by the National Academies (2015) to include a new question on parental birthplace in the ACS.
The next set of panel studies in Annex Table 6-1, all of which are fielded by the National Center for Education Statistics (NCES), provides samples of successive cohorts of middle- or secondary-school students. The High School and Beyond study represents high school sophomores and seniors in 1980; the National Educational Longitudinal Study of 1988 represents eighth graders in the 1988 academic school year; the Educational Longitudinal Study of 2002 represents high school sophomores in 2002; and the High School Longitudinal Study of 2009 represents ninth graders in 2009. Because the parents of these students were interviewed in all four of these NCES studies, and because extensive follow-up data on student employment outcomes were also collected in all four studies, these data provide valuable descriptive evidence on early life course mobility (and thus usefully complement the HRS’s late life course design).
The remaining surveys in Annex Table 6-1 are cross-sectional and rely on retrospective questions about parental education, occupation, and income to allow for intergenerational studies. The Occupational Changes in Generation Surveys cover relatively old birth cohorts, while the General Social Survey covers over a century of birth cohorts. The latter has been especially useful in monitoring trends despite the small per-year sample sizes. The American Voices Project (AVP) is a probability sample of qualitative interviews that has not yet been extensively used but may prove helpful in understanding the long arc of intergenerational mobility.
The surveys described above will no doubt continue to be important resources for research addressing many of the mechanisms that lie behind intergenerational mobility. At the same time, other important research questions cannot readily be answered with surveys, as the available sample sizes are too small to disaggregate by neighborhood, detailed occupation, detailed racial/ethnic groups, and other important sources of variability in mobility processes. For this reason, it is likely that future research will increasingly rely on either (a) linked administrative datasets or (b) survey data linked to administrative data (Grusky et al., 2015; Johnson et al., 2015).3 Several new linked datasets deserve mention because they offer important opportunities for mobility research.
The recently released IPUMS Multigenerational Longitudinal Panel (MLP), which spans 9 Census years (from 1850 to 1940), contains more than 700 million individual records and 200 million links (Helgertz et al., 2022). The full-count historical data with names have been digitized by Ancestry and FamilySearch.org and have been assigned a historical identification key by IPUMS. This panel will continue to expand as newly identified data are released. In 2022, the 1950 full-count Census with names was released. IPUMS worked with Ancestry to digitize most of these names, and recently the 1950 Census was linked to the MLP.4 The resulting dataset will be an important asset for mobility analysis because it covers a long time period, allows for multigenerational analysis, and offers very large sample sizes. In addition, MLP was constructed using highly reliable linking methods, resulting in very low error rates.
___________________
3 Two recent reports by the National Academies (2023a, 2024a) emphasize the need for better data to measure intergenerational mobility.
4 Individual- and household-level Decennial Census data are released only after 72 years. For information on the MLP, see https://usa.ipums.org/usa/mlp/mlp_versions.shtml.
While MLP pioneered methods of linking women across censuses within their birth family or within their marriage family by using coresidence with other individuals, these data have known limitations. The most obvious problem is that, because women typically change their name at marriage, it is difficult to track them from their birth to marriage families. This means that the MLP only represents a selected set of women who remained in the same household across Census years. Another problem is that, prior to 1940, Census data contained information about occupation, industry, and literacy, but no information on wage earnings or educational attainment. The 1940 Decennial Census was the first to include information about wages and education. A third issue with linking historical Census records is that new immigrants appear only in the recent Census data and cannot be linked to censuses taken before they resided in the United States. In addition to naturalized citizens and legal permanent residents (i.e., green card holders) who are immigrants, the foreign-born population includes a broad range of temporary visa holders, such as students, diplomats, and visitors, as well as undocumented residents who either entered without inspection or overstayed visas. Visa status information is essential to exclude nonimmigrants from assessments of immigrant mobility. To do so, administrative data on visa type collected by U.S. Citizenship and Immigration Services need to be made available for researchers to link with Census and administrative data. The committee concurs with Recommendation 10-6 in a report by the National Academies (2015) to allow administrative data on visa status information to be available to researchers in secure data enclaves.5
The MLP is nonetheless valuable because it includes rich sociodemographic information—such as age and birth cohort, race and ethnicity, country of birth, and region—that facilitates subgroup analyses. While still preliminary, one potential avenue for improving the cross-Census matching of women from their birth family to marriage family is to use vital statistics data, as in the Longitudinal, Intergenerational Family Electronic Micro-database (LIFE-M; Bailey et al., 2022) or Census Tree projects (Price et al., 2021).
___________________
5 The committee’s focus on immigrants who may be identified by visa status will exclude undocumented immigrants unless they are linked to data sources that include place of birth. Even then, the data would need information about the place of birth for parents of U.S.-born children to describe intergenerational mobility. Taking account of the full population of immigrants—documented and undocumented—is a challenge.
The LIFE-M project links all birth certificates for the states of Ohio and North Carolina to forebears, children, and grandchildren (Bailey et al., 2022). The project’s 2022 public release6 contains 15 million individuals born from 1841 to 1968 belonging to more than 4 million families and spanning four generations (Bailey et al., 2022). Because birth certificates contain women’s birth (“maiden”) names, LIFE-M follows women at roughly the same rates as men. In addition to the variables available in MLP (LIFE-M links to these files through an identification code), LIFE-M contains information on health (date and place of death, cause of death), birth family characteristics (parity, sibling sex composition, age differences, twinning, number of siblings), marriage family characteristics (age and place of marriage, married name, spouse name and spouse background characteristics from the Census), own births (number of children, mortality of own infants and children, timing of births, sex composition, and twinning), and lifetime mobility (geographic location [town or county] at birth, marriage, Census enumeration through 1940, and death). Efforts to link more state records and the 1930 and 1950 censuses are ongoing. However, as noted, projects like LIFE-M also have limitations related to the Census data to which they link.
The Census Tree project combines Census information with a diverse set of records on FamilySearch.org, one of the largest user-created genealogical platforms (Price et al., 2021). FamilySearch.org information is largely generated by its users, who search the website’s trove of information (e.g., vital records, newspapers, cemetery documents, Census records) to link their family’s ancestral records. The Census Tree combines these user links (which its creators estimate to be correct 95 percent of the time) with machine links (which its creators estimate to be correct 86–89 percent of the time) to produce a large intergenerational database containing both men and women. The project’s public release reports an overall match rate of 62–65 percent for almost 89 million matches and a false positive match rate of around 6–7 percent (Price et al., 2023). This dataset is still growing, and ongoing efforts will continue to add information and improve the data. In addition to the limitations of the underlying Census data (as noted above), a limitation of user-generated data is that the family trees represent only the families of FamilySearch.org users, who may not be representative of the broader population.
For Census Bureau products that do not meet the 72-year rule, the Census Bureau allows researchers to apply for access to linked administrative datasets that are then analyzed in Federal Statistical Research Data Centers (FSRDCs) by researchers with Special Sworn Status (with disclosure avoidance review undertaken to ensure that publicly released statistics do not allow for reidentification). There are, for example, ongoing research projects entailing the analysis of the 2000 Decennial Census linked to the 2005–2023 ACS. The household roster in the 2000 Decennial Census can be used to identify parent–child pairs (among young children who are living with their parents), and the adult occupations and income of those children can then be identified for the subset of such children who show up in subsequent versions of the ACS.
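The roster-based approach described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the Census Bureau's procedure: the record layout, the field names (`household_id`, `pik`, `relationship`), the age cutoff, and the toy values are all hypothetical, and treating every householder or spouse as a parent is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RosterEntry:
    """One person on a household roster (hypothetical layout)."""
    household_id: str
    pik: str           # stand-in for a protected identification key
    age: int
    relationship: str  # relationship to householder, e.g. "householder", "spouse", "child"


def parent_child_pairs(roster, max_child_age=17):
    """Return (parent_pik, child_pik) pairs for young children living with parents."""
    by_household = {}
    for entry in roster:
        by_household.setdefault(entry.household_id, []).append(entry)

    pairs = []
    for members in by_household.values():
        # Simplification: householder and spouse are both treated as parents.
        parents = [m for m in members if m.relationship in ("householder", "spouse")]
        children = [m for m in members
                    if m.relationship == "child" and m.age <= max_child_age]
        for child in children:
            for parent in parents:
                pairs.append((parent.pik, child.pik))
    return pairs


def link_adult_outcomes(pairs, later_survey):
    """Keep only pairs whose child appears in a later survey, attaching the adult outcome."""
    return [(parent, child, later_survey[child])
            for parent, child in pairs
            if child in later_survey]


# Toy data: two households in the base year, one child observed later as an adult.
roster = [
    RosterEntry("H1", "P1", 40, "householder"),
    RosterEntry("H1", "C1", 8, "child"),
    RosterEntry("H1", "C2", 20, "child"),   # too old to count as a young child
    RosterEntry("H2", "P2", 35, "householder"),
    RosterEntry("H2", "C3", 5, "child"),
]
later_survey = {"C1": {"occupation": "nurse", "income": 52000}}

pairs = parent_child_pairs(roster)
linked = link_adult_outcomes(pairs, later_survey)
```

In this toy example only one of the two parent–child pairs survives the second step, which mirrors the point in the text that adult outcomes are observed only for the subset of children who appear in a later survey sample.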
One existing rich source of individual-level data is the Census Bureau’s Longitudinal Employer-Household Dynamics (LEHD) data (Graham et al., 2022). The LEHD are administrative data consisting of individual-level records on quarterly earnings and employment based on the information that employers report to state unemployment insurance agencies. The Census Bureau links individuals to their employers and assigns a protected identification key (PIK), a unique identifying code, which allows these data to be linked to other Census data sources, including characteristics such as race, sex, years of education, and occupation. For records that cannot be linked, the LEHD imputes this information. These data can also be linked to Census residence files so that residence-level employment and earnings records can be formed. The availability of the data varies by state (with most states making data available starting in the early to mid-1990s). These data can be linked to Decennial Census data (as described below) to produce a more complete picture of households at a point in time, which allows researchers to follow individuals in the future when they enter the labor market.
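As a rough illustration of how quarterly, job-level earnings records might be rolled up to person-level and residence-level annual totals, consider the following Python sketch. The tuple layout, identifiers, and values are hypothetical assumptions made for illustration; they do not reflect the actual LEHD file structure.

```python
from collections import defaultdict


def annual_earnings_by_person(quarterly_records):
    """Sum quarterly earnings across all employers for each (pik, year)."""
    totals = defaultdict(float)
    for pik, employer_id, year, quarter, earnings in quarterly_records:
        totals[(pik, year)] += earnings
    return dict(totals)


def annual_earnings_by_residence(person_totals, residence_of):
    """Aggregate person-year earnings to (residence_id, year).

    residence_of maps (pik, year) to a hypothetical residence identifier;
    person-years with no residence match are skipped.
    """
    totals = defaultdict(float)
    for (pik, year), earnings in person_totals.items():
        residence = residence_of.get((pik, year))
        if residence is not None:
            totals[(residence, year)] += earnings
    return dict(totals)


# Toy records: (pik, employer_id, year, quarter, earnings)
records = [
    ("A", "E1", 1996, 1, 5000.0),
    ("A", "E1", 1996, 2, 5000.0),
    ("A", "E2", 1996, 2, 1000.0),  # second job in the same quarter
    ("B", "E3", 1996, 1, 7000.0),
]
residence_of = {("A", 1996): "R1", ("B", 1996): "R1"}

person_totals = annual_earnings_by_person(records)
residence_totals = annual_earnings_by_residence(person_totals, residence_of)
```

Summing across employers before aggregating to the residence level is one simple design choice; analyses that distinguish primary from secondary jobs would need to keep the employer dimension.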
Another source of data for studying intergenerational mobility is individual federal tax data. These data have provided the foundation for very important recent studies of economic mobility (e.g., Chetty et al., 2018). Currently, the Census Bureau has some information on tax filers’ 1040 forms from 1969, 1974, 1979, 1984, 1989, 1994, and 1995 onwards, but only the data for years prior to 1995 are available to researchers outside the Census Bureau (Alexander et al., 2024). Although only limited tax information is available currently, the expectation is that more tax information (as well as other data) will be made available soon. Another limitation is that only the 1994 data include identifiable information for dependents. For all other years, dependents in the household cannot be followed over time. Even with these data limitations, there is potential for more in-depth intergenerational research using these data. For example, of the more than 212 million individuals in the 1994 filers’ data, 179 million can be linked to the 2020 Decennial Census data. Furthermore, of the 53.4 million individuals in the 1940 Census that have PIKs, 23.7 million can be linked to the 1994 tax data.
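The linkage counts quoted above imply approximate coverage shares, which the following short Python sketch computes. The only inputs are the figures stated in the text; because the 1994 file contains "more than" 212 million individuals, the first share should be read as an approximate upper bound.

```python
def link_share(linked_millions, total_millions):
    """Share of records in a source file that can be linked to a target file."""
    return linked_millions / total_millions


# Counts (in millions) as quoted in the text.
share_1994_to_2020 = link_share(179, 212)     # 1994 filers linked to the 2020 Census
share_1940_to_1994 = link_share(23.7, 53.4)   # 1940 Census PIKs linked to 1994 tax data

print(f"1994 tax data -> 2020 Census: {share_1994_to_2020:.1%}")  # about 84.4%
print(f"1940 Census -> 1994 tax data: {share_1940_to_1994:.1%}")  # about 44.4%
```

The gap between the two shares is unsurprising given the 54 years separating the 1940 Census from the 1994 tax data, over which mortality alone removes many individuals from the linkable population.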
The datasets described above allow for analyses of individual- or household-level records. Although these are critical for many types of mobility analyses, it is also important to provide aggregated mobility statistics that, like other aggregate labor market statistics (e.g., measures of employment and unemployment), can assist with research and inform the public and policymakers. Annex Table 6-2 lists some of the most important sources of such aggregated mobility data. The table also lists datasets that provide other types of contextual data pertaining to spatial units (e.g., neighborhoods, counties, states), educational institutions (e.g., primary, secondary, and postsecondary schools), labor market institutions (e.g., firms, industries, occupations), and policy (e.g., local government finances, laws pertaining to structural racism, other state laws). The latter types of data are not “mobility data” in and of themselves but may be useful in understanding institutional sources of mobility.
The first set of data sources in Annex Table 6-2 pertains to intergenerational mobility and other data (e.g., poverty, employment) aggregated up to different types of spatial units (e.g., neighborhoods, states). The Opportunity Atlas is a leading source of mobility data at the Census tract level; it is developed from links between federal tax data, 2000 and 2010 Decennial Census microdata, and microdata from the 2005–2015 ACS. Social Explorer is a leading source of demographic data aggregated up to block groups, ZIP codes, congressional districts, and states. The remaining spatially aggregated data sources in Annex Table 6-2 pertain to measurements of residential segregation (the Segregation Explorer), population flows (the SafeGraph Places Dataset), and justice outcomes (the Justice Outcomes Explorer).
The next set of data sources in Annex Table 6-2 pertains to data aggregated at the school level (i.e., primary, secondary, and postsecondary schools). These datasets provide information on the characteristics of schools (e.g., tuition, admission rates, racial and gender composition), their educational output (e.g., test scores, degrees conferred), and their labor market correlates (e.g., earnings). The main sources of these data are the Stanford Education Data Archive, the Integrated Postsecondary Education Data System, the Common Core of Data, the National Student Clearinghouse, and the LEHD.
The third set of data sources in Annex Table 6-2 is aggregated up to key labor market institutions (e.g., occupations, firms, industries). The Occupational Information Network provides measures of occupational activities and skills; the LEHD provides measures of hires and jobs at the firm level; and the Bureau of Labor Statistics (BLS) provides a host of labor market measures (e.g., employment, compensation, productivity) at the industry, occupation, state, and national levels. The final set of data sources in Annex Table 6-2 provides measurements of policy, government finances, structural racism, and state laws.
The Census Bureau’s Mobility, Opportunity, and Volatility (MOVS) project is not listed in Annex Table 6-2 because only the initial year of data has been released.7 Once completed, MOVS will provide an especially important source of aggregated mobility indices that could be conceived as the analogue to, say, the U1–U6 measures of unemployment. This project involves integrating 1040 tax data from the Internal Revenue Service (IRS), Social Security Administration data from the Census Bureau Numident file (containing individual birth date and place, sex, citizenship, and date of death information), and demographic data collected in Census Bureau censuses and surveys. The goal is to produce public-use, aggregated statistics for individuals and households measuring income growth, income volatility, and economic mobility. These statistics, many of which will be disaggregated by race, ethnicity, and other demographic characteristics, will allow researchers to explore income mobility patterns in richer detail. By providing a full suite of regularly released measures of mobility at varying levels of geography, the MOVS project is an important step toward regularizing the reporting of real-time mobility data.8
A project related to MOVS is the Income Distributions and Dynamics in America project, which uses the same data as the MOVS project to create publicly available measures of income inequality and mobility for detailed racial/ethnic groups by geographic location.9
___________________
7 https://www.census.gov/library/stories/2024/05/movs.html
8 There is the possibility that the linked individual- and household-level data will be made available to researchers in the future.
9 https://www.minneapolisfed.org/institute/income-distributions-and-dynamics-in-america
Important individual- and household-level data projects on mobility are underway. Although the MLP, LIFE-M, and Census Tree projects described above make it possible to characterize the mobility processes of early U.S. birth cohorts, no administrative datasets are available to scholars seeking to characterize the mobility processes of cohorts born after 1940 and before 1970.10 Two new projects develop intergenerational linkages for more recent cohorts. The American Opportunity Study (AOS) will allow for multigenerational analyses after 1950 (whereas 1950 is the latest Census year available for MLP, LIFE-M, and the Census Tree), while the State Longitudinal Data Systems (SLDS) initiative provides linkages between educational and earnings data for recent cohorts.
The AOS11 builds on the Decennial Census Digitization and Linkage Project (DCDL) at the Census Bureau (Genadek & Alexander, 2019).12 The purpose of DCDL is to connect the 2000, 2010, and 2020 censuses to the ACS using PIKs, while the purpose of the AOS is to extend the DCDL initiative by allowing for such linkages to earlier decennial censuses. Because names in the 1960–1990 censuses are handwritten and stored on microfiche, the Census Bureau, in collaboration with Opportunity Insights, Brown University, the Institute for Social Research at the University of Michigan, and the Stanford Center on Poverty and Inequality, has developed new methods for digitizing these data.13 After digitization, the Census Bureau will add a PIK to each individual record in the 1960–1990 censuses, IRS tax data, and data from the Social Security system, which will transform these records from large cross-sections into large longitudinal datasets of individuals. When the linking is completed (expected in 2026), Decennial Census data from 1850 through 2020 will be longitudinally linked, to the extent possible.
Of course, not all individuals will be linked. Although PIK rates are extremely high and well validated, linked data tend to be unrepresentative (Bailey et al., 2022; Ruggles, 2006). This means that, in addition to supplying a richer set of questions, surveys will play an important role in validating the quality and representativeness of administrative data. To protect the confidentiality of individuals in the Census, these linked longitudinal data will again be available only to a select set of researchers who can obtain Special Sworn Status and conduct their research in FSRDCs. While these restrictions limit who can access the data, the Census Bureau and other agencies are working to lower the cost of access while still protecting respondent confidentiality.
___________________
10 There have, however, been internal Census Bureau analyses of these birth cohorts.
11 https://www.census.gov/about/adrm/linkage/projects/aos.html
12 https://www.census.gov/about/adrm/linkage/projects/aos.html
The AOS can be used to evaluate the short- and long-run effects of thousands of social programs implemented in the last 70 years and can enable a host of related sociodemographic analyses. Because it is, in effect, a massive individual-level longitudinal panel dataset, other datasets (e.g., program data, education data, past experiments) may be combined with this population-level infrastructure to learn about intergenerational mobility.
The SLDS initiative, which is sponsored by the U.S. Department of Education, provides funding for states to construct datasets linking individual students from preschool through secondary and postsecondary school and then to quarterly earnings records obtained from the unemployment insurance system in the state (which captures all formal employment). These data also include information about schools, including certification and licensure of teachers, the percent of students receiving free or reduced-price meals, and scores on standardized tests for all students in a building (Bloom-Weltman et al., 2021). Several states are also supplementing this information with vital records data, data on participation in transfer programs, corrections data, and driver’s license data. While these data typically have information only back to the early 2000s, their longitudinal and generational coverage will grow with time. In addition, with state approval they can be linked to administrative data on intergenerational mobility to construct more granular measures of school quality and provide better information on participation in transfer programs, compared with data available from other federal agencies.
The SLDS initiative also has a cross-state component. To track migration of students and to address the lack of a national student-level data system, states have begun to connect their data systems across states with interstate data sharing agreements. Two important examples of cross-state initiatives are (a) the Western Interstate Commission for Higher Education’s Multistate Longitudinal Data Exchange, and (b) Coleridge Initiative members who are enabling interstate data sharing via the Administrative Data Research Facility.
The sources of longitudinal data described above form the core of the data available to study intergenerational mobility, but they can be supplemented with outcomes from different sources. Annex Table 6-2 provides, for example, a description of some contextual datasets that could be linked to the ACS.
A range of “newer” data sources also provides new data linkage opportunities. For example, efforts to add PIKs to criminal justice data, data about veterans from the Department of Defense, geolocational data from cell phones, and credit reports and other data from private companies are expanding research opportunities to study intergenerational and intragenerational mobility. Prominent examples of such “linkable data” include commercial data on housing (e.g., Zillow, Black Knight/CoreLogic), spending information (e.g., Affinity data in Opportunity Insights [OI]), and combined credit and spending information (e.g., JPMorgan Chase Institute). In addition, JPMorgan Chase data have been used to examine income and spending volatility and mobility over a short time period, and Vanguard data have been used to evaluate retirement choices over time. Because researchers have only recently begun to use these data, their strengths and weaknesses are still being discovered.
Another notable linkage effort is the Census Bureau’s 2020 administrative record Census simulation (Brown et al., 2023). This project links Census Bureau records that carry PIKs to administrative records from 31 federal and state agencies, including information from the Centers for Medicare & Medicaid Services, the Department of Housing and Urban Development (HUD), the Federal Housing Administration, Immigration and Customs Enforcement, the Indian Health Service, IRS, U.S. Postal Service, U.S. Citizenship and Immigration Services, and U.S. Customs and Border Protection. The Census Bureau has proposed an annual administrative record census. These administrative data from nonstatistical federal agencies are important because they contain information on issues such as immigration, health, and support programs (which are often unavailable or poorly reported in common survey data), and because they are large enough to allow one to identify small populations (e.g., recent immigrants, homeless population). Combining administrative records with other longitudinal data using PIKs will allow research on many new outcomes, mechanisms, and contextual factors shaping intergenerational mobility.
In summary, substantial headway is being made to address data challenges that have limited the study of intergenerational mobility in the United States. The AOS, in combination with historical data, will allow an examination of mobility over the long sweep of history across multiple generations, within the context of complicated family structures, at the granular neighborhood level, and across many different subgroups and types of mobility (e.g., occupation, income). When combined with the LEHD, the AOS will have individual and household earnings records for many individuals and households starting in the mid-1990s, while the tax data available prior to 1995 will provide a more complete description of the household. The SLDS initiative does not cover as long a sweep of history, but it includes additional data on program participation, educational outcomes, and much more. Although the transition to administrative data analysis has been underway for many decades, the initiatives described above will likely accelerate this transition.
Even with the substantial data infrastructure projects mentioned above, data for studying intergenerational mobility remain incomplete. Some important challenges include the comprehensiveness of available data, the quality of available data, and the infrastructure for accessing data in the United States. This section addresses each in turn.
Mobility research hinges on measuring a variety of resources (e.g., income, occupation, wealth) across two or more generations, making comparisons within and across subpopulations, examining the mechanisms for transmitting advantage and disadvantage (e.g., families, educational systems, labor markets, neighborhoods), and understanding the effects of programs and policies on opportunities. In attempting to meet these objectives, existing datasets fall short in seven ways.
One prominent data gap is around measures of wealth. The Federal Reserve’s Survey of Consumer Finances is an important source of wealth data, but its sample of families is too small (6,500 families were interviewed in the most recent survey) to conduct in-depth analysis across key subgroups (e.g., race, ethnicity) and across the wealth distribution. The new Wealth and Mobility (WAM) study, which is being conducted at the Stone Center for Income Dynamics at the University of Michigan (in partnership with the IRS), has the potential to produce more detailed data on wealth, but only aggregate data will be released (and hence linkages to existing household- or individual-level administrative data will not be possible). It is important that efforts such as WAM continue to receive support and that opportunities to undertake analyses with the individual or household data (in FSRDCs or equally secure facilities) are opened more broadly to qualified scholars. The committee agrees with the conclusions in a report by the National Academies (2024a) stating that agencies need to collaborate to produce a data infrastructure that (a) includes both income and wealth (and consumption) data that are consistent with each other, and (b) includes demographic and geographic variables for the analysis of both inequality and mobility.
Although it is no easy task to infer wealth from tax data, the most fundamental problem is that of data access. To date, only a small number of research teams have access to the data needed to build unit-record wealth data, a limitation that undermines basic science, policy evaluation, and the development of new policy. In addition, researchers need to develop ways to better measure wealth among the hardest-to-measure groups—those at the top and bottom of the wealth distribution—and ensure that these data are available to other researchers. Most of the new data infrastructure initiatives that are discussed above focus on making individual-, household-, or tax reporter–level data available to qualified scholars in a secure environment, an approach that would also be appropriate for wealth data (especially as it accesses tax records).
The U.S. Census Bureau and Statistics of Income are currently collaborating to develop new occupational coding algorithms for the occupational write-ins on Form 1040 that will be comparable to the occupation codes available in the ACS and decennial censuses. These data should improve the accuracy of occupational information available in more standard data, such as the ACS or CPS, as well as in the tax data. This will make it possible to carry out data-intensive analyses of neighborhood effects, to assess the extent to which detailed occupational transmission is a key mechanism of mobility, and to incorporate more complicated familial processes (e.g., occupational assortative matching). These new linked data will also support studies of occupational mobility within a person’s lifetime and across generations. It remains unclear whether these new occupational data will be made widely available to all qualified scholars working within secure computing facilities.
It has been difficult to examine the role that education plays in promoting intergenerational mobility in U.S. history, because education was not asked about as part of the Decennial Census until 1940. However, the 1940 Census contains information on educational attainment for all individuals alive in 1940, thus making it possible to construct statistics on educational attainment back to cohorts born in the 1880s (recognizing that the trends may be impacted because educational requirements increased over time and differential mortality occurred for less-educated groups). In addition, as of 2010, measures of educational attainment are available only in the ACS and CPS, not the Decennial Census. However, approximately 20 percent of the population has been part of the ACS since 2005, and the ACS and CPS Annual Social and Economic Supplement data have been assigned PIKs, allowing completed education to be linked to decennial censuses.14 Other ways to capture educational information for individuals include linking student records obtained as part of the Post-Secondary Employment Opportunity project (Annex Table 6-2) in the 28 states currently participating in the project. The availability of education data could also be improved if the BLS worked with the Census Bureau to assign PIKs to all CPS monthly files (as far back as is feasible). Encouraging the Census Bureau and the Department of Education to work together to link (using PIKs) FAFSA15 filers and students who receive federal financial aid to the state’s SLDS records would enable these data to be linked to the AOS, ACS, LEHD, and other administrative data.
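The relationship between the roughly 1 percent annual ACS sample and the roughly 20 percent cumulative coverage can be checked with back-of-the-envelope arithmetic. The sketch below is illustrative only: the 20-year window and the assumption that each year's sample is drawn independently are simplifications introduced here, not figures or methods taken from the report.

```python
# Back-of-the-envelope check; the 20-year window and the independence
# assumption are illustrative, not taken from the report.
annual_rate = 0.01   # about 1 percent of the population sampled per year
years = 20           # assumed: roughly 2005 onward

# Upper bound: no one is ever sampled twice.
upper_bound = annual_rate * years

# Lower figure: each year's sample is drawn independently, so repeats occur.
ever_sampled = 1 - (1 - annual_rate) ** years

print(f"{upper_bound:.3f} {ever_sampled:.3f}")
```

Under these assumptions the cumulative share falls between about 18 and 20 percent, which is consistent with the approximate figure cited in the text.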
Additionally, information on the quality of schooling, standardized test scores (ACT, SAT, or other achievement test scores), and vocational certifications is largely absent from the administrative data infrastructure, although it is available in many surveys. Allowing these surveys to be linked would allow deeper understanding of the joint processes of educational and economic mobility by assessing, for example, how education “mediates” economic mobility (e.g., parents’ income is correlated with adult children’s education, which in turn shapes children’s earnings and income). Finally, the ability to link the longitudinal SLDS data to both federal records and longitudinal data on the parents of these students would further enhance research on the educational pathways for mobility.
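One common way to quantify such mediation in linked data is a product-of-coefficients decomposition under linear models, in which the overall association between parent and child income is split into a part that runs through education and a remaining direct part. The sketch below is a minimal illustration of that technique, not a method attributed to the committee: the function names are hypothetical, the data are made up, and a causal reading would require strong assumptions that this example does not address.

```python
# Illustrative only: synthetic numbers, linear models, no causal claim.

def cov(u, v):
    """Sum of cross-products of deviations from the mean (unnormalized covariance)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def mediation_decomposition(x, m, y):
    """Split the OLS slope of y on x into a direct part and a part running through m."""
    sxx, smm, sxm = cov(x, x), cov(m, m), cov(x, m)
    sxy, smy = cov(x, y), cov(m, y)
    total = sxy / sxx                       # y regressed on x alone
    a = sxm / sxx                           # m regressed on x
    det = sxx * smm - sxm ** 2              # nonzero unless x and m are collinear
    direct = (sxy * smm - smy * sxm) / det  # coefficient on x, holding m fixed
    b = (smy * sxx - sxy * sxm) / det       # coefficient on m, holding x fixed
    return {"total": total, "direct": direct, "indirect": a * b}


# Made-up data: parent income (x), child's years of schooling (m), child income (y).
parent_income = [20, 35, 50, 65, 80, 95]
child_schooling = [11, 12, 14, 13, 16, 17]
child_income = [25, 30, 48, 45, 70, 82]

parts = mediation_decomposition(parent_income, child_schooling, child_income)
print({k: round(v, 3) for k, v in parts.items()})
```

In ordinary least squares the total slope equals the direct slope plus the indirect product exactly, so the share of the parent-child income association that runs through schooling can be read off as `indirect / total`.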
Obtaining consistent measures of race and ethnicity is critical in examining social and economic mobility. Surveys contain a variety of questions to obtain race and ethnicity from respondents, and the recent changes in Statistical Policy Directive No. 15 (see U.S. Office of Management and Budget [OMB], 2024) to include multiple responses for race and ethnicity may complicate the ability to create consistent estimates over time (Marks et al., 2024). In addition, most administrative data (e.g., tax data) do not have measures of race and ethnicity. To overcome this problem, the Census Bureau has created a Title 13 Race and Ethnicity File, denoted the “Best race and ethnicity file” in Deaver (2021), using multiple decennial censuses to obtain consistent measures. This file can be linked to tax records, as recommended in a report by the National Academies (2024a), to evaluate disparities in income, consumption, and wealth. Alternatively, researchers at the U.S. Department of the Treasury have developed a method for imputing race to the tax data (see Cronin et al., 2024). To facilitate research, these linked data need to be included in the expanded set of restricted data available to researchers (as discussed in the recommendations). Data equity also requires that these linked and imputed data be evaluated for errors and misrepresentation (as discussed in National Academies, 2023d).
___________________
14 Approximately 1 percent of the population is included in the ACS each year.
15 FAFSA stands for Free Application for Federal Student Aid.
Many income measures only capture labor market earnings, and even those that capture income from sources other than jobs ignore government transfers (e.g., via Supplemental Nutrition Assistance Program [SNAP], formerly known as the Food Stamp Program), housing subsidies, health insurance from Medicare and Medicaid, or the earned income tax credit. Given that existing research highlights the important role monetary income plays in intergenerational mobility, it would be useful to know the impact nonmonetary sources have on mobility, especially for low-income groups.
The main government funding agencies (e.g., National Science Foundation, National Institutes of Health) support the development and maintenance of a bevy of quantitative datasets, whereas very little funding goes to support the development of qualitative datasets. The vast majority of mobility analysis is, therefore, undertaken with quantitative measurements of the underlying resources (e.g., earnings, income, wealth, occupation) and observed mediating variables (e.g., networks, education, incarceration). Although field experiments and quasi-experiments have improved our capacity to make causal inferences with such quantitative data, our understanding of underlying mechanisms could be further enhanced by coupling such research with a more robust program of qualitative research (DeLuca, 2023). This is because, even when a field experiment or quasi-experiment identifies a likely causal effect, it often remains unclear why that effect is generated. When a quasi-experiment reveals, for example, that a particular program (e.g., a childcare program, housing voucher program, affordable housing program) increases mobility, our understanding of the sources of such effects is often complicated by the large bundle of social processes that a given program typically entails. Without knowing which part of this bundle is delivering the desired effect, commitment to programs must be somewhat blind (e.g., a “sociology of programs” rather than processes) and cannot be adjusted easily as circumstances change. This problem can be overcome, in part, by combining field experiments and quasi-experiments with the targeted qualitative research that can help us to ferret out the mechanisms lying behind black box program effects.
AVP (the country’s first nationally representative, large participation, public-use qualitative dataset) is a trial initiative that can also assist in this buildout of qualitative research (Edin et al., 2023). The AVP asks people to “tell the story of their life” through a series of semistructured prompts about their work life, family life, friends, religion, and much more. The resulting narratives about individual life arcs can assist in (a) discovering possible pathways through which mobility trajectories unfold, and (b) identifying the many critical junctures that might be targeted by new interventions. These life arc narratives reveal, for example, the prominent role of family traumas (e.g., sexual assaults), a “helping hand” (e.g., the help of a friend), personal struggles and problems (e.g., a drug addiction), and all manner of other pathways and junctures. Because this type of qualitative data can help researchers identify emerging problems, develop hypotheses about important mediators (which can be subsequently evaluated using both survey and administrative data), and otherwise engage in “discovery work,” a strong case can be made for building a permanent platform for collecting nationally representative qualitative data and linking these data to administrative data when possible.
The introductory chapter to this report stresses the many pathways through which various institutions, policies, and programs affect mobility. Given the important role of these institutional forces, more comprehensive data on policies and institutional practices need to be assembled and combined with microdata on individuals and families. As indicated in Annex Table 6-2, several sources for policy data exist (e.g., Agénor et al., 2021; Bailey & Duquette, 2014; Hendren & Sprung-Keyser, 2020; Torche & Rauf, 2021), but more support is needed for developing these sources.
Issues with data quality also limit studies of intergenerational mobility.
The administrative datasets currently being built would ideally represent populations that are often hidden from view. However, some administrative datasets exclude or underrepresent individuals who are undocumented or unhoused; who recently immigrated; or who work in informal sectors of the economy (e.g., childcare). In addition, some individuals cannot be linked for a variety of reasons (e.g., premature death, failure to provide a proper name in the Census, name changes across censuses). These problems can be addressed in part by building more comprehensive population indices that combine “conventional” sources of administrative data (e.g., tax data, Census data, ACS, birth records) with the point-in-time counts, social program data, administrative data on education, and other sources. The Census Bureau’s initial effort to conduct an administrative records census demonstrates the potential value of administrative data and the ability to identify difficult-to-reach or small populations, such as noncitizens. At present, however, these populations are not sampled with sufficient frequency or at high-enough rates or are excluded from administrative data altogether, making them difficult or impossible to study. Because these populations may be highly upwardly mobile (e.g., they migrated from difficult circumstances to the United States) or downwardly mobile (e.g., being unhoused), their omission from data resources limits the study of important and often vulnerable populations.
Because of the problems with administrative data discussed above, surveys remain crucial to studying mobility and its mechanisms. Surveys must therefore continue to be supported, and new solutions need to be developed for emerging problems with nonresponse and attrition. In many commonly used surveys, declining willingness to report income or respond at all is a growing concern, particularly since the COVID-19 pandemic (Rothbaum & Hokayem, 2021). The National Experimental Wellbeing Statistics project is one effort being conducted by the U.S. Census Bureau to address these problems, but such efforts need to be expanded to other agencies and data sources. At the same time, increasing reliance on administrative sources of data generates other types of reporting problems, such as “autofill” algorithms, upon which respondents may sometimes rely. It is increasingly common for statistical agencies to fill in missing data by borrowing from other administrative sources, another practice that tends to lead to overstated persistence (because those sources inevitably draw on past—rather than present—circumstances). These types of errors are likely to be systematic and need to be addressed through improvements in statistical inference, in algorithmic borrowing practices, and through new data collection.
Sample coverage is an important aspect of data quality and can have important implications for studying the mechanisms that affect mobility. In the FFCWS, for example, some two-thirds of hospitals did not allow mothers or fathers younger than age 18 to be included in the study (Reichman et al., 2001). The FFCWS sample therefore partially excludes a policy-relevant portion of the birth cohort, those with teenage parents. Moreover, fathers, especially noncustodial fathers, have lower response and retention rates than mothers; fathers’ response rates become even lower after the early interviews, so that the fathers who remain in the sample are a select group that are more involved with their children and the children’s mothers. More generally, survey attrition is a problem that needs to be considered when using longitudinal data to study intergenerational mobility (see, e.g., Schoeni & Wiemers, 2015).
It will be difficult to solve many of the foregoing problems without addressing legal obstacles restricting data access. Although the 2018 Evidence Act mandated that federal agencies increase data sharing and researcher access to these data, only a few agencies other than the U.S. Census Bureau are providing access. For example, the BLS makes only a small subset of its data available to researchers in the FSRDC, while the Department of Education and the Centers for Disease Control and Prevention have refused to put most restricted-access data in the FSRDC system.16 In addition, IRS data are largely available only to researchers within FSRDCs when working on a project with Census Bureau or IRS employees. Finally, many of the agencies that provided records for the 2020 administrative records census are unwilling to provide these data on an ongoing basis. This reluctance or resistance to sharing data raises the burden on other administrative agencies and limits research on intergenerational mobility.
___________________
16 The BLS does have its own process for accessing confidential administrative data (https://www.bls.gov/rda/), but because it operates separately from the more widely used FSRDC system, it is more difficult for researchers to link administrative data from other agencies, and it adds unnecessary costs to the federal system for accessing administrative data.
In other cases, the law itself is outdated and needs to be changed. Several laws place significant limits on building or sharing data infrastructure. U.S. Code Title 13, which governs access to U.S. Census data, and U.S. Code Title 26, which governs access to IRS data, have been interpreted as requiring that any research done with administrative data must improve the operation of the agency, regardless of the wider social benefits of this research (National Academies, 2023b). In addition, only one part of the originally proposed Confidential Information Protection and Statistical Efficiency Act (CIPSEA), which governs access to administrative records at other federal statistical agencies, was enacted. The remaining portion, which would have formalized the requirements for sharing data across federal agencies and with nongovernmental researchers, has never been enacted by Congress. Consequently, this type of sharing currently requires formal memoranda of understanding every time agencies share data with other agencies or with outside researchers, significantly raising the cost of using the data. In addition, part of the 2008 reauthorization of the Higher Education Act prohibits the federal government from maintaining a national database of student records. Many of these laws were written more than 20 years ago and prior to the growth of new computing capabilities, the development of new technology to protect data, and the growth in the use of administrative data in research. Conducting large-scale studies of economic and social mobility requires that Titles 13 and 26 and CIPSEA be modernized, and that the federal student record ban be repealed.
Much headway has been made over the last quarter-century in building powerful new datasets, principally via cross-sectional and longitudinal linkages of administrative resources. These new datasets promise to revolutionize the study of mobility. Although, as discussed above, these data construction efforts might be supplemented and expanded in many ways, a larger challenge is ensuring that these new blended data continue to protect privacy while being securely available for a wide variety of researchers. It is fitting, therefore, to conclude this chapter with a discussion of the many dissemination and access challenges that remain.
Combining multiple data sources (including Census data, administrative and tax data, birth records, educational data, and family and wealth information) that may be linked both within a period and longitudinally raises the risks from disclosure. It is essential that these data be protected and accessed in a secure research environment, which is feasible using existing protocols and safeguards. For example, protections could follow the Five Safes discussed in National Academies (2024a,b): safe data, projects, people, settings, and output. The protections criteria could be guided by a risk-utility framework using both traditional disclosure and privacy-enhancing techniques (see National Academies, 2024b). Additional methods are in development at the federal statistical agencies (IRS, Census Bureau FSRDCs). Research from the National Secure Data Service demonstration project (NSDS-D) may inform the development of a mobility data infrastructure. The committee agrees with the National Academies (2024b) report stating that the “framework for making decisions about acceptable disclosure risks given expected usefulness of data depends on whether that framework is dynamic,” and also with that report’s Conclusion 3.2, in that tiered access is a “key component of a dynamic disclosure risk/usefulness framework, to reflect differences in acceptable risks given policy priorities” (p. 43). In short, this report argues that increased data are required to conduct research on intergenerational mobility. Given this priority, multiple methods exist for managing the risks of disclosure when accessing data.
Because linked administrative data are typically analyzed in secure facilities, FSRDCs are a crucial component of the data access infrastructure. FSRDCs provide researchers secure access to restricted federal data outside of the Census Bureau. Currently, there are 33 FSRDCs located around the country, and the Census Bureau plans to add three more in 2024 and possibly another four in the near future. Nagaraj and Tranchero (2023) show that economic researchers closer to FSRDCs are more productive, underscoring the value of the FSRDC network. Efforts to include more data from federal statistical agencies in the FSRDC network would further enhance access and productivity. This would likely require additional funding to defray the costs of including data in the FSRDC network. Efforts to expand and simplify access to blended data are especially important for democratizing data access to junior and underresourced scholars.
One cost-effective alternative to building more FSRDCs is expanding virtual FSRDC access. This method of access was popularized during the COVID-19 pandemic out of necessity, and its success demonstrated the feasibility of doing so on a larger scale. With the end of the pandemic, virtual access to tax data has been rolled back. Increasing virtual FSRDC access gives many more researchers access, enhancing the benefits to the statistical agencies that share the data and to the research community (see National Academies, 2023b).
At the same time, many of the new nonfederal data sources (e.g., credit data, cell phone data, qualitative data) require the same level of protection, and increasing access to them requires FSRDC-like protections. This requires building facilities but also new procedures for reducing the costs of time-intensive disclosure avoidance review. Financial support for these activities—both for researchers submitting these disclosure requests and federal agencies reviewing the requests—is desperately needed. Enhanced technology, by improving access while protecting the confidentiality of the data, could reduce the time costs for researchers and reviewers alike (see National Academies, 2024b).
Before the data can be analyzed, researchers need to apply for Special Sworn Status—a time-consuming process. The Standard Application Process (SAP) allows researchers to request access to confidential data from federal statistical agencies using a single application portal. An oversight board is needed to monitor the introduction of the SAP to ensure that it is well supported, that information about it is well disseminated, and that it is otherwise institutionalized successfully. It is important to incorporate administrative data from nonstatistical federal agencies into the SAP system. In many cases, key nonfederal mobility datasets (e.g., the AVP) likewise are accessed via application, but the application procedures are not standardized and are often difficult to navigate.
Recent improvements in data access reduce the need for individual researchers to negotiate their own data use agreements. Although researchers interested in carrying out research with these new datasets (e.g., AOS) still have to apply to use the data via SAP, they will not have to negotiate one-off interagency agreements themselves. There will, however, inevitably still be boutique cases in which these new datasets are inadequate to the task and researchers thus need to negotiate new one-off agreements. The NSDS, which is designed to help facilitate access, linkages, and data sharing across federal agencies, has received pilot funding for this purpose (see National Academies, 2024b). The main goal of NSDS is to make it easier to secure access and carry out analyses and thus reduce time and cost for delivery. To ensure that these goals are met, it is important that the NSDS continue to report measures of time to delivery.17
An additional benefit of having a centralized data access system is the ability to archive data for use in replication. Archiving data in the FSRDCs is encouraged, and researcher requests for access are feasible, but they require a separate application and justification.
A related development at the subfederal level is the Coleridge Initiative, which facilitates data sharing across states and local agencies. To date, Coleridge’s efforts have focused exclusively on state data, so it is important to develop procedures that will allow the linking of federal and state data. The processes for combining privately owned data (e.g., cell phone data, credit reports) with state and federal datasets (see National Academies, 2023a,b, 2024b) need to be simplified. For example, a recent report on protecting privacy in the Survey of Income and Program Participation recommended creating a secure online data access mechanism (see National Academies, 2023b).
As the discussion in this chapter makes clear, the United States is moving toward a modern integrated data system based on linked administrative data, a system that may ultimately rival the systems in European countries. Developing sustainable structures that ensure increased and equitable access to the new data resources remains the central challenge ahead. This will require cooperation among the many stakeholders in the data ecosystem, including the executive and congressional branches of government, federal statistical agencies, state and local government officials, the research community, potential private funding organizations, and the general public.
This chapter also makes clear that, while linked administrative data form the backbone of data resources used to study intergenerational mobility, research still requires survey data to understand factors affecting mobility that are not found in administrative data or are not made available for research purposes. Surveys provide information regarding attitudes, behaviors, and contexts not contained in administrative data. Other key factors (e.g., parenting practices) are not contained in administrative sources at all, and surveys will remain critical for ascertaining them. It is important that extensive surveys continue to be funded and that cost-effective ways be explored to alter survey methodology to leverage administrative data. Although the committee’s recommendations focus on the importance of increasing access to administrative data for research purposes, continuing to support survey work is just as important. It is critical to develop ways to combine administrative and survey data to enhance the value of both sources.
Conclusion 6-1: Research on mobility-relevant programs and policies requires the use of blended, multigenerational data for multiple domains—especially family, place, education, and wealth—including both surveys and administrative data (e.g., tax and benefit data). Also needed is a process for ensuring that qualified researchers can access these blended data within a secure environment; this process needs to explicitly recognize both the risk of using confidential administrative data in research and the benefits to society that can be produced from this research.
Mechanisms to allow agencies to share data and allow researcher access to the blended data may be restricted because the laws governing the access are outdated and need to be changed. Several laws place significant limits on building or sharing data infrastructure. U.S. Code Title 13, which governs access to U.S. Census data, and U.S. Code Title 26, which governs access to IRS data, have been interpreted as specifying that research done with administrative data must improve the operation of the agency, regardless of the wider social benefits of this research. In addition, only one part of the originally proposed CIPSEA, which governs access to administrative records at other federal statistical agencies, was enacted. The remaining portion, which would have formalized the requirements for sharing data across federal agencies and with nongovernmental researchers, has never been enacted by Congress. Consequently, this type of sharing currently requires formal memoranda of understanding every time agencies share data with other agencies or with outside researchers, significantly raising the cost of using the data. In addition, part of the 2008 reauthorization of the Higher Education Act prohibits the federal government from maintaining a national database of student records. As discussed in Chapter 5, however, education through the postsecondary level is a key domain for economic and social mobility research. These laws were written prior to the growth of new computing capabilities, the development of new technology to protect data, and the growth in the use of administrative data in research. Large-scale studies of economic and social mobility would benefit from modernizing Title 13, Title 26, and CIPSEA and repealing the federal student record ban.
In revising these statutes, the committee agrees with the recommendations in the Year 2 report of the Advisory Committee on Data for Evidence Building (2022) and the National Academies (2024a) report, which state that Congress should adopt standards that recognize the tradeoffs between the risk of disclosure of administrative data and the usefulness of research using these data.
Given the importance of using income to measure economic mobility, well-measured income for individuals and households is needed. As discussed in recent National Academies (2023c, 2024a,b) reports, improving income measurement requires both administrative and survey data. For example, comprehensive measures of household income and wealth will need information on Schedule C income; all income reported on 1099 forms; other business income, transfer payments, alimony, and child support; tax credits; and long- and short-term capital gains. For a complete income measure, it would be helpful for state agencies to provide benefit data to the U.S. Census Bureau, including state data from SNAP, the Temporary Assistance for Needy Families program, the HUD Section 8 housing voucher programs, the unemployment insurance system (including both recipient information and quarterly wage record data), postsecondary student financial aid programs, and other data included in a state’s SLDS. Such data sharing would allow the data to be assigned PIKs, integrated into the administrative data infrastructure, and provided to researchers working in the FSRDC environment; states would require funding so that they can adopt systems and processes to comply with reporting requirements.
As discussed in Chapter 2, information about important events such as births, deaths, marriages, and divorce is essential in evaluating mobility. The data infrastructure for studying economic and social mobility will be strengthened if state agencies provide to the National Center for Health Statistics (NCHS) both current and past data that are part of the National Vital Statistics System. This would include the data already being reported to NCHS, as well as data on dates and locations of marriages, divorces, births, and deaths. States would require funding for adopting systems and processes to comply with the reporting requirements. The committee agrees with Recommendation 11-2 in a National Academies report (2023d) that calls for expanding the ability of agencies to share data across agencies and with researchers.
Recommendation 6-1: Building on the Foundations for Evidence-Based Policymaking Act of 2018 (Evidence Act), the chief statistician of the United States should work with federal agencies to advise legislators and policymakers to address the need for revisions to regulations to improve data sharing across federal statistical agencies, including
- revisiting U.S. Code Title 26 to allow the Internal Revenue Service to expand the ability to share tax data with the U.S. Census Bureau that are needed to create more comprehensive measures of household income and wealth;
- issuing the Presumption of Access rule that was part of the 2018 Evidence Act, thus ensuring that federal agencies share data among themselves; and
- requiring nonstatistical federal agencies to provide annual data to the U.S. Census Bureau so that it can conduct an annual administrative record census.
Conclusion 6-2: The data infrastructure for studying economic and social mobility will be strengthened if state agencies provide (1) the U.S. Census Bureau with data on all programs that receive federal funding, and (2) the National Center for Health Statistics with both current and past data that are part of the National Vital Statistics System. States would require funding to adopt systems and processes to comply with reporting requirements.
Conclusion 6-3: The data infrastructure for studying economic and social mobility will be strengthened if the ban prohibiting the federal government from tracking students and from maintaining a national database of student records is repealed.
Beyond sharing across agencies, it is important to improve data access for researchers. Data access laws, such as U.S. Code Titles 13 and 26, need to be modernized to facilitate research, as they have proven to be inadequate amid the recent growth of research using linked administrative data. Furthermore, the application process to obtain these data in secure facilities (FSRDCs) is cumbersome and time consuming. Introducing the SAP could reduce the burden involved in applying for Special Sworn Status, but oversight is needed to ensure its effectiveness. Finally, a key advantage of emerging linked administrative datasets is the option for researchers to obtain data from multiple sources and agencies. Efforts such as the NSDS facilitate interagency data sharing and reduce the need to negotiate individual data use agreements. Streamlining processes for combining private data with state and federal datasets will also be crucial.
Recommendation 6-2: In order to evaluate the policies surrounding economic and social mobility, researchers require tiered access to new blended data. The chief statistician of the United States should work with the federal agencies to review and revise policies concerning external data sharing with the broader research community, including (1) revisiting the missions of federal statistical agencies to formally acknowledge the need for data sharing and the broader benefits to society of research itself; (2) using tiered access to support access for qualified external researchers; (3) expanding the Standard Application Process to ensure that all proposals are evaluated within 3 months; and (4) providing remote access to Federal Statistical Research Data Centers to facilitate data sharing with more researchers. The National Secure Data Service should also work with federal agencies to ensure that these improved data access, analysis, and linking mechanisms are implemented.
Recommendation 6-3: To increase the value of data for studying economic and social mobility, federal agencies should collaborate with the National Secure Data Service to improve the data acquisition and linking process by assigning protected identification keys (PIKs) to federal surveys:
- The U.S. Census Bureau should improve the person identification validation system (PVS), which should serve as the standard for linking all individual-level data held by all federal agencies.
- The U.S. Bureau of Labor Statistics and the U.S. Census Bureau should assign PIKs to all records that are part of the (i) monthly Current Population Survey back to 1963, (ii) Consumer Expenditure Survey, (iii) American Time Use Survey, and (iv) National Longitudinal Surveys.
- The U.S. Department of Education and the U.S. Census Bureau should assign a PIK to individual Free Application for Federal Student Aid forms, as well as to individual-level administrative data on federal financial aid receipt.
Some impediments to data access are neither legal nor regulatory, but financial in nature. It is important to adequately fund agencies to allow them to provide data. Current funding is inadequate in a variety of areas, which impedes the development of evidence-based policy on mobility. Although the United States has multiple longitudinal surveys, they are not refreshed with new cohorts frequently enough because funding is difficult to secure. This means that recent changes in the composition of the U.S. population (e.g., immigration-induced changes) cannot be studied adequately. The NLSY is addressing this data problem (as well as many others) with new data collection efforts and plans for a new cohort in 2027. Understanding intergenerational mobility in the future requires new surveys or significantly enhanced versions of existing surveys to address the changing population of the United States. Resources are also required for (a) streamlining the data application process; (b) improving the person identification validation system (PVS) and allowing it to serve as the standard for linking all individual-level data; (c) subsidizing the institutional costs associated with having an FSRDC, which would allow researchers at a wider range of institutions to access restricted administrative data; and (d) expanding support for qualitative research on intergenerational mobility and incorporating this information into the FSRDC infrastructure.
Conclusion 6-4: In order to facilitate data access for studying economic and social mobility, funding is required for streamlining the data application process, improving linking, supporting Federal Statistical Research Data Centers, enhancing the survey infrastructure, and expanding qualitative research.
The U.S. research infrastructure (e.g., National Science Foundation, National Institutes of Health) does not single out mobility research as an important zone of inquiry that—like defense research, climate research, or cancer research—deserves special organizational support or funding.
Because research on mobility is central to understanding the functioning of the U.S. economy and society, this committee believes that it is important that the United States commit resources to the task of monitoring mobility and facilitating research to understand the processes of intergenerational mobility.
The United States does have many government agencies, research centers, and research initiatives that are relevant to research on mobility and opportunity (e.g., U.S. Department of Health and Human Services [HHS], Institute for Research on Poverty, Upward Mobility Initiative). HHS (via the Office of the Assistant Secretary for Planning and Evaluation) currently funds a national research center, the National Center on Poverty and Economic Mobility, that fuses the objectives of reducing poverty and increasing mobility. These two objectives are, however, distinct and may be secured through very different types of initiatives. It is important to recognize this distinction by setting up two research centers: one that addresses poverty and another that addresses economic and social mobility and opportunity.
A new National Mobility Center (NMC) would be designed to assist and organize the country’s research on economic and social opportunity. The NMC’s constituency would be, first and foremost, mobility researchers. The United States already has important and very successful organizations focused on assisting local leaders (e.g., the Upward Mobility Initiative), as well as organizations focused on issues relevant to service delivery (e.g., HHS). However, the United States does not have an organization targeted to support and organize research on trends in mobility, the causes of mobility, and the most effective interventions affecting the amount and type of mobility.
The purpose of the NMC would thus be to assist researchers in meeting the many challenges laid out throughout this report. The NMC would help researchers identify the key unresolved research questions in the mobility field, identify the existing programs and new interventions that would most benefit from a careful evaluation, sort through the hundreds of available datasets and settle on the most useful ones, garner access to datasets that are currently very difficult to access, negotiate the organizational complications of carrying out research in the secure settings (e.g., FSRDCs) that are increasingly central to the field, master the many discipline-specific models and measures for examining the many different types of mobility (e.g., earnings, income, wealth, occupation), and access the various tools (e.g., occupation crosswalks, contextual data) needed to carry out mobility research. For each of these challenges, the NMC would take on the short-term goal of assisting researchers with overcoming them, and the long-term goal of working with agencies and constituencies to reduce the number of such challenges and thereby build a better research infrastructure for the future.
The main functions of the NMC may be divided into six zones:
Because the mobility field remains balkanized into disciplinary subfields focusing on particular types of mobility (e.g., economic, occupational, educational), it is difficult for researchers to understand the cross-cutting challenges facing the field as a whole. The first goal of the NMC would be to overcome these barriers by identifying unexploited opportunities for high-quality monitoring, model-building, and program evaluation. This requires distinguishing among zones (a) in which much research is available and further effort is probably unnecessary, (b) in which much research is available but key questions remain unresolved, and (c) that are wholly unexplored but might well warrant exploration. The field relies increasingly on quasi-experimental methods, and the results pertaining to the delimited zones in which natural experiments happen become a complicated patchwork. The goal of the NMC, in part, would be to ferret out when limited-scope experiments and quasi-experiments provide strong and compelling evidence and when additional work is needed before one can stitch together a compelling model. The simple goal is to direct energy and effort (especially among new entrants to the field) toward solving the field’s most important unresolved problems.
The NMC would also identify zones in which new pilots are warranted. The present report has noted, for example, that some of the most promising systemic and institutional interventions have been deprioritized because they are not amenable to randomized controlled trials, because of the lure of cheap nudges, or because no program already exists to render them testable via quasi-experiments. The NMC would be charged with identifying all interventions—systemic and otherwise—that are promising and would benefit from pilots.
The NMC would provide a one-stop center for scholars seeking to learn about available surveys, administrative data, proprietary data, and qualitative datasets. It would also facilitate the process of accessing these data. As has been documented throughout this report, research on U.S. mobility has been hampered by impediments to accessing, linking, and analyzing data. The NMC would provide workshops and training on overcoming these impediments and work closely with the NSDS in reducing them. The NMC would additionally identify surveys and other datasets that need to be updated as well as new types of mobility data that are currently undersupported (e.g., qualitative datasets on mobility).
To carry out mobility research, scholars rely on a host of resources that are widely scattered, such as occupational crosswalks, contextual datasets, calculators for adjusting for inflation (e.g., CPI-U-RS,¹⁸ PCE,¹⁹ chain-weighted CPI), and much more. The NMC would provide links to these widely used resources for mobility research. It could also become a hub for rigorous qualitative research on mobility, training students and junior scholars in qualitative methods from formulating questions to sampling, data collection, and analysis.
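To make concrete what an inflation-adjustment calculator of the kind mentioned above does, the following is a minimal sketch in Python. It assumes a simple ratio-of-index-values deflator. The index values, the `PRICE_INDEX` table, and the function name `to_real_dollars` are hypothetical and chosen only for illustration; they are not actual CPI-U-RS or PCE figures.

```python
# Hypothetical price index values by year, for illustration only.
# These are NOT actual CPI-U-RS or PCE figures.
PRICE_INDEX = {1990: 60.0, 2000: 80.0, 2020: 120.0}


def to_real_dollars(nominal, year, base_year, index=PRICE_INDEX):
    """Express a nominal amount observed in `year` in `base_year` dollars.

    Uses the standard ratio adjustment:
        real = nominal * index[base_year] / index[year]
    """
    return nominal * index[base_year] / index[year]


# Compare a parent's 1990 income with a child's 2020 income,
# both expressed in 2020 dollars (hypothetical amounts).
parent_income_real = to_real_dollars(30_000, year=1990, base_year=2020)
child_income_real = to_real_dollars(60_000, year=2020, base_year=2020)
```

Because the choice of price index changes the adjusted values, a comparison of parent and child incomes across decades can differ depending on which index is used.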
The NMC would also bring together the many disciplines engaged in mobility research by providing workshops on key topics of interest (e.g., data access, new statistical quantitative methods, qualitative research methods) and hosting interdisciplinary conferences on key developments in the mobility field.
With the emergence of new linked administrative data, it is now possible to deliver real-time annual reporting of trends in mobility, much like trends in the Official Poverty and Supplemental Poverty measures are monitored annually in the Poverty in the United States series (authored by the U.S. Census Bureau). The NMC would, for example, collaborate with the NCES and MOVS initiatives to provide annual reports on (a) the effects of family economic background on college access (for the incoming cohort of college entrants), (b) the effects of family economic background on first jobs (for the incoming cohort of labor market entrants), and (c) the effects of family economic background on transitions to subsequent jobs (again by cohort). These reports would, just like the existing annual assessment of poverty, identify points of progress as well as problem areas that need to be targeted or watched.
The NMC, via these six functions, would not just have a symbolic function (i.e., symbolize the commitment of the United States to promoting opportunity) but would also assist in developing an evidence-based road map for promoting opportunity. The NMC, which would be a nongovernmental research center, could be funded by government agencies and private foundations.
Conclusion 6-5: The United States lacks an institutional body charged with ensuring that the country’s commitment to equal opportunity is properly considered when policy is developed and evaluated. A National Mobility Center could serve as a key resource for facilitating data access, reporting on current mobility statistics and analyzing trends, identifying promising systemic and institutional interventions, developing viable approaches for evaluation, and building an interdisciplinary research community to study economic and social mobility.
___________________
18 Consumer Price Index research series using current methods.
19 Personal Consumption Expenditures Price Index.
ANNEX TABLE 6-1 Surveys Frequently Used for Mobility Research
| Survey | Type | Birth Cohorts | Sample Size | Key Assets for Mobility Research | Key Liabilities |
|---|---|---|---|---|---|
| Panel Study of Income Dynamics (PSID) | Panel | Children of original PSID households: 1950–1968 birth cohorts | ~5,000 households (in original PSID sample), many supplemental samples | Sample grows “naturally” as children and grandchildren from initial families form their own households; rich and extensive data collection and data supplements; multigenerational interviews (i.e., grandparent, parent, adult child) | Small sample size |
| National Longitudinal Survey of Youth (NLSY79 and NLSY97) | Panel | NLSY79: 1957–1964 birth cohorts; NLSY97: 1980–1984 birth cohorts | NLSY79: ~13,000 young adults; NLSY97: ~9,000 young adults | Rich and extensive follow-up surveys; new NLSY planned for 2026 | Limited number of birth cohorts covered; small sample size |
| Wisconsin Longitudinal Study | Panel | 1938–1940 birth cohorts | ~10,000 Wisconsin high school graduates | Long-running coverage of full life course of Wisconsin high school graduates | Limited geographic coverage; limited number of birth cohorts covered |
| Future of Families and Child Wellbeing Study | Panel | 1998–2000 birth cohorts | ~5,000 children born in large cities | Rich and comprehensive interviews with parents and children | Small sample; limited number of birth cohorts; young age of children |
| National Longitudinal Study of Adolescent to Adult Health | Panel | 1975–1983 | ~20,000 adolescents | Initial wave in 1995 secured educational, income, and occupational data for one of the parents of each of the adolescent participants; comprehensive health data (including genetic markers) | Limited coverage of birth cohorts |
| Survey of Income and Program Participation (SIPP) | Panel | 1910–present day | ~14,000–52,000 households per year | Comprehensive measurement of program participation; SIPP survey data can be matched to administrative earnings records (to carry out intergenerational analyses) | Intragenerational panels are relatively short in duration (up to 4 years) |
| Health and Retirement Study (HRS) | Panel | 1890–1971 (new birth cohorts added every 6 years) | ~40,000 respondents (~5,000 cases added every 6 years) | Extensive coverage of birth cohorts; retrospective data on childhood family life of HRS respondents; linked to earnings records, Medicare records, and other administrative data; extensive genetic data | Delayed availability of evidence on trends (respondents do not “age into” HRS until they are ~50 years old) |
| New Immigrant Survey (NIS) | Panel | 1938–2004 | First wave: ~9,000 adults and 1,000 children | Nationally representative samples of adult immigrants admitted to legal permanent residence (as well as child supplements) | Only provides information on intragenerational mobility |
| Children of Immigrants Survey (CIS) | Panel | 1977–1978 | First wave: ~5,000 8th- and 9th-grade students | Sample of second-generation immigrant children attending 8th and 9th grades in Miami/Ft. Lauderdale and San Diego | Only pertains to two birth cohorts and two metropolitan areas |
| High School and Beyond | Panel | ~1962–1965 (high school sophomores and seniors in 1980) | ~58,000 high school students | Rich coverage of secondary and postsecondary experiences (including transcripts and financial aid records); parent questionnaires available for sample of parents; extensive follow-up data | Limited coverage of birth cohorts |
| National Educational Longitudinal Study of 1988 | Panel | ~1974–1975 (8th graders in 1987–1988 school year) | ~25,000 8th graders (with subsequent sample freshening) | Rich coverage of middle school, secondary, and postsecondary experiences; students’ teachers, parents, and school administrators were also interviewed; extensive follow-up data | Limited coverage of birth cohorts |
| Educational Longitudinal Study of 2002 | Panel | ~1986–1987 (high school sophomores in 2002) | ~15,000 high school sophomores | Rich coverage of secondary and postsecondary experiences (including transcripts); students’ teachers, parents, and school administrators were also interviewed; extensive follow-up data | Limited coverage of birth cohorts |
| High School Longitudinal Study of 2009 | Panel | ~1994–1995 (9th graders in 2009) | ~23,000 9th graders | Rich coverage of secondary and postsecondary experiences (including transcripts); students’ teachers, parents, and school administrators were also interviewed; extensive follow-up data | Limited coverage of birth cohorts |
| Occupational Changes in a Generation (OCGI and OCGII) | Cross-section | OCGI: 1898–1942; OCGII: 1908–1953 | OCGI: ~20,000 adults; OCGII: ~34,000 adults | High-quality intergenerational occupation data (i.e., parental occupations secured retrospectively) | Only available for men |
| General Social Survey (GSS) | Cross-section | 1897–2005 | 1972–1993: ~1,500 adults/year; 1994–2004: ~3,000 adults biennially; 2006–present: ~4,500 adults biennially | Standardized interview protocol delivered biennially via face-to-face interviews; high-quality occupational information (parental occupations secured retrospectively) | Small per-year sample size |
| American Voices Project | Cross-section | 1944–2003 | ~2,700 adults | Nationally representative immersive interviews on one’s “life arc” | Small sample size |
ANNEX TABLE 6-2 Illustrative Aggregate and Contextual Datasets
| Name | Context | Data Type | Unit of Analysis | Illustrative Variables |
|---|---|---|---|---|
| National Neighborhood Data Archive | Spatial | Physical, economic, demographic, and social attributes of neighborhoods | Wide range of neighborhood-level spatial units (e.g., Census tract, zip code, county) | Health care, housing, partisanship, public transit, education, demographics, social services, stores, traffic, crime, civics |
| Opportunity Atlas (opportunityatlas.org) | Spatial | Tract-level social mobility (and a range of other neighborhood characteristics) | Census tract | Adult income of children raised in neighborhood (by race, gender, parent income) and many other neighborhood characteristics (e.g., incarceration rate, poverty rate, job density) |
| Social Explorer (socialexplorer.com) | Spatial | Social and demographic attributes of neighborhoods and other spatial units | Wide range of spatial units (e.g., Census block group, zip code, congressional district, state) | Population, income, occupation, poverty, marital status, age, race, education, house value, crime, health, immigration status (for block groups, school districts, and other spatial units) |
| Segregation Explorer (edopportunity.org) | Spatial | Racial and economic segregation | X | X |
| SafeGraph Places (safegraph.com) | Spatial | Human mobility within towns, cities, and rural areas | Neighborhoods, parks, restaurants, and other “points of interest” | Number of visitors to points of interest (at a given time) |
| Justice Outcomes Explorer Within the Criminal Justice Administrative Records System (cjars.org) | Spatial | Criminal justice outcomes by state, county, and commuting zone | States, counties, commuting zones | Per capita rate of prison inmates, misdemeanor charge rate, annual employment rate of felony defendants, Medicaid take-up rates of parolees |
| Stanford Education Data Archive (edopportunity.org) | Educational institution | Test scores and changes in test scores | Schools (aggregated by school district, county, state, and other attributes) | Average test scores by county and school type (e.g., regular, charter, magnet) |
| Common Core of Data (nces.ed.gov) | Educational institution | Descriptive data on public elementary and secondary institutions | Elementary and secondary schools (aggregated by school type, state, and other attributes) | School enrollment, graduation rates |
| College Scorecard (collegescorecard.ed.gov) | Educational institution | Performance assessments of postsecondary institutions | Postsecondary educational institutions | Graduation rate, median earnings of attendees, other variables (many drawn from Integrated Postsecondary Education Data System) |
| Integrated Postsecondary Education Data System (nces.ed.gov/ipeds) | Educational institution | Descriptive data on postsecondary institution attributes | Postsecondary educational institutions | Tuition, admission rates, enrollment, financial aid, degrees conferred, student success, institutional resources |
| National Student Clearinghouse (nscresearchcenter.org) | Educational institution | Descriptive data on postsecondary institution attributes | Postsecondary educational institutions | Enrollment, program of study, retention rate, transfer rate, completers, time to credential, age, gender, race and ethnicity |
| Post-Secondary Employment Outcomes (https://lehd.ces.census.gov/data/pseo_experimental.html) | Educational institution | Descriptive data for postsecondary institutions, degree levels, and majors | Postsecondary educational institutions, degree levels, majors | Earnings and employment outcomes (via experimental tabulations developed by the Longitudinal Employer-Household Dynamics program at the U.S. Census Bureau) |
| Bureau of Labor Statistics (bls.gov) | Labor market | Labor market conditions | Assortment of labor market contexts (e.g., national, industry, occupation, state) | Employment, compensation, productivity |
| Occupational Information Network (O*NET) (onetonline.org) | Labor market | Occupational activities, skills, values, and conditions | Detailed occupations | Working conditions (e.g., hazards, scheduling, arduousness), writing skills, educational requirements |
| Longitudinal Employer-Household Dynamics (lehd.ces.census.gov) | Labor market | Employment conditions and outcomes at the firm level | Firms (aggregated spatially or by firm characteristics) | Hires, separations, job creation, earnings (by detailed firm characteristics) |
| Veteran Employment Outcomes (https://lehd.ces.census.gov/data/veo_experimental.html) | Labor market | Employment outcomes for veterans at the industry and spatial level | Industries and states | Earnings (by years of service, military occupation, Armed Forces Qualification Test range, and more) |
| Fiscally Standardized Cities (lincolninst.edu) | Policy | Local government finances | Large U.S. cities (~200) | City-level revenues, expenditures, debt, assets |
| State Laws Related to Structural Racism (https://doi.org/10.1177/0033354920984168) | Policy | State laws affecting health of racial and ethnic groups | State (i.e., state-level laws linked to structural racism) | Stand-your-ground laws, mandatory minimum sentencing laws, voting rights laws |
| Correlates of State Policy (ippsr.msu.edu/public-policy/correlates-state-policy) | Policy | State laws relevant to fiscal policy, elections, criminal justice, education, welfare, health, labor, and environment | State | Minimum wage rates, concealed carry laws, environmental building standards, discrimination law, homeschooling law |
Established as part of the Foundations for Evidence-Based Policymaking Act “to review, analyze, and make recommendations to the White House Office of Management and Budget (OMB) Director on how to promote the use of federal data for evidence building” (Advisory Committee on Data for Evidence Building, 2022, p. 1).
When at least two data assets are combined to produce statistical information. Careful blending of data from multiple, complementary sources, such as combining or linking statistical surveys and censuses with data from administrative agencies, offers a way to generate more detailed, timely, and useful statistical information than is currently available (see National Academies, 2023a).
The Evidence-Based Policymaking Commission Act created the Commission on Evidence-Based Policymaking (CEP). The commission was tasked with examining ways to increase the availability and use of government data to build evidence while protecting data privacy and confidentiality. The commission’s report (CEP, 2017) informed the Foundations for Evidence-Based Policymaking Act (2018).
Combined sources of previously collected data, including data in non-tabular formats.
The Confidential Information Protection and Statistical Efficiency Act (CIPSEA) was first enacted as Title V of the E-Government Act and was recodified as part of the Foundations for Evidence-Based Policymaking Act (2018). CIPSEA provides a strong statutory basis for the statistical system with regard to confidentiality protection and data sharing.
Includes data assets; the technologies used to discover, access, share, process, use, analyze, manage, store, preserve, protect, and secure those assets; the people, capacity, and expertise needed to manage, use, interpret, and understand data; the guidance, standards, policies, and rules that govern data access, use, and protection; the organizations and entities that manage, oversee, and govern the data infrastructure; and the communities and data subjects whose data are shared and used for statistical purposes and may be impacted by decisions made using those data assets (National Academies, 2023c).
The principal U.S. federal statistical agencies are Bureau of Economic Analysis (Department of Commerce); Bureau of Justice Statistics (Department of Justice); Bureau of Labor Statistics (Department of Labor); Bureau of Transportation Statistics (Department of Transportation); Census Bureau (Department of Commerce); Economic Research Service (Department of Agriculture); Energy Information Administration (Department of Energy); National Agricultural Statistics Service (Department of Agriculture); National Center for Education Statistics (Department of Education); National Center for Health Statistics (Department of Health and Human Services); National Center for Science and Engineering Statistics (National Science Foundation); Office of Research, Evaluation, and Statistics (Social Security Administration); and Statistics of Income (Department of the Treasury). There are also three recognized federal statistical units: Microeconomic Surveys Unit (Federal Reserve Board), Center for Behavioral Health Statistics and Quality (Substance Abuse and Mental Health Services Administration, Department of Health and Human Services), and National Animal Health Monitoring System (Animal and Plant Health Inspection Service, Department of Agriculture).
Partnerships between federal statistical agencies and leading research institutions that provide secure environments to support qualified researchers using restricted-access data while protecting respondent confidentiality.
This statute requires agency data to be accessible and requires agencies to plan to develop statistical evidence to support policymaking (Foundations for Evidence-Based Policymaking Act, 2018).
An international consortium of more than 810 academic institutions and research organizations that maintains a data archive of more than 350,000 files and provides leadership and training in data access, curation, and methods of analysis for the social science research community.
Provides census and survey data from around the world integrated across time and space. IPUMS integration and documentation make it easy to study change, conduct comparative research, merge information across data types, and analyze individuals within family and community context. Data and services are available free of charge.
Recommended by the Commission on Evidence-Based Policymaking to facilitate access to data for evidence-building while ensuring privacy and transparency in how those data are used. The NSDS is envisioned as an added capacity for the federal statistical system to support (not supplant) ongoing work within the individual agencies and to provide a system-wide capacity to aid coordination, data sharing, data linkage, shared research and development, and other functions.
The Census Bureau’s Person Identification Validation System (PVS) assigns unique person identifiers to federal, commercial, Census, and survey data to facilitate linkages across and within files. PVS uses probabilistic matching to assign a unique Census Bureau identifier for each person.
A unique identifier for a person in a dataset, the protected identification key (PIK) is an anonymous identifier as unique as a Social Security number. The PIK links across all files that have been processed using PVS.
A uniform method for accessing federal confidential data assets to systematically provide permission to use protected data from any of the 16 federal statistical agencies and designated units for evidence-building.
Includes the major legal provisions related to the Census Bureau, including strict provisions for protecting the confidentiality of population and business information.
Applies to the Census Bureau’s statistical work that uses data about households and businesses collected from the Internal Revenue Service (IRS). Title 26 sets out the conditions under which the IRS may disclose federal tax returns and return information to other agencies, including the Census Bureau.
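As an illustration of how a shared protected identification key (PIK) allows files to be linked once they have been processed through PVS, the following is a minimal sketch in Python. It assumes that PIKs have already been assigned and does not attempt to reproduce the probabilistic matching that PVS itself performs. The record layouts, field names, and values are hypothetical.

```python
from collections import defaultdict

# Hypothetical survey and administrative records that already carry a PIK.
# A PIK of None stands for a record that could not be assigned a PIK.
survey = [
    {"pik": "A1", "education": "BA"},
    {"pik": "B2", "education": "HS"},
    {"pik": None, "education": "HS"},
]
admin = [
    {"pik": "A1", "year": 2020, "earnings": 52_000},
    {"pik": "A1", "year": 2021, "earnings": 55_000},
    {"pik": "C3", "year": 2020, "earnings": 41_000},
]


def link_by_pik(survey_rows, admin_rows):
    """Return one merged row per (survey record, admin record) pair sharing a PIK."""
    admin_by_pik = defaultdict(list)
    for row in admin_rows:
        if row["pik"] is not None:  # records without a PIK cannot be linked
            admin_by_pik[row["pik"]].append(row)

    linked = []
    for s in survey_rows:
        for a in admin_by_pik.get(s["pik"], []):
            # Keep the survey fields and add the admin fields (except the duplicate key).
            linked.append({**s, **{k: v for k, v in a.items() if k != "pik"}})
    return linked


linked = link_by_pik(survey, admin)
```

In this sketch only the survey respondent with PIK "A1" is linked, once per administrative year. Survey records without a PIK, or without a matching administrative record, drop out of the linked file.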