Completed
Topics
This workshop will explore the implications of using multiple data sources for major survey programs, including linking data to survey responses to make surveys less burdensome and more efficient. The existing data infrastructure for the social sciences relies heavily upon statistics collected in surveys and censuses. However, many other sources of data are increasingly available, including administrative data from state and local governments, institutions, and businesses, and 'big data' from the internet, connected devices, and sensors.
Featured publication
Consensus · 2023
Much of the statistical information currently produced by federal statistical agencies - information about economic, social, and physical well-being that is essential for the functioning of modern society - comes from sample surveys. In recent years, there has been a proliferation of data from other...
Description
The National Academies of Sciences, Engineering, and Medicine will appoint an ad hoc committee to produce three complementary reports on topics that will help guide the development of a vision for a new data infrastructure for federal statistics and social and economic research in the 21st century. The topics the committee will explore include the following:
Report 1: The components and key characteristics of a 21st Century Data Infrastructure, including:
- The challenges and opportunities related to data infrastructure governance;
- The skills, capabilities, techniques, and methods required by the new data infrastructure; and
- Issues related to sharing non-traditional data assets, including state and local government, institutional, private sector, and sensor data;
Report 2: The implications of using multiple data sources for major survey programs, including:
- Addressing changes in measurement with new data sources;
- Approaches for linking alternative data sources to universe frames to assess and enhance representativeness; and
- Implications of new data sources for population subgroup coverage and life course longitudinal data;
Report 3: Protecting privacy and confidentiality, and avoiding harm, throughout the lifecycle of blended data. The report will:
- Identify aspects of sharing and analyzing private and confidential data that can be addressed by technical solutions, and identify research gaps;
- Identify aspects of sharing and analyzing private and confidential data that may require policy solutions; and
- Develop frameworks for designing and evaluating integrated technical and policy solutions that can guide best practices for sharing, using, and analyzing blended private and confidential data. The frameworks will be illustrated using selected case studies and assessed for fit to intended outcomes of the new data infrastructure. The report will also identify areas where solutions are not known.
The committee for each report will convene a 1.5-day virtual public workshop on its topic to seek input from key stakeholders and external experts relevant to the specific charge. Each committee will issue a report that summarizes the committee’s findings and conclusions from the workshop and other information gathered relevant to the charge, as appropriate. These reports will help inform a vision for a new data infrastructure and will not include recommendations. The three reports will follow institutional guidelines and be subject to the National Academies' review procedures prior to release.
Collaborators
Committee
Chair
Member
Member
Member
Member
Member
Member
Member
Member
Member
Brian Harris-Kojetin
Staff Officer
Sponsors
National Science Foundation
Staff
Krisztina Marton
Lead