A Roadmap for Disclosure Avoidance in the Survey of Income and Program Participation (2024)

Chapter: Appendix B: Inferences Based on Multiple Synthetic Data

Suggested Citation: "Appendix B: Inferences Based on Multiple Synthetic Data." National Academies of Sciences, Engineering, and Medicine. 2024. A Roadmap for Disclosure Avoidance in the Survey of Income and Program Participation. Washington, DC: The National Academies Press. doi: 10.17226/27169.

Appendix B

Inferences Based on Multiple Synthetic Data

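Denote the number of synthetic datasets by $m$, the estimate of the parameter of interest $\theta$ from the $j$-th synthetic dataset by $\hat{\theta}^{(j)}$, and the within-set variability of $\hat{\theta}^{(j)}$ by $w^{(j)}$, for $j = 1, \ldots, m$ (Liu, 2022). Let $W = m^{-1} \sum_{j=1}^{m} w^{(j)}$ (average within-set variability) and $B = \sum_{j=1}^{m} (\hat{\theta}^{(j)} - \hat{\theta})^2 / (m - 1)$ (between-set variability), where $\hat{\theta} = m^{-1} \sum_{j=1}^{m} \hat{\theta}^{(j)}$ is the average of the $m$ estimates.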
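As an illustration of these definitions, the following minimal Python sketch computes the average estimate, $W$, and $B$ from per-dataset results. The function name `summarize_synthetic_estimates` and the input numbers are hypothetical and chosen only for illustration; they do not come from the report.

```python
from statistics import mean, variance


def summarize_synthetic_estimates(estimates, within_variances):
    """Return (theta_hat, W, B) for m synthetic datasets.

    estimates[j] is the point estimate from synthetic dataset j, and
    within_variances[j] is its within-set variability w^(j).
    """
    m = len(estimates)
    if m < 2 or len(within_variances) != m:
        raise ValueError("need m >= 2 estimates and one w^(j) per estimate")
    theta_hat = mean(estimates)       # average of the m estimates
    W = mean(within_variances)        # average within-set variability
    B = variance(estimates)           # sample variance, divisor (m - 1)
    return theta_hat, W, B


# Hypothetical results from m = 5 synthetic datasets.
estimates = [10.0, 12.0, 11.0, 13.0, 9.0]
within_variances = [0.5, 0.6, 0.4, 0.5, 0.5]
theta_hat, W, B = summarize_synthetic_estimates(estimates, within_variances)
print(theta_hat, W, B)
```

The same three summaries feed both the fully synthetic and the partial synthetic combining rules; only the formula for the total variance differs.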

Fully synthetic data: The final estimate of $\theta$ over the $m$ synthetic sets is $\hat{\theta} = m^{-1} \sum_{j=1}^{m} \hat{\theta}^{(j)}$ and its estimated variability is given by $T = (1 + m^{-1}) B - W$. Hypothesis testing and confidence interval construction are based on the asymptotic assumption of $T^{-1/2} (\hat{\theta} - \theta) \sim N(0, 1)$.
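A minimal sketch of the fully synthetic rule, taking the variance estimator to be $T = (1 + m^{-1})B - W$ and using the normal approximation for the interval. The function name `fully_synthetic_interval` and the inputs are hypothetical. Because $W$ is subtracted, this $T$ can turn out non-positive when $m$ is small; the sketch simply raises an error in that case rather than applying any adjustment.

```python
from math import sqrt
from statistics import NormalDist


def fully_synthetic_interval(theta_hat, W, B, m, level=0.95):
    """Return (T, (lower, upper)) under the fully synthetic combining rule."""
    T = (1 + 1 / m) * B - W
    if T <= 0:
        # Assumption of this sketch: no fallback estimator is applied.
        raise ValueError("variance estimate T is not positive")
    z = NormalDist().inv_cdf(0.5 + level / 2)  # standard normal quantile
    half_width = z * sqrt(T)
    return T, (theta_hat - half_width, theta_hat + half_width)


# Hypothetical summaries from m = 5 synthetic datasets.
T, ci = fully_synthetic_interval(theta_hat=11.0, W=0.5, B=2.5, m=5)
print(T, ci)
```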

Partial synthetic data with or without differential privacy (DP): The final estimate of $\theta$ over the $m$ synthetic sets is $\hat{\theta} = m^{-1} \sum_{j=1}^{m} \hat{\theta}^{(j)}$ and the variance estimator is $T = W + m^{-1} B$. Hypothesis testing and confidence interval construction are based on the asymptotic assumption of $T^{-1/2} (\hat{\theta} - \theta) \sim t_v(0, 1)$, where the degrees of freedom are $v = (m - 1) \left( 1 + \frac{W}{m^{-1} B} \right)^{2}$. Although the inferential approach based on multiple synthetic datasets is the same with or without DP, the between-set variance component $B$ captures different things in the two cases. For DP data synthesis, $B$ also includes the extra variability that the randomized mechanisms used to achieve the DP guarantees introduce into the synthetic data, which is absent in the case without DP.
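A minimal sketch of the partial synthetic rule, assuming $T = W + m^{-1}B$ and degrees of freedom of the form $v = (m - 1)(1 + W / (m^{-1}B))^2$. The function name `partially_synthetic_summary`, the null value, and the inputs are hypothetical. The same code applies with or without DP, since DP only changes what $B$ contains.

```python
from math import sqrt


def partially_synthetic_summary(theta_hat, W, B, m, theta_null=0.0):
    """Return (T, df, t_stat) under the partial synthetic combining rule."""
    if m < 2 or B <= 0:
        raise ValueError("need m >= 2 and a positive between-set variance B")
    T = W + B / m                           # total variance estimate
    if T <= 0:
        raise ValueError("variance estimate T is not positive")
    df = (m - 1) * (1 + W / (B / m)) ** 2   # degrees of freedom v
    t_stat = (theta_hat - theta_null) / sqrt(T)
    return T, df, t_stat


# Hypothetical summaries from m = 5 synthetic datasets, testing theta = 10.
T, df, t_stat = partially_synthetic_summary(
    theta_hat=11.0, W=0.5, B=2.5, m=5, theta_null=10.0
)
print(T, df, t_stat)
```

The statistic would then be compared with a quantile of the t distribution with `df` degrees of freedom; if SciPy is available, `scipy.stats.t.ppf` is one way to obtain that quantile.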
