Improving Information for Social Policy Decisions -- The Uses of Microsimulation Modeling: Volume II, Technical Papers (1991)

Chapter: Conditional Independence

Suggested Citation: "Conditional Independence." National Research Council. 1991. Improving Information for Social Policy Decisions -- The Uses of Microsimulation Modeling: Volume II, Technical Papers. Washington, DC: The National Academies Press. doi: 10.17226/1853.

record with the larger weight and it was included as another potential record for matching. (For the precise algorithm used, see Springs and Beebout [1976].)

STATISTICAL MATCHING: ADVANTAGES AND PROBLEMS

The Advantages of Statistical Matching

The greatest advantage of statistical matching over other techniques (mentioned below) is probably the flexibility it gives data users. Just as imputation provides data users with a rectangular data file that can be input directly into most statistical software packages, statistical matching creates a file on which a variety of analyses, often unanticipated, can be performed. Thus, if one would otherwise use iterative proportional fitting for some purposes, covariance matrices for others, and so on, it seems easier simply to create a statistically matched file, especially when the analyses cannot be anticipated. If the conditional independence assumption is warranted, or at least roughly valid, a statistically matched file is very convenient for most data users and should provide reasonable results. Statistical matching also considerably reduces respondent burden and reduces the opportunity for data disclosure.

Problems Associated With Statistical Matching

Conditional Independence

As Sims (1972) pointed out, statistical matching assumes that Y and Z are independent given X. Records from the two files are matched, or not matched, solely on the basis of the values of X(A) and X(B). The matched file therefore contains no information about the relationship between Y and Z beyond what is explained by the relationships between X and Y and between X and Z. In other words, the approach assumes that regressing a Y_i on X(A) and Z and regressing the same Y_i on X(A) alone would give identical multiple correlations.
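The regression statement above can be checked on simulated data. The following is a minimal sketch, not a procedure taken from the text: it assumes a linear model with normal errors, a single complete file containing X, Y, and Z (which a real matching problem does not have), and hypothetical names throughout (`simulate`, `r_squared`, `partial_corr`, and the `shared` parameter that controls how strongly Y and Z depend on a common factor not captured by X). With `shared=0`, Y and Z are conditionally independent given X, so adding Z to the regression of Y on X should leave the multiple correlation essentially unchanged and the partial correlation should be close to 0. With `shared=1`, both should move clearly away from those values.

```python
import numpy as np


def r_squared(y, predictors):
    """Squared multiple correlation from an OLS fit of y on predictors plus an intercept."""
    design = np.column_stack([np.ones(len(y)), predictors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - resid.var() / y.var()


def partial_corr(y, z, x):
    """Correlation between y and z after linearly removing x from each."""
    design = np.column_stack([np.ones(len(y)), x])
    resid_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    resid_z = z - design @ np.linalg.lstsq(design, z, rcond=None)[0]
    return float(np.corrcoef(resid_y, resid_z)[0, 1])


def simulate(n, shared, seed=0):
    """Simulated X, Y, Z; `shared` scales a common factor that X does not capture.

    All coefficients are arbitrary illustrative values (an assumption).
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n, 2))
    u = rng.normal(size=n)  # common factor outside X
    y = x @ np.array([1.0, -0.5]) + shared * u + rng.normal(size=n)
    z = x @ np.array([0.3, 0.8]) + shared * u + rng.normal(size=n)
    return x, y, z


for shared in (0.0, 1.0):
    x, y, z = simulate(n=20_000, shared=shared)
    r2_x = r_squared(y, x)
    r2_xz = r_squared(y, np.column_stack([x, z]))
    print(f"shared={shared}: R2(Y|X)={r2_x:.3f}, "
          f"R2(Y|X,Z)={r2_xz:.3f}, partial corr={partial_corr(y, z, x):.3f}")
```

In a real matching problem Y and Z are never observed together, so this comparison cannot be run on the two source files themselves; the sketch only illustrates what the assumption asserts.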

Technically speaking, the procedure assumes that Y conditioned on X and Z conditioned on X are independent, or that the partial correlation between a Y_i given X(A) and a Z_j given X(B) is equal to 0 (which are equivalent notions if one assumes multivariate normality). It is important at this point to consider the mathematical definition of conditional independence. The partial correlation between Y_i and Z_j conditioned on X is equal to
