Suggested Citation: "Conditional Independence." National Research Council. 1991. Improving Information for Social Policy Decisions -- The Uses of Microsimulation Modeling: Volume II, Technical Papers. Washington, DC: The National Academies Press. doi: 10.17226/1853.

record with the larger weight and it was included as another potential record for matching. (For the precise algorithm used, see Springs and Beebout [1976].)

STATISTICAL MATCHING: ADVANTAGES AND PROBLEMS

The Advantages of Statistical Matching

The greatest advantage of statistical matching in comparison with other techniques (mentioned below) is probably the flexibility it provides to data users. Just as imputation provides data users with a rectangular data file that can be input directly into most statistical software packages, statistical matching creates a file on which a variety of analyses, often unanticipated, can be performed. Thus, even if one would use iterative proportional fitting for some purposes and covariance matrices for others, it is easier simply to create a statistically matched file, especially when the analysis cannot be anticipated. If the conditional independence assumption is warranted, or is at least roughly valid, a statistically matched file is very convenient for most data users and should provide reasonable results. Statistical matching also considerably reduces respondent burden and reduces the opportunity for data disclosure.

Problems Associated With Statistical Matching
Conditional Independence

As pointed out by Sims (1972), statistical matching assumes that Y and Z, given X, are independent. Records from the two files are matched or not matched on the basis of the values of X(A) and X(B). Therefore, there is no additional information in the matched file about the relationship between Y and Z that is not explained by the relationships between X and Y and between X and Z. That is, the approach assumes that if one were to regress a Yi on X(A) and Z, and then regress Yi on X(A), the multiple correlations in the two regressions would be identical.
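The regression comparison described above can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the data are synthetic, the coefficients and sample size are arbitrary, and the helper names (`r_squared`, `simulate`, `r2_gain_from_z`) are hypothetical and do not come from the source. It compares the R-squared from regressing Y on X alone with the R-squared from regressing Y on X and Z, once when Y and Z are conditionally independent given X and once when they share an extra component that X does not explain.

```python
import numpy as np

def r_squared(y, predictors):
    """R^2 from an ordinary least squares fit of y on predictors plus an intercept."""
    design = np.column_stack([np.ones(len(y)), predictors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - resid.var() / y.var()

def simulate(shared_sd, n=20000, seed=0):
    """Draw synthetic X, Y, Z (illustrative values only, not from the source).

    shared_sd = 0 makes Y and Z conditionally independent given X;
    shared_sd > 0 adds a common component that X does not capture.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n, 2))
    shared = shared_sd * rng.normal(size=n)
    y = x @ np.array([1.0, -0.5]) + shared + rng.normal(size=n)
    z = x @ np.array([0.3, 0.8]) + shared + rng.normal(size=n)
    return x, y, z

def r2_gain_from_z(shared_sd):
    """Increase in R^2 when Z is added to a regression of Y on X."""
    x, y, z = simulate(shared_sd)
    return r_squared(y, np.column_stack([x, z])) - r_squared(y, x)

gain_independent = r2_gain_from_z(0.0)  # conditional independence holds
gain_dependent = r2_gain_from_z(1.0)    # conditional independence fails

print(f"R^2 gain from adding Z, independent case: {gain_independent:.5f}")
print(f"R^2 gain from adding Z, dependent case:   {gain_dependent:.5f}")
```

When conditional independence holds, adding Z leaves the fit essentially unchanged, which is the situation statistical matching presumes. When it fails, Z carries information about Y beyond X, and a matched file built only on X would not preserve that relationship.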

Technically speaking, the procedure assumes that Y conditioned on X and Z conditioned on X are independent, or that the partial correlation between a Yi given X(A) and a Zj given X(B) is equal to 0 (which are equivalent notions if one assumes multivariate normality). It is important at this point to consider the mathematical definition of conditional independence. The partial correlation between Yi and Zj conditioned on X is equal to
