Suggested Citation: "Reweighting of File B Data Resulting From Statistical Matching." National Research Council. 1991. Improving Information for Social Policy Decisions -- The Uses of Microsimulation Modeling: Volume II, Technical Papers. Washington, DC: The National Academies Press. doi: 10.17226/1853.

applied without consideration given to the matching that was used to create the merged file. For example, Klevmarken (1982) has shown that the parameters of a regression model of the form

where X1 denotes a subset of the matching variables and Y1 denotes a subset of the variables in the first file, are not estimable from a statistically matched file unless the number of variables in Y1 is smaller than the number of matching variables excluded from X1.

Error Resulting From the Distance Between X(A) and X(B)

Another problem with statistical matching is that the two matched records generally do not have identical values for the matching variables; that is, X(A) need not equal X(B). This disagreement adds a further assumption on which an analyst must rely: that the relationship between Z and X is smooth, so that the Z values observed at X(B) are a reasonable stand-in for those at X(A). The discrepancy between X(A) and X(B) is largest where matches are hardest to find, namely in the sparse regions of X-space. Records in these regions will generally be matched to records closer to the center of the data set, which biases the statistical match. One way to remove or reduce this bias is to use a form of parametric statistical matching, for example, one based on regression.
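The growth of the discrepancy in sparse regions can be illustrated with a small simulation. The following is a minimal sketch, not a method from the sources cited here: it assumes a single standard-normal matching variable, unconstrained nearest-neighbor matching, and an arbitrary cutoff of |X(A)| > 2 to mark the sparse region. The function name nearest_match and all variable names are hypothetical.

```python
import numpy as np

def nearest_match(x_a, x_b):
    """For each File A record, find the closest File B record.

    x_a, x_b: arrays of shape (n_records, n_matching_variables).
    Returns the index of the matched File B record and the
    Euclidean distance |X(A) - X(B)| for every File A record.
    """
    diffs = x_a[:, None, :] - x_b[None, :, :]   # all pairwise differences
    dist = np.linalg.norm(diffs, axis=2)         # pairwise distances
    idx = dist.argmin(axis=1)                    # closest File B record
    return idx, dist[np.arange(len(x_a)), idx]

# Simulated files (an assumption for illustration only).
rng = np.random.default_rng(0)
x_a = rng.normal(size=(1000, 1))
x_b = rng.normal(size=(1000, 1))

idx, gap = nearest_match(x_a, x_b)

# Treat the tails of the distribution as the sparse region of X-space.
sparse = np.abs(x_a[:, 0]) > 2.0
print("mean |X(A) - X(B)|, dense region :", gap[~sparse].mean())
print("mean |X(A) - X(B)|, sparse region:", gap[sparse].mean())
```

Under these assumptions the average distance in the tails should come out noticeably larger than in the center, which is the pattern described above.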

Sims (1978:175) warns: “In sparse regions we are almost bound to distort the joint distribution in synthetic file formation, unless we go beyond ‘matching’ to more elaborate methods of generating synthetic observations.” To check the effect of imperfect matching, Sims (1978) suggests the following procedure. Perform the regression Z1 = bX(B) for some variable Z1 contained in Z. Then compare the output generated from the file {X(A), Y, Z} with that generated from the file {X(A), Y, Z + b[X(A) − X(B)]}. If the inferences are similar, it is likely that matching bias has not affected the data set appreciably. However, if the two data sets produce substantially different results, some accounting for the effects of “far” matches is needed. In a related idea, Sims (1974) suggests matching only in areas where the data are dense. Elsewhere, regression models could be used, but adjusted by the difference between the regression model and the matched value for the nearest “matchable” points.
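The check described above can be sketched in a few lines. This is an illustrative reading of the procedure rather than Sims's own implementation: it assumes ordinary least squares, adds an intercept to the regression (which does not alter the adjustment, since only the slope b enters b[X(A) − X(B)]), and uses the correlation between Y and Z1 as the "output" to compare. The function name sims_adjusted_z and the simulated data are hypothetical.

```python
import numpy as np

def sims_adjusted_z(x_a, x_b_matched, z_matched, x_b_full, z_full):
    """Return Z1 + b[X(A) - X(B)] and the fitted slope vector b.

    b is estimated by regressing Z1 on X(B) over all of File B.
    The intercept is an added assumption; it drops out of the adjustment.
    """
    design = np.column_stack([np.ones(len(x_b_full)), x_b_full])
    coef, *_ = np.linalg.lstsq(design, z_full, rcond=None)
    b = coef[1:]                                  # slopes only
    return z_matched + (x_a - x_b_matched) @ b, b

# Simulated File A (X, Y) and File B (X, Z1); illustration only.
rng = np.random.default_rng(1)
n = 400
x_a = rng.normal(size=(n, 1))
y = x_a[:, 0] + rng.normal(scale=0.5, size=n)
x_b = rng.normal(size=(n, 1))
z = 2.0 * x_b[:, 0] + rng.normal(scale=0.5, size=n)

# Unconstrained nearest-neighbor match on the single matching variable.
idx = np.abs(x_a - x_b.T).argmin(axis=1)
z_matched = z[idx]
z_adj, b = sims_adjusted_z(x_a, x_b[idx], z_matched, x_b, z)

# Compare the same statistic on the matched and the adjusted file.
corr_matched = np.corrcoef(y, z_matched)[0, 1]
corr_adjusted = np.corrcoef(y, z_adj)[0, 1]
print("corr(Y, Z1), matched file :", corr_matched)
print("corr(Y, Z1), adjusted file:", corr_adjusted)
```

If the two statistics are close, the distance between X(A) and X(B) has probably had little effect on this particular inference; a large gap would point to the influence of "far" matches. Any other statistic of interest could be substituted for the correlation.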

Paass (1985) suggests that one choose a small number of X(A) variables to reduce the size of this bias, since matches are then easier to find. However, this approach will reduce the correlations between the matching variables and the singly occurring variables.

Reweighting of File B Data Resulting From Statistical Matching

A related problem concerns an additional impact of a statistical match on the
