applied without consideration of the matching that was used to create the merged file. For example, Klevmarken (1982) has shown that the parameters of a regression model of the form

Z = bX1 + cY1 + e,

where X1 denotes a subset of the matching variables and Y1 denotes a subset of the variables in the first file, are not estimable from a statistically matched file unless the number of variables in Y1 is smaller than the number of matching variables excluded from X1.
Another problem with statistical matching is the failure of the two matched records to have identical values for the matching variables, that is, the failure of X(A) to equal X(B). These two vectors will not, in general, agree exactly. This disagreement forces the analyst to rely on an additional assumption: that the relationship between Z and X is smooth. The discrepancy between X(A) and X(B) is, of course, largest where matches are hardest to find, namely in the sparse regions of X-space. Records in those regions generally find matches closer to the center of the data set, which adds a bias to the statistical match. One way to remove or reduce this bias is to use a form of parametric statistical matching, for example through regression.
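The inward pull on sparse-region matches can be illustrated with a small simulation. This is a hypothetical sketch, not taken from the text: recipients and donors are drawn from the same normal distribution, each recipient is matched to its nearest donor on a single matching variable, and the matched values in the tails are compared with the recipients' own values.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical illustration: 300 recipient records (file A) and 300 donor
# records (file B), with one matching variable drawn from the same
# standard normal distribution in both files.
x_a = rng.normal(size=300)
donors = rng.normal(size=300)

# Nearest-neighbor match: each recipient takes the donor with the
# smallest absolute distance on the matching variable.
matched = donors[np.abs(x_a[:, None] - donors[None, :]).argmin(axis=1)]

# In the sparse tails of X-space, the nearest available donor tends to
# lie closer to the center, pulling matched values inward.
tails = np.abs(x_a) > 1.5
print("mean |X(A)| in tails:          ", np.abs(x_a[tails]).mean())
print("mean |X(B)| of matches in tails:", np.abs(matched[tails]).mean())
```

Because donor density falls off toward the tails, the average matched value among tail records is typically less extreme than the recipients' own values, which is the bias described above.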
Sims (1978:175) warns: “In sparse regions we are almost bound to distort the joint distribution in synthetic file formation, unless we go beyond ‘matching’ to more elaborate methods of generating synthetic observations.” To check the effect of imperfect matching, Sims (1978) suggests the following procedure. Perform the regression Z1 = bX(B) for some variable Z1 contained in Z. Then compare the inferences generated from the file [X(A), Y, Z] with those from the file [X(A), Y, Z + b(X(A) − X(B))]. If the inferences are similar, it is likely that matching bias has not appreciably affected the data set. If the two files produce substantially different results, however, some accounting for the effects of “far” matches is needed. In a related idea, Sims (1974) suggests matching only in regions where the data are dense. Elsewhere, regression models could be used, adjusted by the difference between the regression prediction and the matched value at the nearest “matchable” points.
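Sims's diagnostic can be sketched in a few lines. The data below are synthetic and the column layout is assumed for illustration: X(A) holds the recipient's matching variables, X(B) those of the matched donor, and Z1 is a variable carried over from the donor file. The check regresses Z1 on X(B), forms the adjusted variable Z1 + b[X(A) − X(B)], and compares an inference of interest (here, a simple correlation with Y) under both versions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 2

# Hypothetical matched file: X_A are the recipient's matching variables,
# X_B those of the matched donor record (imperfect matches), and Z1 was
# carried over from the donor record.
X_A = rng.normal(size=(n, p))
X_B = X_A + rng.normal(scale=0.3, size=(n, p))
Y = X_A @ np.array([1.0, -0.5]) + rng.normal(scale=0.5, size=n)
Z1 = X_B @ np.array([0.8, 0.4]) + rng.normal(scale=0.5, size=n)

# Step 1: regress Z1 on the donor's matching variables (Sims: Z1 = bX(B)).
XB1 = np.column_stack([np.ones(n), X_B])       # add an intercept column
b = np.linalg.lstsq(XB1, Z1, rcond=None)[0]

# Step 2: form the adjusted variable Z1 + b[X(A) - X(B)].
Z1_adj = Z1 + (X_A - X_B) @ b[1:]

# Step 3: compare the inference of interest under both versions.
r_raw = np.corrcoef(Y, Z1)[0, 1]
r_adj = np.corrcoef(Y, Z1_adj)[0, 1]
print(f"corr(Y, Z1) raw: {r_raw:.3f}  adjusted: {r_adj:.3f}")
```

If the two correlations (or whatever inference the analyst cares about) are close, matching bias is probably mild; a large gap signals that the “far” matches are distorting results.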
Paass (1985) suggests choosing a small number of X(A) variables to reduce the size of this bias, since matches are then easier to find. However, this approach also reduces the correlations between the matching variables and the singly occurring variables.
A related problem concerns an additional impact of a statistical match on the