Suggested Citation: "Merge File of the Office of Tax Analysis." National Research Council. 1991. Improving Information for Social Policy Decisions -- The Uses of Microsimulation Modeling: Volume II, Technical Papers. Washington, DC: The National Academies Press. doi: 10.17226/1853.

with the results of the initial match. The dissatisfaction was primarily with estimates of numbers of recipients and aggregate amounts for several income types in the AF. The presence of several income types was given a larger role in the rematch in order to improve these estimates.
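One common way to give the presence of income types a larger role in a statistical match is to weight presence indicators heavily in the distance used to pair records. The sketch below assumes a simple unconstrained nearest-neighbor match. Radner's actual matching variables and weights are not given here, so the field names, the list of income types, and the `presence_weight` value are all hypothetical.

```python
from dataclasses import dataclass

# Income types whose presence is compared (hypothetical list for illustration).
INCOME_TYPES = ("wages", "interest", "dividends", "transfers")


@dataclass(frozen=True)
class Record:
    rec_id: str
    income: float            # total income in dollars (hypothetical field)
    income_types: frozenset  # income types reported as present on the record


def distance(a: Record, b: Record, presence_weight: float = 10.0) -> float:
    """Income gap (in units of $10,000) plus a penalty per mismatched income type."""
    income_gap = abs(a.income - b.income) / 10_000
    mismatches = sum((t in a.income_types) != (t in b.income_types) for t in INCOME_TYPES)
    return income_gap + presence_weight * mismatches


def match(recipients, donors, presence_weight: float = 10.0) -> dict:
    """Unconstrained match: pair each recipient with its nearest donor (donors may repeat)."""
    return {
        r.rec_id: min(donors, key=lambda d: distance(r, d, presence_weight)).rec_id
        for r in recipients
    }


em = [Record("em1", 25_000, frozenset({"wages", "interest"}))]
af = [
    Record("af1", 25_500, frozenset({"wages"})),
    Record("af2", 31_000, frozenset({"wages", "interest"})),
]
print(match(em, af))                       # {'em1': 'af2'}
print(match(em, af, presence_weight=0.0))  # {'em1': 'af1'}
```

With the presence penalty switched off, the record closest in income wins; with it on, the donor that also reports interest income is preferred, which is the kind of shift that should improve recipient counts by income type.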

Finally, Radner (1983:137) notes: “Because the EM sample…contains few high-income records…it was decided to add more AF records at $30,000 AGI (adjusted gross income) and above.” For the EM records, this match replaced the previous match.
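Adding records in the high-income range improves precision there without changing the population represented, provided each added record carries a correspondingly smaller weight. The source does not say how the AF weights were recomputed; this sketch assumes simple stratified sampling, and the strata and counts are hypothetical.

```python
def design_weights(population: dict, sample_size: dict) -> dict:
    """Base weight per sampled record in each stratum: population count / sample count."""
    return {s: population[s] / sample_size[s] for s in population}


# Hypothetical stratum population counts (returns), split at $30,000 AGI.
population = {"under_30k": 90_000_000, "30k_plus": 10_000_000}

before = design_weights(population, {"under_30k": 45_000, "30k_plus": 5_000})
after = design_weights(population, {"under_30k": 45_000, "30k_plus": 20_000})
print(before["30k_plus"], after["30k_plus"])  # 2000.0 500.0
```

Quadrupling the high-income sample cuts each record's weight to a quarter, so the weighted count for that stratum is unchanged while four times as many distinct records are available as match candidates.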

This statistical match was followed by several rounds of controlling to totals from various sources, using several techniques, including the addition of recipients for certain income types, such as transfer income. Other changes included further inflation of amounts and audit corrections. The usual CPS weights were not used; instead, the weights were adjusted for the census undercount and for consistency with administrative control totals.
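The text does not name the techniques used to control to totals. Raking (iterative proportional fitting) is one widely used way to adjust weights so that weighted counts match several sets of control totals at once; the sketch below shows that technique only, with hypothetical categories and totals, and is not the procedure actually applied to this file. Adding recipients and inflating amounts are different adjustments and are not shown.

```python
from collections import defaultdict


def rake(records, weights, margins, max_iter=100, tol=1e-9):
    """Raking (iterative proportional fitting) of record weights to control totals.

    records: list of dicts mapping each dimension name to the record's category
    weights: starting weights, one per record
    margins: {dimension: {category: control total}}; every category that occurs
             in the records must have a control total and a nonzero starting weight
    """
    w = list(weights)
    for _ in range(max_iter):
        max_change = 0.0
        for dim, targets in margins.items():
            # Current weighted total of each category along this dimension.
            totals = defaultdict(float)
            for rec, wt in zip(records, w):
                totals[rec[dim]] += wt
            # Rescale so this dimension matches its control totals exactly.
            for i, rec in enumerate(records):
                factor = targets[rec[dim]] / totals[rec[dim]]
                max_change = max(max_change, abs(factor - 1.0))
                w[i] *= factor
        if max_change < tol:  # every margin already matched: converged
            break
    return w


# Hypothetical records classified by age group and transfer-income receipt.
records = [
    {"age": "under_65", "transfer": "yes"},
    {"age": "under_65", "transfer": "no"},
    {"age": "65_plus", "transfer": "yes"},
    {"age": "65_plus", "transfer": "no"},
]
margins = {
    "age": {"under_65": 60.0, "65_plus": 40.0},
    "transfer": {"yes": 30.0, "no": 70.0},
}
print([round(x, 6) for x in rake(records, [1.0] * 4, margins)])  # [18.0, 42.0, 12.0, 28.0]
```

Each pass forces one set of margins to match exactly and may disturb the others, so the passes repeat until no factor differs meaningfully from 1.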

The model that used the statistical match as input was not strictly a microsimulation model, since no program changes were simulated with this data set; only income estimates for various subgroups were changed. However, the adjustments used are typical of those applied to microsimulation data sets and so are worth considering.

Merge File of the Office of Tax Analysis

The federal individual income tax form lacks certain types of information, including sources of income and types of expenditures not subject to tax under current law, links between taxpayers and families, and information on nonfilers. These limitations motivated the creation of the merge file in the Office of Tax Analysis of the Internal Revenue Service (IRS) (see Bristol [1988] and Cilke, Nelson, and Wyscarver [1988]; creation of the most recent merge file is described in detail in Chapter 8 of Volume I). The merge file represents a statistical match of about 60,000 sample households from the March CPS with about 90,000 tax returns obtained from the Statistics of Income (SOI) sample of IRS records. Prior to the match, several variables are imputed to the SOI records, such as itemized deductions for nonitemizers and the share of earnings attributable to husbands and wives. The CPS data are also modified in several ways, including the correction of certain types of income for nonreporting and underreporting.

Before the statistical match is performed, the two files are “aligned” so that they represent the same universe. This process, a complicated form of reweighting, is not very successful because the CPS file has many low-income units and the SOI file is relatively rich in high-income units. For example (Bristol, 1988:118):

In the 1981 merge, 300 “returns” in the highest income class of the CPS
