had to be matched or linked with 33,714 such returns in the SOI. At the other end of the income distribution, 4,277 low-income SOI records had to be matched with 17,647 CPS records.
The statistical match is accomplished using a transportation algorithm. The distance measure uses 10 variables, including family size, wage income, property income, and home ownership. Weights are frequently split, so the resulting file has more than 200,000 records.
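The weight splitting mentioned above can be illustrated with a small sketch. It is not the procedure used for the actual merge, which relies on a 10-variable distance measure and a general transportation solver. The sketch assumes a single hypothetical matching variable (wage income), integer weights, and equal weight totals in the two files. Under those assumptions, with absolute difference as the distance, filling the match in sorted order is known to give a minimum-cost transportation solution. The function name, record identifiers, and values are invented for illustration.

```python
def split_weight_match(file_a, file_b):
    """Match two weighted files on one variable, splitting weights as needed.

    Each file is a list of (record_id, value, weight) tuples with positive
    integer weights. Both files are assumed to have the same total weight.
    Returns a list of (id_a, id_b, matched_weight) tuples.
    """
    # Sort both files on the matching variable (hypothetical: wage income).
    a = sorted(file_a, key=lambda rec: rec[1])
    b = sorted(file_b, key=lambda rec: rec[1])

    i = j = 0
    remaining_a = a[0][2]
    remaining_b = b[0][2]
    matched = []

    while i < len(a) and j < len(b):
        # Link the current pair with as much weight as both can still supply.
        w = min(remaining_a, remaining_b)
        matched.append((a[i][0], b[j][0], w))
        remaining_a -= w
        remaining_b -= w

        # Move past any record whose weight is used up; the other record
        # keeps its leftover weight and is matched again (a "split").
        if remaining_a == 0:
            i += 1
            if i < len(a):
                remaining_a = a[i][2]
        if remaining_b == 0:
            j += 1
            if j < len(b):
                remaining_b = b[j][2]

    return matched


# Invented example: 2 CPS records and 3 SOI records, each file totaling 500.
cps = [("c1", 10_000, 300), ("c2", 40_000, 200)]
soi = [("s1", 12_000, 100), ("s2", 30_000, 250), ("s3", 45_000, 150)]

matched = split_weight_match(cps, soi)
for id_a, id_b, w in matched:
    print(id_a, id_b, w)
```

In this example the five input records produce four matched records, because the weight of `c1` is split between `s1` and `s2`, and the weight of `s2` is split between `c1` and `c2`. Every split adds a record, which is how a matched file can end up with more records than either input file.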
After the merge is completed, the CPS nonfilers are appended. Then families are reconstructed. The resulting file can be used to simulate the behavior of individual taxpayers and their households in a microsimulation model. The merge file is reconstituted on a biennial basis.
Okner (1972, 1974) describes a statistical match between the 1966 IRS tax file and the 1967 Survey of Economic Opportunity (SEO), called the 1966 Merge File, in order to develop a “consistent and comprehensive set of household income data.”
The SEO population was chosen as file A. It gave a stratified representation of the total U.S. population on a family basis. The income information included data on both taxable and nontaxable sources of income. The SEO also contained rich demographic data, but it was inadequate by itself because the income data were understated for the wealthier families.
To remedy these problems a statistical match was performed. First, some pretreatment was necessary. SEO households and individuals who would not have filed an income tax return were excluded from the match. A number of other pretreatments did not interact with the statistical match: these included algorithms to allocate rent, interest, and dividends to the members of a household and allocation of pension income.
The IRS and SEO data were then statistically matched. First, tax units were grouped into “equivalence classes” defined by marital status, whether over age 65, number of dependent exemptions, and the reported pattern of income. Unmodified, these groupings would have resulted in over 1,000 different equivalence classes. Instead, the number of equivalence classes was reduced, usually through combination of classes using marital status and an indicator variable for over age 65. The final number of equivalence classes was 74, containing 28,643 tax units.
Then, for two records in the same equivalence class, a consistency score (distance function) was computed using factors such as home mortgage interest deduction, interest or dividend income, and farm income. Certain restrictions were used to limit the inconsistency of two potentially matched records. Within the acceptable consistency range, records were matched randomly but proportionally to the sampling weight of the return in the tax file. On a few occasions, no records satisfied the consistency restrictions, and the restrictions were then slightly broadened. This procedure