The Survey of Earned Doctorates (SED) is conducted annually by the National Research Council and is a census of the research doctorates awarded at US universities during the academic year, which runs from July 1 of one year to June 30 of the following year. The self-report response rate from PhD recipients is about 95%, and information on the remaining 5% of doctorates is obtained from commencement programs and institutional sources. The survey covers all fields that award research and applied-research doctorates, except professional degrees such as the MD, DDS, OD, DVM, and JD. It gathers data on a field-specific basis and includes information on ethnic background, sex, postsecondary education, time to PhD degree from the baccalaureate degree, financial support during graduate studies, and postdoctoral plans. The data from the survey become part of the Doctorate Records File (DRF), a virtually complete database on doctorate recipients from 1920 to the present. The data in this file can be tabulated in different ways to obtain the characteristics of graduates, by nearly 20 broad fields or several hundred fine fields, with regard to their institution, their graduate program, and their plans. The data in the DRF are kept on an individual basis and are linked to other files, such as the file for the Survey of Doctorate Recipients (see below) and the National Institutes of Health grants files.
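Because the SED counts doctorates by academic year (July 1 through June 30) rather than by calendar year, a record's award date has to be mapped to the correct survey year before tabulation. The sketch below shows that mapping; the convention of labeling an academic year by the calendar year in which it ends is an assumption made here for illustration, and `sed_academic_year` is a hypothetical helper name.

```python
from datetime import date


def sed_academic_year(award_date: date) -> int:
    """Return the academic year into which a doctorate award date falls.

    An academic year runs from July 1 through June 30 of the following
    calendar year. ASSUMPTION: the year is labeled by the calendar year in
    which it ends (July 1, 1995 through June 30, 1996 is labeled 1996).
    """
    # Awards from July onward belong to the academic year ending next June 30.
    if award_date.month >= 7:
        return award_date.year + 1
    # Awards from January through June belong to the year ending this June 30.
    return award_date.year


print(sed_academic_year(date(1995, 9, 15)))  # 1996
print(sed_academic_year(date(1996, 6, 30)))  # 1996
print(sed_academic_year(date(1996, 7, 1)))   # 1997
```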
In the life-science fields included in this report, 7,696 doctorates were added to the DRF in 1996. The field specialties in the life sciences include the agricultural and biomedical sciences and a portion of the health sciences as broad fields, and these are divided into 67 fine-field specialties.
The information in the DRF is complete and reliable for most data points. Two items are less dependable. For sources of support during graduate school, students are not always aware of their sources or of the type of support they received. For postgraduate plans, the questionnaire might be completed before a definite commitment has been made, or it might reflect only a hope of obtaining a particular type of postdoctoral position.
The Survey of Doctorate Recipients (SDR) is a biennial longitudinal survey, dating to 1973, of research doctorate-holders working in the United States. The sample for each survey period is adjusted by the addition of persons from the most recent 2-year cohort in the DRF and the dropping of persons who have retired or have reached the age limit of the survey. Before 1991, the population of the survey included a broader range of people, such as holders of US-earned doctorates in humanities, education, and professional fields who were working in science and engineering (S&E), holders of foreign-earned doctorates who were working in S&E in the United States, and a 42-year period of PhD cohorts. The SDR was restructured in 1991 to include only persons under the age of 76 years who hold doctorates in S&E from US universities, and the sample was reduced by 55% to provide resources to increase the response rate.
The survey questionnaire is sent in the spring to each person in the sample, which numbered 49,829 in 1995. Sample members are asked a series of demographic and employment questions. The response rate for the 1995 survey was about 85% after second-wave mailings and telephone interviews, about a 30% increase over the 1989 response rate. Although the smaller sample reduced the overall number of responses between 1989 and 1995, the higher response rate is believed to improve the quality of the data. However, the change in the survey created a potential discontinuity between data collected before 1991 and data collected since.
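As a rough check on what these figures imply, the number of 1995 respondents can be estimated by multiplying the sample size by the response rate. Because the text gives the rate only as "about 85%", the result is an approximation, not a reported count.

```python
SAMPLE_SIZE_1995 = 49_829     # SDR sample size in 1995 (from the text)
RESPONSE_RATE_1995 = 0.85     # "about 85%", so the estimate is approximate


def estimated_respondents(sample_size: int, response_rate: float) -> int:
    """Approximate number of completed questionnaires."""
    return round(sample_size * response_rate)


print(estimated_respondents(SAMPLE_SIZE_1995, RESPONSE_RATE_1995))  # 42355
```

In other words, roughly 42,000 people answered the 1995 survey.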
The sample is stratified across three variables: field of degree, sex, and a combination variable that includes degree field, sex, handicap status, ethnic group, and nationality of birth. The survey results are statistically analyzed to translate the responses into weighted numbers for the entire population. From the weighted results, the doctoral workforce in S&E can be analyzed along different dimensions by examining different demographic and employment characteristics and by selecting different cohorts. This allows both longitudinal and time-series analyses. However, such analyses must take into account the change in sampling frame, the increased response rate beginning in 1991, and the fact that some cells in an analysis could contain very few actual responses, because the sample covers only about 8% of the S&E doctoral workforce.
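The idea of turning stratified sample responses into weighted population numbers can be illustrated with a simple per-stratum weight: the population size of a stratum divided by the number of respondents in it. This is a minimal sketch of basic stratum weighting, not the SDR's actual procedure, which is not described here and is likely more elaborate. The strata "A" and "B", their population sizes, and the `postdoc` field are all hypothetical.

```python
from collections import Counter


def stratum_weights(population: dict[str, int], respondents: list[dict]) -> dict[str, float]:
    """Weight per stratum = population size / number of respondents in that stratum."""
    counts = Counter(r["stratum"] for r in respondents)
    return {s: population[s] / counts[s] for s in counts}


def weighted_count(respondents: list[dict], weights: dict[str, float], predicate) -> float:
    """Estimated population count of people for whom predicate(record) is true."""
    return sum(weights[r["stratum"]] for r in respondents if predicate(r))


# Hypothetical strata: 100 people in A (4 respond), 300 in B (10 respond).
population = {"A": 100, "B": 300}
respondents = (
    [{"stratum": "A", "postdoc": i < 1} for i in range(4)]     # 1 postdoc in A
    + [{"stratum": "B", "postdoc": i < 3} for i in range(10)]  # 3 postdocs in B
)

weights = stratum_weights(population, respondents)
print(weights)                                                   # {'A': 25.0, 'B': 30.0}
print(weighted_count(respondents, weights, lambda r: r["postdoc"]))  # 115.0
```

Summing the weights over all respondents recovers the full population (400), which is a useful sanity check. The example also shows why small cells matter: the estimate of 115 postdocs rests on only four actual responses.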
Data available from the SDR up to 1991 include field of doctorate and of employment, sector of employment, geographic location, primary work activity, federal support, tenure status, salary, and ethnicity. However, the 1991 SDR was administered in the fall rather than the spring, so some data points are not directly comparable with those from other survey years. The 1993 questionnaire incorporated substantial changes from earlier ones. In particular, questionnaires before 1993 asked for data only as of a specific point in time, whereas the 1993 questionnaire also asked for some retrospective employment information. The field-of-employment questions also changed, using much broader job categories, such as "biological scientist", in place of narrower ones, such as "ecologist", used in earlier surveys. As a result, the number of people in postdoctoral positions might have been slightly overestimated. In 1995, questions requesting detailed retrospective descriptions of time spent in postdoctoral training were added.
The SDR samples about 8% of PhD recipients, so the number of responses in some categories can be low. A weighting formula is used to scale the sample up to the complete population. For example, the weighted estimate of 39 unemployed life scientists from the 26 high-quality institutions in 1995 corresponds to five actual responses, and the weighted estimate of 20 people working outside S&E in the same population is based on three responses. In the experience of the staff of the National Research Council's Office of Scientific and Engineering Personnel, who have worked with these data for many years, 10 or more responses provide a good estimate for a category. Although the sample is small and the analyses must be used with care, the sampling and weighting methods have been carefully developed to provide the most statistically valid results possible.
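The two examples in the paragraph above imply an average weight per respondent of about 7.8 (39 / 5) and about 6.7 (20 / 3), and both cells fall below the 10-response rule of thumb. The sketch below makes that arithmetic explicit and flags thin cells; the helper names are hypothetical, and the threshold of 10 is the rule of thumb quoted in the text, not a formal statistical criterion.

```python
MIN_RESPONSES = 10  # rule of thumb quoted in the text


def implied_weight(weighted_estimate: float, responses: int) -> float:
    """Average number of people each actual response stands for."""
    return weighted_estimate / responses


def is_reliable(responses: int, threshold: int = MIN_RESPONSES) -> bool:
    """True if a cell rests on enough actual responses to be a good estimate."""
    return responses >= threshold


# (weighted estimate, actual responses) for the 1995 examples in the text
cells = {
    "unemployed": (39, 5),
    "working outside S&E": (20, 3),
}
for name, (weighted, n) in cells.items():
    print(f"{name}: weight ~{implied_weight(weighted, n):.1f}, reliable={is_reliable(n)}")
# unemployed: weight ~7.8, reliable=False
# working outside S&E: weight ~6.7, reliable=False
```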
The National Science Foundation (NSF) conducts various surveys and data-collecting procedures as part of its responsibility in monitoring the state of science and engineering development in the United States. The survey that pertains most closely to graduate and postdoctoral training is the annual Survey of Graduate Students and Postdoctorates in Science and Engineering.
This survey is designed to provide a comprehensive picture of training of future scientists and engineers in US graduate schools and is used to assess future supply and demand. Graduate students counted in the survey are enrolled for credit in science and engineering master's-degree and PhD programs in the fall term of the survey year, and MD, DO, DVM, and DDS candidates are reported only if they will also receive a PhD. The survey also includes information on postdoctoral appointees and other nonfaculty researchers in academic departments and programs.
The survey is distributed to departments through an institutional coordinator, and information is provided on the students who are associated with each department. Nearly 10,400 graduate departments at 730 institutions are surveyed. Students in interdisciplinary or interinstitutional programs are reported only by their primary department. Consequently, information about an individual program may be spread across several departments, and data are aggregated for departments that have multiple degree programs.
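The reporting rule described above, in which each student is counted once under a primary department and programs within a department are pooled, can be sketched as follows. The record layout (`primary_dept`, `other_depts`, `program`) and the sample records are hypothetical.

```python
from collections import Counter


def enrollment_by_department(students: list[dict]) -> Counter:
    """Count each student once, under the primary department only.

    Secondary affiliations ("other_depts") are ignored, and students in
    different degree programs of the same department are pooled.
    """
    return Counter(s["primary_dept"] for s in students)


# Hypothetical records
students = [
    {"id": 1, "primary_dept": "Biochemistry", "other_depts": ["Chemistry"], "program": "PhD"},
    {"id": 2, "primary_dept": "Biology", "other_depts": [], "program": "MS"},
    {"id": 3, "primary_dept": "Biology", "other_depts": [], "program": "PhD"},
]

counts = enrollment_by_department(students)
print(counts["Biology"], counts["Biochemistry"], counts["Chemistry"])  # 2 1 0
```

The interdisciplinary student does not appear under Chemistry, and the Biology master's and PhD students cannot be told apart in the departmental total, which is the loss of program-level detail the text describes.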
The following types of information are requested:
The NSF requests that the survey form be returned by January 31 with data on the previous fall's enrollments. The results are published in a series of reports, many of which are available on the Internet, covering different aspects of graduate education by institution and by field within institution. Data tapes provide more detailed information on individual departments.
Data in table E.3 and figures 2.3 and 2.6 are taken from this NSF survey and are not directly comparable with the SED and SDR data used throughout the rest of the report. The NSF survey counts only persons at academic institutions, whereas the SDR counts PhDs in all work environments. Furthermore, NSF field definitions differ somewhat from those used in this report (Appendix D). These differences are not important for questions about graduate students, because students are at the academic institutions where NSF conducts its survey. However, counts of postdoctoral fellows can differ substantially between the NSF survey and the SDR. We have used the NSF count of postdoctoral fellows at academic institutions as a starting point because NSF counts both US citizens and foreign nationals, whereas the SDR excludes foreign nationals who did not receive their PhD in this country. We then estimated the number of postdoctoral fellows who might be in government, industry, and other nonacademic laboratories to obtain an estimate of the overall number of postdoctoral fellows in the United States.
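The text does not specify how the nonacademic number was estimated. One simple possibility is a ratio estimator: scale the NSF academic count by the ratio of nonacademic to academic postdocs observed in the SDR. The sketch below shows that approach only as an illustration; it is an assumption, not necessarily the method used in this report, and all input values are placeholders rather than figures from the report.

```python
def estimate_total_postdocs(nsf_academic: int,
                            sdr_academic: float,
                            sdr_nonacademic: float) -> float:
    """Hypothetical ratio estimator for the total number of postdocs.

    ASSUMPTION: the nonacademic-to-academic ratio seen in the SDR (US-PhD
    holders) also applies to the NSF academic count, which includes
    foreign-educated postdocs.
    """
    nonacademic = nsf_academic * (sdr_nonacademic / sdr_academic)
    return nsf_academic + nonacademic


# Placeholder inputs, not data from this report.
print(estimate_total_postdocs(nsf_academic=1000, sdr_academic=800, sdr_nonacademic=200))  # 1250.0
```

The key design assumption is that foreign-educated postdocs, who appear in the NSF count but not in the SDR, are distributed across sectors in the same proportion as US-educated ones. If they are concentrated in academe, this approach would overstate the nonacademic total.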
The quality of the survey data depends on the knowledge of the persons at the department level who complete the survey.
The Association of American Medical Colleges (AAMC) maintains several databases that contain information on US medical personnel. One particularly relevant system is AAMC's Medical Faculty Roster.
The Medical Faculty Roster is a comprehensive data directory of medical-school faculty, including education and employment history, nature of current activities, degrees, rank, and ethnicity. The data for this system are collected continuously from medical schools, as changes occur, through questionnaires that are completed by the faculty members. The accuracy of the data is considered to be very high, as was demonstrated by pilot samples for different studies conducted by AAMC. Data from this system can be linked to other data sources through Social Security numbers.
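Linking the Faculty Roster to other sources, like linking the DRF to the SDR and NIH grants files, amounts to joining records on a shared personal identifier. The sketch below shows an inner join on such a key. The field name `person_id` is a hypothetical stand-in for the confidential identifier (such as a Social Security number), and the records are invented for illustration.

```python
def link_records(roster: list[dict], drf: list[dict], key: str = "person_id") -> list[dict]:
    """Inner-join two record lists on a shared identifier.

    Roster records with no matching DRF record are dropped; for matches,
    the fields of both records are merged into one dictionary.
    """
    drf_by_key = {r[key]: r for r in drf}  # index DRF records for O(1) lookup
    return [{**fac, **drf_by_key[fac[key]]} for fac in roster if fac[key] in drf_by_key]


# Hypothetical records with made-up identifiers
roster = [
    {"person_id": "A1", "rank": "Assistant Professor"},
    {"person_id": "B2", "rank": "Professor"},
]
drf = [{"person_id": "A1", "phd_year": 1990, "phd_field": "Biochemistry"}]

linked = link_records(roster, drf)
print(linked)
# [{'person_id': 'A1', 'rank': 'Assistant Professor', 'phd_year': 1990, 'phd_field': 'Biochemistry'}]
```

An inner join keeps only people found in both files; a faculty member without a US research doctorate (for example, an MD-only faculty member) would not appear in the linked result.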