A STATISTICAL AGENCY SHOULD STRIVE CONTINUALLY for the widest possible dissemination of the data it compiles, consistent with its obligations to protect confidentiality. Data should be disseminated in formats that are accessible and accompanied by documentation that is clear and complete. Dissemination should be timely, and information should be made readily available on an equal basis to all users. Agencies should have data curation policies and procedures in place so that data are preserved, fully documented, and accessible for use in future years.58
Planning for dissemination should be undertaken from the viewpoint that the public has contributed the data elements and paid for the data collection and processing. In return, the information should be accessible in ways that make it as useful as possible to the largest number of users—for decision making, program evaluation, scientific research, and public understanding.
An effective dissemination program comprises a wide range of elements:
__________________
58 Data curation involves the management of data from collection and initial storage to archiving (or deletion should the data be deemed of no further use—e.g., a data file that represents an initial stage of processing). The purpose of data curation is to ensure that information can be reliably retrieved and understood by future users.
59 Statistical Policy Directive No. 3 (U.S. Office of Management and Budget, 1985) prescribes a yearly calendar of firm, fixed release dates for key economic indicators; Statistical Policy Directive No. 4 (U.S. Office of Management and Budget, 2008) lays out best practices for dissemination of other federal statistics (see Appendix A).
60 For example, the Bureau of Labor Statistics holds an Annual CE [Consumer Expenditure Surveys] Microdata Users’ Workshop: see https://www.bls.gov/cex/csxannualworkshop.htm [April 2017].
61 See, for example, http://www.data-archive.ac.uk/ [April 2017].
Data release of aggregate statistics may take the form of regularly updated time series, cross-tabulations of aggregated characteristics of respondents, analytical reports, interactive maps and charts, and brief reports of key findings. Such products should be readily accessible through an agency’s website, which should also make available more detailed tabulations in formats that are downloadable from the website. Agencies should take care in designing their websites to make it as easy as possible for users to locate and access information, testing accessibility and usability with a range of users.
A relatively new way for agencies to expand public use of their aggregate statistics is by providing selected data through application programming interfaces (APIs) to developers who, in turn, build custom applications for the Internet, smartphones, and similar media. For example, the Census Bureau’s APIs include neighborhood population characteristics and county-level information on business activity.62
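As an illustration of how a developer might consume such an API, the sketch below builds a query against the Census Bureau's Data API and converts its JSON response (an array of rows whose first row holds the column names) into records. It assumes the `acs/acs5` dataset path and the variable code `B01003_001E` (total population); both are illustrative and should be checked against the API's published dataset and variable lists. The helper function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of the Census Bureau's public data API.
BASE_URL = "https://api.census.gov/data"


def build_query_url(year, dataset, variables, geography, within=None):
    """Build a Data API query URL from the get/for/in parameters."""
    params = {"get": ",".join(variables), "for": geography}
    if within is not None:
        params["in"] = within
    # Leave ':', '*' and ',' unescaped so the URL reads like for=county:*
    return f"{BASE_URL}/{year}/{dataset}?" + urlencode(params, safe=":*,")


def rows_to_records(payload):
    """Turn the API's array of arrays (first row = header) into dicts."""
    header, *rows = payload
    return [dict(zip(header, row)) for row in rows]


def fetch(url):
    """Download and decode one API response (requires network access)."""
    with urlopen(url) as response:
        return rows_to_records(json.load(response))


if __name__ == "__main__":
    url = build_query_url(
        year=2019,
        dataset="acs/acs5",                  # assumed: ACS 5-year tables
        variables=["NAME", "B01003_001E"],   # assumed: total population
        geography="county:*",
        within="state:06",
    )
    print(url)
```

Running the script only prints the query URL; calling `fetch(url)` would request one row per county in state 06 (California) and so needs network access.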
Yet another form of dissemination involves access to individual-level microdata files, which make it possible to conduct in-depth research in ways that are not possible with aggregate data. Public-use microdata sample (PUMS) files can be developed for general release. Such files contain data for samples of individual respondents that have been processed to protect confidentiality by deleting, aggregating, or modifying any information that might permit individual identification.63
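The three kinds of processing named above (deleting, aggregating, and modifying) can be sketched for a single record as follows. The field names and thresholds are hypothetical; actual PUMS files follow each agency's own disclosure review rules. Here the name is dropped, county is replaced by a broader region, exact age is collapsed into five-year bands with an open top band, and income is top-coded.

```python
from dataclasses import dataclass

# Hypothetical disclosure-limitation thresholds, for illustration only.
AGE_BAND_WIDTH = 5
AGE_TOP_CODE = 85          # ages at or above this fall into one open band
INCOME_TOP_CODE = 250_000  # incomes above this are reported at the cap


@dataclass(frozen=True)
class ConfidentialRecord:
    name: str    # direct identifier
    county: str  # detailed geography
    region: str  # broader geography retained in the public file
    age: int
    income: int


@dataclass(frozen=True)
class PublicRecord:
    region: str
    age_band: str
    income: int


def age_band(age: int) -> str:
    """Aggregate exact age into a five-year band with an open top band."""
    if age >= AGE_TOP_CODE:
        return f"{AGE_TOP_CODE}+"
    low = (age // AGE_BAND_WIDTH) * AGE_BAND_WIDTH
    return f"{low}-{low + AGE_BAND_WIDTH - 1}"


def to_public(record: ConfidentialRecord) -> PublicRecord:
    """Delete identifiers, aggregate age and geography, top-code income."""
    # name and county are deleted simply by not copying them.
    return PublicRecord(
        region=record.region,
        age_band=age_band(record.age),
        income=min(record.income, INCOME_TOP_CODE),
    )


if __name__ == "__main__":
    # A made-up record, not real data.
    rec = ConfidentialRecord("Jane Doe", "County A", "West", 37, 400_000)
    print(to_public(rec))
```

Top-coding here simply caps the value; some public-use files instead replace values above the threshold with a summary such as their mean, which better preserves totals.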
While honoring their obligation to be proactive in seeking ways to provide data to users, statistical agencies must be vigilant in their efforts to protect against disclosure of data obtained under a pledge of confidentiality (see Practices 7 and 8). The stunning improvements over the past three decades in computing speed, power, and storage capacity, the growing availability of information from a wide range of public and private sources on the Internet, and the increasing richness of statistical agency data collections have increased the risk that individually identifiable information can be obtained through reidentification of data thought to have been suitably protected (see Doyle et al., 2001; National Academies of Sciences, Engineering, and Medicine, 2017b:Ch. 5; National Research Council, 2003b, 2005b:Ch. 5). In response, statistical agencies may have to scale back the detail that is provided in PUMS files or other public data products.
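One common way to gauge the reidentification risk described above is a k-anonymity check: count how many records share each combination of quasi-identifiers (fields such as age band, sex, and area that could also appear in outside data). Records in classes smaller than k are the easiest to single out by linking to other sources. This is a minimal diagnostic sketch with hypothetical field names, not a full disclosure risk assessment.

```python
from collections import Counter


def class_key(record, quasi_identifiers):
    """Tuple of the record's quasi-identifier values (its equivalence class)."""
    return tuple(record[q] for q in quasi_identifiers)


def equivalence_class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifiers."""
    return Counter(class_key(r, quasi_identifiers) for r in records)


def records_below_k(records, quasi_identifiers, k):
    """Return records whose quasi-identifier combination occurs fewer than k times.

    The file is k-anonymous on these fields when this list is empty.
    """
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return [r for r in records if sizes[class_key(r, quasi_identifiers)] < k]


if __name__ == "__main__":
    # Made-up records; field names are hypothetical quasi-identifiers.
    sample = [
        {"age_band": "30-34", "sex": "F", "area": "West"},
        {"age_band": "30-34", "sex": "F", "area": "West"},
        {"age_band": "30-34", "sex": "F", "area": "West"},
        {"age_band": "85+", "sex": "M", "area": "South"},
    ]
    qi = ["age_band", "sex", "area"]
    print(records_below_k(sample, qi, k=3))  # the one unique record
```

An agency facing growing linkage risk could respond by coarsening the flagged fields further, which is one concrete form of the scaling back of detail the text anticipates.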
__________________
62 See http://www.census.gov/data/developers/about.html [April 2017]; see also National Research Council (2012).
63 For a review of methods for confidentiality protection of PUMS files, see Federal Committee on Statistical Methodology (2005).
As an alternative to public access, statistical agencies have pioneered several methods of restricted access. One method is to provide or arrange for a facility on the Internet to allow researchers to analyze restricted microdata to suit their purposes, with safeguards so that the researcher cannot see the actual records and cannot obtain any output, such as too-detailed tabulations, that could identify individual respondents.64 A second method, pioneered by the National Center for Education Statistics (NCES), is to grant licenses to individual researchers to analyze restricted microdata at their own sites: such licenses require that the researchers agree to follow strict procedures for protecting confidentiality and accept liability for penalties if confidentiality is breached.65 A third method is to allow researchers to analyze restricted microdata at a secure site, such as one of the Federal Statistical Research Data Centers (FSRDCs) currently located at two dozen universities and research organizations around the country. The FSRDC network began as a Census Bureau initiative and now includes data from other agencies.66 Statistical agencies should continually seek to enlarge their suite of restricted access methods and, for each, to reduce as much as possible the cost, time, and burden of access for users.
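A remote-access facility of the first kind must screen output before returning it. A minimal sketch of one such screen, a minimum cell size (threshold) rule applied to a cross-tabulation, follows; the threshold of 10 and the field names are hypothetical. It performs only primary suppression: a suppressed cell can sometimes be recovered from row or column totals, so production systems typically add complementary suppression or other checks.

```python
from collections import Counter

MIN_CELL_SIZE = 10  # hypothetical threshold; each agency sets its own rules


def screened_crosstab(records, row_var, col_var, min_cell=MIN_CELL_SIZE):
    """Cross-tabulate counts, suppressing (None) any cell below min_cell.

    Primary suppression only: a released table with margins may still let
    a user back out a suppressed cell by subtraction.
    """
    counts = Counter((r[row_var], r[col_var]) for r in records)
    return {cell: (n if n >= min_cell else None) for cell, n in counts.items()}


if __name__ == "__main__":
    # Made-up microdata: 12 records in one cell, 3 in another.
    data = [{"region": "West", "status": "employed"}] * 12 + [
        {"region": "West", "status": "unemployed"}
    ] * 3
    print(screened_crosstab(data, "region", "status"))
```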
__________________
64 The Data Enclave of NORC at the University of Chicago is such a facility: see http://www.norc.org/Research/Capabilities/Pages/data-enclave.aspx [April 2017]. It provides secure access by researchers to selected microdata sets of the Economic Research Service, the National Center for Science and Engineering Statistics, and several other federal agencies and private foundations. NCES provides similar functionality for access to its data sets: see, e.g., https://nces.ed.gov/datalab/ [April 2017].
65 For NCES’s licensing procedures and terms, see https://nces.ed.gov/statprog/instruct.asp [April 2017].
66 See https://www.census.gov/fsrdc [April 2017].