Minutes of the December 3, 2010 COPAFS Meeting

COPAFS Chair Judie Mopsik opened the meeting, which began with Ed Spar’s Executive Director’s report.

Ed Spar. Executive Director’s Report

Spar said there was reason to worry about the 2011 budget, as a continuing resolution (CR) needed to be signed by that night. The suspense was not about whether a CR would be passed, but about the form it would take and the amount of time it would cover. For example, a CR extending to March would not give the Census Bureau time to hire the additional staff needed for the increased American Community Survey (ACS) sample, and deferring final budgets to the next Congress would add more uncertainty. A lengthy CR also would affect vital statistics and other data collection programs at NCHS, and would threaten upgrades to the Consumer Expenditure Survey. COPAFS is a well-connected group, and before Spar concluded his remarks, an attendee relayed word that the CR had passed. It extends only through December 18, so there is hope for an omnibus bill that could enable agencies to proceed with enhancements planned for FY 2011.

Spar noted that space is still available for the December 14-15 OMB policy seminar, and he reported that the dates for the 2011 COPAFS meetings are March 4, June 3, September 9, and December 2.

COPAFS Chair Judie Mopsik announced the slate for the 2011 COPAFS Board as follows.

Felice Levine     Chair
Maurine Haver     Vice Chair
Seth Grimes       Treasurer
Linda Jacobsen    Secretary
Bob Parker        At large
Ralph Rector      At large
Ken Hodges        At large
Chet Bowie        At large

The slate was approved by voice vote.

Plans for the Research and Testing Phase of the 2020 Census
Frank Vitrano, U.S. Census Bureau

Vitrano started with a 2010 census update: data capture is complete, local census offices have been closed, and final editing and imputation are underway. The Demographic Analysis estimates will be released December 6, followed by the first ACS 5-year estimates on December 14. The Census Bureau will stress that these are not 2010 census data, to minimize confusion with the first 2010 census counts – the state population totals due by December 31.

The Census Bureau is establishing a 2020 Directorate that will start small, but grow as the focus shifts to the 2020 census. The 2020 Directorate will include the ACS, and will seek integration with the work of a new Research and Methodology Directorate. The Research and Methodology Directorate will contribute generalizable knowledge and interdisciplinary research to the census and ACS. It is being structured with five centers: economic studies, statistical research, methods research, disclosure avoidance research, and administrative records and applications.

The guiding principles for 2020 are as follows: a complete and accurate census count, embraced and valued results, and a census that is efficient and well managed. An overarching goal is to hold per household costs to no more than those of the 2010 census, while maintaining the quality of the count. Plans also call for using the ACS as a test bed for 2020 census programs – an approach that will enable a shift to an emphasis on many small tests in contrast to the few large tests conducted in advance of the 2010 census.

Vitrano showed a chart illustrating the rapid increase in per household census costs since 1970, and the projected increase for 2020. The increase is considered unsustainable, and Vitrano reviewed the major drivers of costs and the proposed responses. For example, the Census Bureau plans to expand response options to offset costs driven by increasing population diversity and decreased willingness to cooperate with the census. To offset the cost of address canvassing, the Bureau is considering targeted canvassing (or even no canvassing) along with a Geographic Support System initiative to enhance the continuous updating of the address list. In response to the failure to link acquisitions, schedule, and budget, 2020 plans call for a rolling approach and strong management strategies to reduce program risk. The demand for absolute accuracy is a powerful driver of census costs, and in response, the Census Bureau hopes to build a consensus among stakeholders regarding tradeoffs between data accuracy and costs.

Vitrano described a variety of 2020 design options defined along three major dimensions. First, address list options range from full address canvassing to targeted canvassing or even no canvassing. Second, enumeration options range from traditional plus Internet response to administrative data only, and third, IT/operational options range from centralized (as in 2010) to decentralized or a hybrid of the two. A chart summarizes the design alternatives defined by the mix of these options. For example, a possible design could call for targeted canvassing, multi-mode self-enumeration with some use of administrative records, and fewer local census offices.

Among the many research questions facing the 2020 planning effort are those related to the mix of modes and strategies, and the identification of frames linked to physical addresses. Also to be determined are the technologies that will be feasible for self-enumeration and field work in 2020, and how to improve the LUCA program for compatibility with continuous address list updating and targeted canvassing. Policy and communications challenges include the need to talk about the cost versus quality tradeoff, meeting stakeholder expectations, public concerns about confidentiality, public confidence in the census, and possible legislative changes.

Vitrano concluded by noting that the purpose of the research and testing phase is to determine how much change is possible for 2020, and by stressing that investments must be made early in the cycle to reduce costs and risk. He also noted that formal work on the 2020 census begins with the FY 2012 budget.

An extended discussion followed. In response to a question, Vitrano explained that the Bureau is considering the use of private sector address information, but he noted that care must be taken with Title 13 protections. In response to concerns that the plan makes no reference to lessons learned from 2010 evaluations, Vitrano noted that the 2010 evaluations are still underway, and assured attendees that they will provide valuable guidance. Making these plans in advance of the 2010 evaluations reflects the importance of an early start. Concern also was expressed about the impact on the ACS if it is to be used as a test bed for the census. Vitrano assured attendees that the Bureau will not let the census impair the ACS, and said it hopes to identify research that benefits both the ACS and the census.

The Challenge of the Ultimate Identifier: Confidentiality and Access to DNA Test Results
Jennifer Madans, National Center for Health Statistics

The basic question, as Madans described it, is whether one can maintain confidentiality and have useful DNA data at the same time. The question arises because NCHS collects genetic information in its health surveys, and is required by statute to protect confidentiality, while at the same time making data available to users.

The National Health and Nutrition Examination Survey (NHANES) gathers data from respondents who come to mobile examination centers for standard tests, including the collection of fluid specimens from which DNA can be determined. Genetic testing is not part of the NHANES protocol, and the survey is not designed to provide data on genetic risk factors. But NCHS started banking samples, the thinking being that the specimens might be valuable for later research.

The Public Health Service Act precludes NCHS from releasing data and identities unless there is explicit consent from the subject, so permission is acquired from all subjects, and all data are strictly confidential. Initially, subjects were asked to consent to DNA collection for unspecified future lab studies (not necessarily genetic research), but now, consent to genetic studies is specified. Subjects can opt out completely, consent to all studies, or consent only to non-genetic studies.

DNA data pose a unique problem with respect to confidentiality because, as Madans described it, DNA is the “ultimate identifier.” One could match DNA from NCHS surveys to DNA in other databases, and thereby reveal a great deal of information on individuals. Madans noted further that “DNA is forever,” identifying the individual for all time, and the consequences of inadvertent disclosure can affect family members as well as the individual identified. In addition, the field of genetics is changing so rapidly that confidentiality protections that seem safe today might not be safe in the near future.

Amid the concerns for confidentiality, NCHS remains committed to making data available to users. An initial approach was to make DNA data available only in anonymized formats, but this severely limited usefulness. An alternative approach is to make the data available in the controlled environment of a Research Data Center. Researchers propose studies, which are reviewed by an Institutional Review Board; upon approval, the researchers are given access to the DNA specimens. Requests can also involve secondary analyses of DNA results from previous studies. Research results also are reviewed to guard against disclosure.

Asked if the NCHS confidentiality requirement has a time limit (like the 72-year limit on census responses), Madans said there is no time limit on NCHS confidentiality. Asked if there are confidentiality protections on private databases with genetic information, Madans said private sources tend not to guarantee confidentiality. The discussion even touched on the scientific and ethical issues related to not making genetic information available. For example, in the absence of data and research findings, people might not get a test that could indicate a high risk of a disease for which preventive measures could be taken.

Integrating Federal Statistics With DataFerrett
Rebecca Blash, U.S. Census Bureau
William Hazard, U.S. Census Bureau

Blash listed the three major components of their work as DataWeb, DataFerrett, and HotReports.

DataWeb is a data library that draws multiple sources into a virtual data warehouse that integrates data in real time. DataFerrett is a system for “power users” who want to play with the data, and cannot get what they need from pre-defined tables. Work is underway to enable users to calculate variances on the fly for microdata tabulations, and to execute regression and cluster analyses in DataFerrett. DataFerrett also enables data visualization with mapping and graphing.

DataFerrett supports many types of data (micro, macro, time series, longitudinal), and provides access to a wealth of metadata that enable users to better understand the data they are working with. Most of the surveys available through DataFerrett are Census Bureau surveys, but there are others, such as some from NCHS. Blash noted that the survey providers must approve the inclusion of a survey on DataFerrett.

Asked about DataFerrett’s major objective, Blash said it is to provide access to data that reside elsewhere. Rather than bringing a mass of data to one location, the idea is to create a virtual data warehouse that facilitates access to a range of data on various topics from various surveys. In response to a question about data inconsistencies (such as trying to crosstabulate data with different universes), Blash explained that DataFerrett does not alert users to such inconsistencies. Users need to be alert, and that is an example of why metadata are so important to the system.

Hazard presented next, showing examples of HotReports produced in DataFerrett. Users submit requests for tabulations (specifying variables and geography), and HotReports presents the table – drawing from the basic data rather than pre-defined tabulations. He stressed, however, that the reports draw from public use data – data one could get from CPS or PUMS files – as opposed to data that are confidentiality protected. Hazard also noted that data from different sources are presented side by side, and that one cannot combine or crosstabulate data from different sources. HotReports also has an interactive mapping function that allows users to work with maps and their underlying data tables.

Hazard then logged on to DataFerrett, and showed some of its functionality using 1-year ACS data. The first step is to select a dataset, and then select the desired variables – screening for specific thresholds or values, and defining categories (such as 10-year age ranges). One then defines the geography needed, hits “go get data,” and DataFerrett presents the table. Asked about weights, Hazard explained that default weights are provided for most files, but for some, users need to assign weights manually. He also confirmed that the tables and/or data from a session can be saved for later use.
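
The steps in the demonstration (screen records, define categories such as 10-year age ranges, apply weights, tabulate) can be sketched in a few lines of plain Python. This is only an illustration of the general idea, not DataFerrett's interface: the records, the weights, the 18-and-over screen, and the function name are all hypothetical.

```python
from collections import defaultdict

# Hypothetical microdata records: (age, person weight). Not real ACS data.
RECORDS = [(15, 200), (23, 100), (27, 150), (34, 120), (61, 80), (68, 90)]


def weighted_age_table(records, min_age=18, width=10):
    """Sum weights by age range for records that pass the age screen."""
    table = defaultdict(int)
    for age, weight in records:
        if age < min_age:          # screen: drop records below the threshold
            continue
        lower = (age // width) * width
        label = f"{lower}-{lower + width - 1}"
        table[label] += weight     # weighted count, not a count of records
    return dict(table)


table = weighted_age_table(RECORDS)
for age_range, total in sorted(table.items()):
    print(age_range, total)
```

The point of the sketch is that each cell is a sum of weights rather than a count of records, which is why the choice of weight matters for files where no default is supplied.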

Recent Improvements and Current Challenges in the BLS Consumer Price Index (CPI) Program
Michael Horrigan, Bureau of Labor Statistics

Horrigan presented on a wide range of Consumer Price Index (CPI) and Consumer Expenditure Survey (CE) topics. He started by describing the CPI modernization effort, where the goal is to move the remaining components of periodic CPI revision (housing and geographic sampling) to a continuous process. A housing goal is to increase the sample of renters, and a geographic goal is to rotate new areas into the CPI on a continuing basis. Horrigan also described the need to move CPI production from the current outdated computer environment to one that provides greater flexibility and support for CPI research.

BLS also looks to redesign the Consumer Expenditure Survey (CE). The CE was designed in the 1980s, and while data collection has been upgraded with CAPI and improved diary forms, the admittedly burdensome survey is plagued by declining response rates, and there is concern that it has not kept pace with changes in the ways people spend money. Advisory input is being sought from many sources, including a survey redesign panel (including people who have redesigned surveys), a data capture forum, a data users’ forum, contracts with outside survey experts, and a CNSTAT consensus panel. The objective is to consider outside-the-box technology, cognitive methods (to reduce burden), and administrative data options to achieve a CE that reflects the “interview of the future.”

Horrigan also described concern with the Telephone Point of Purchase Survey (TPOPS), which gathers information on where consumers purchase goods, and is used to select the establishments where prices are monitored for the CPI. TPOPS also suffers from declining response rates, as households are better at screening calls, and may be prone to biased results due to the exclusion of “cell phone only” households. The short-run response is to add cell phone households starting summer 2011, and in the long run, BLS will examine the use of administrative and corporate data as alternatives. Alternative survey approaches also will be examined as a backup if administrative data do not work out, or are discontinued.

Moving to the CPI, Horrigan noted that Social Security cost of living adjustments are based on change in the CPI-W (the CPI for urban wage earners and clerical workers), which has yet to rebound from its 2008 peak associated with the spike in gas prices. As an alternative, one could consider the broader CPI-U (for all urban consumers), but Horrigan suggested the CPI-E – an experimental CPI re-weighted to reflect the purchasing behavior of the elderly – might be more relevant to the Social Security recipient population. For example, the CPI-E weights reflect seniors’ greater expenditures on medical care and relatively low expenditures on motor fuel.
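
The effect of re-weighting can be illustrated with a small sketch: the same price changes produce a different index change when the expenditure weights differ. The categories, weights, and price relatives below are hypothetical numbers chosen only to mirror the direction described (higher medical care share and lower motor fuel share for the elderly), and the fixed-weight average is a simplification, not the actual BLS methodology.

```python
def weighted_price_change(weights, price_relatives):
    """Weighted average of price relatives, with weights normalized to sum to 1."""
    total = sum(weights.values())
    return sum(weights[item] / total * price_relatives[item] for item in weights)


# Hypothetical price relatives (current price / base price) by category.
PRICE_RELATIVES = {"medical care": 1.04, "motor fuel": 0.90, "other": 1.01}

# Hypothetical expenditure shares; not published CPI-U or CPI-E weights.
ALL_URBAN_WEIGHTS = {"medical care": 0.07, "motor fuel": 0.05, "other": 0.88}
ELDERLY_WEIGHTS = {"medical care": 0.12, "motor fuel": 0.03, "other": 0.85}

all_urban_change = weighted_price_change(ALL_URBAN_WEIGHTS, PRICE_RELATIVES)
elderly_change = weighted_price_change(ELDERLY_WEIGHTS, PRICE_RELATIVES)

print(f"all urban consumers: {all_urban_change:.4f}")
print(f"elderly re-weighted: {elderly_change:.4f}")
```

With these made-up inputs, the re-weighted index rises more than the general one, because rising medical prices count for more and falling fuel prices count for less.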

Price indexes specific to demographic groups are of increasing interest, and would be attractive for targeted applications (such as Social Security adjustments). But a tradeoff is the higher variances associated with the smaller sample sizes. Price indexes for specific populations also would require information on where these populations make purchases (this is where TPOPS would contribute), what they purchase (TPOPS and the CE), and the ability to select items for pricing at each outlet – what Horrigan called disaggregation. Disaggregation is seen as a key step in determining exactly how different CPI weights need to be for specific population groups. Possible sources contributing to subgroup-specific indexes are information from store managers, redesigned TPOPS and CE surveys, and corporate and scanner data.
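
The variance tradeoff can be made concrete with the textbook standard error of a sample mean, which shrinks with the square root of the sample size. The sketch below assumes simple random sampling, and the standard deviation and sample sizes are hypothetical; actual CPI variance estimation is considerably more involved.

```python
import math


def standard_error(std_dev, sample_size):
    """Standard error of a sample mean under simple random sampling."""
    return std_dev / math.sqrt(sample_size)


# Hypothetical values: spread of price changes and numbers of price quotes.
STD_DEV = 0.08
FULL_SAMPLE = 1600       # quotes supporting an all-population index
SUBGROUP_SAMPLE = 400    # quotes relevant to one demographic subgroup

se_full = standard_error(STD_DEV, FULL_SAMPLE)
se_subgroup = standard_error(STD_DEV, SUBGROUP_SAMPLE)

print(f"full sample standard error:     {se_full:.4f}")
print(f"subgroup sample standard error: {se_subgroup:.4f}")
```

Under these assumptions, cutting the sample to one quarter doubles the standard error, which is the kind of cost a subgroup index would have to absorb.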

As Horrigan described it, the availability of new data suggests the possibility of new price indexes, and the expectation is that there would be interest in indexes not just for the elderly, but for other groups. The CPI is clearly a topic with many technical and policy dimensions that one could explore in great depth, but time had run out, and this COPAFS meeting was adjourned.