COPAFS
 

MINUTES OF THE DECEMBER 4, 2009 COPAFS MEETING

Ed Spar. Executive Director’s Report

COPAFS Executive Director Ed Spar explained that a water leak had closed the BLS Conference Center and forced the meeting into an alternate room. He thanked his contacts at BLS for accommodating the COPAFS meeting.

Spar noted that the government is still running on a continuing resolution, and no one knows when the budgets will be finalized. Expectations are that the Census Bureau will continue to receive anomalies to flat funding, but budgets for other agencies are still up in the air.

BLS is in the process of looking at a possible redesign of the Consumer Expenditure Survey, and COPAFS has been asked to put on a two-day workshop on the topic. The last redesign dates to the early 1980s, and the process would be lengthy, so BLS is not rushing into this. Spar will keep COPAFS members posted on progress.

Spar then described an issue under consideration with respect to the Demographic Analysis (DA) measure of census coverage. Initial plans were to report the components of census coverage (persons missed and duplications) only for the national and regional levels, but at a recent meeting of the 2010 Census Advisory Committee, Spar and others made the case for reporting the component measures for states. The DA project is on target to report results by the end of next year, and COPAFS is putting on a workshop on the topic.

Turning to the American Community Survey (ACS), the first 5-year data are due in 2010. However, these data will be weighted to estimates based on the 2000 census, while the next set of 5-year estimates (released in 2011) will be controlled to estimates based on the 2010 census. There is concern about the inconsistencies the first set will have with the 2010 census, and about the discontinuities with the second set (exacerbated by the transition to 2010 geography), so discussions are in progress at Census on how best to handle these releases.

Spar reported that the redesign of the COPAFS website is moving along, and the meeting dates for 2010 are March 5, June 4, September 24, and December 3.

Ataman Ozyildirim reviewed the upcoming CIRET Conference, to be held in New York on October 16, 2010. More information can be found on the COPAFS website at www.copafs.org.

Data.gov Initiative
Margo Schwab. Office of Statistical and Science Policy. OMB
Paul Bugg. Office of Statistical and Science Policy. OMB.

Paul Bugg described Data.gov (launched in March 2009) as a flagship initiative reflecting the administration’s commitment to transparency and open government. The basic ideas are that the free flow of information between government and the public is essential to a democratic society, and that the public’s ability to discover and understand information is of broad social benefit.

An early question from the audience concerned how Data.gov relates to FedStats, and Bugg explained that Data.gov has a broader scope – going beyond federal statistical data, and enabling users to search across agencies by topic. One could, for example, search on “household income” to identify income data provided by various agencies. Bugg also explained that the current platform, or website, is just a first step – a base from which the Data.gov system will evolve.

A key objective of Data.gov is to assist in finding and using government data. There are currently over 24,000 “dot-gov” websites, and one often needs to understand the government’s organizational structure to find datasets of interest. And data are not always downloadable from legacy systems with outdated technology. Data.gov is designed to transcend stovepipes, and encourage innovative applications by enabling access to data in formats that can be analyzed.

In response to a question on who is responsible for Data.gov, Bugg stressed that it is not OMB – although he attributed its origin to OMB’s Chief Information Officer, who joined recently from the DC government. Data.gov can be thought of as a set of links providing easier access to data that are already available. Responsibility for confidentiality rests with the agencies that collect and report the data, and the originating agencies need to be aware of any implications of wider access through Data.gov.

Bugg got a reaction when he shared Data.gov’s curious definition of “raw” data. Data.gov defines raw data as machine readable structured datasets that can be used for multiple purposes, and “mashed-up” with other data (combined on the fly by a wide range of users). Some attendees expressed bewilderment at the definition, and Bugg said they (his part of OMB) have resisted, but so far to no avail. There will be an opportunity for the public to comment on such issues, but one would have to learn of the opportunity either through the Federal Register or a notice on Data.gov itself.

Data.gov prefers data in formats such as XML, CSV/TXT, and RSS, and prefers not to post data as PDF files or HTML tables. But as a matter of policy, agencies retain control of their data, provide metadata, and again, need to be aware of the implications of broad access through Data.gov.

Bugg described a senior advisory group that provides OMB with a forum for working with those responsible for data generation and dissemination, and which provides advice on strategic and other issues. The advisory group consists primarily of federal entities (such as the Interagency Council on Statistical Policy), and does not include any data user groups.

National Survey of Residential Care Facilities
Lauren Harris-Kojetin. National Center for Health Statistics

Lauren Harris-Kojetin, from the National Center for Health Statistics, explained that the National Survey of Residential Care Facilities (NSRCF) has two major government partners – the Department of Health and Human Services (NCHS and other HHS agencies) and the Department of Veterans Affairs. Its goals are to provide general-purpose, national-level data to support decision making, and to fill a significant gap in the collection of data on providers of long-term care. The survey will estimate the size of the residential care industry, as well as the characteristics of both the facilities and the residents who live in them.

The NSRCF defines residential care facilities as places that are licensed, registered, or otherwise regulated by the state; have at least four beds; and provide room and board (at least two meals per day), help with personal care or health care-related services such as medication management, and around-the-clock on-site supervision for a primarily adult population. Facilities licensed to serve only the mentally ill or mentally retarded/developmentally disabled populations are excluded, as are nursing homes, which differ in that they provide skilled nursing services 24/7. Harris-Kojetin noted that there is no single definition that applies across all states, and that states license such facilities under terms such as assisted living, board and care, congregate care, family care, and personal care.

The survey is needed because the aging population has increased the need for long-term care services, little is known about these facilities, and current surveys are designed for other purposes. Harris-Kojetin noted that the number of residents served by these facilities may rival the number of nursing home residents.

The survey consists of a screener, a facility questionnaire, a resident sampling questionnaire, and a resident questionnaire. The sample uses a two-stage national probability design, with facilities sampled first, then residents within each facility. The goal is to sample 2,250 facilities and 8,450 residents. The survey starts with telephone eligibility screening, followed by CAPI interviews conducted in person. The facility questionnaire is conducted with the facility administrator, and the resident questionnaire is conducted with staff members knowledgeable about selected residents (such as a nurse aide or floor supervisor). The residents themselves are not interviewed.
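The two-stage idea (select facilities, then select residents within each selected facility) can be illustrated with a minimal Python sketch. It assumes simple random sampling at both stages and a hypothetical sampling frame mapping facility IDs to resident rosters; the actual NSRCF stratification and selection probabilities were not presented, and the function name is illustrative only. The base weight attached to each resident is the inverse of its overall selection probability, the product of the two stage probabilities. For scale, the stated targets imply roughly 8,450 / 2,250 ≈ 3.8 residents per facility.

```python
import random


def two_stage_sample(frame, n_facilities, residents_per_facility, seed=0):
    """Draw a two-stage sample and attach a base (design) weight to each resident.

    frame maps a facility ID to the list of its resident IDs (hypothetical layout).
    Stage 1: simple random sample of n_facilities facilities.
    Stage 2: simple random sample of up to residents_per_facility residents
    within each selected facility.
    Returns a list of (facility_id, resident_id, weight) tuples.
    """
    rng = random.Random(seed)
    total_facilities = len(frame)
    chosen = rng.sample(sorted(frame), n_facilities)
    records = []
    for fac in chosen:
        roster = frame[fac]
        k = min(residents_per_facility, len(roster))
        if k == 0:
            continue  # an empty roster contributes no residents
        # weight = 1/P(facility selected) * 1/P(resident selected | facility)
        weight = (total_facilities / n_facilities) * (len(roster) / k)
        for resident in rng.sample(roster, k):
            records.append((fac, resident, weight))
    return records


# Usage with a made-up frame of 100 facilities holding 4 to 8 residents each.
frame = {f"fac{i}": [f"fac{i}-r{j}" for j in range(4 + i % 5)] for i in range(100)}
sample = two_stage_sample(frame, n_facilities=10, residents_per_facility=4, seed=1)
```

Summing the weights over sampled residents gives a design-based estimate of the total resident population, which is one reason weights are carried alongside each selected record.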

Harris-Kojetin described a pretest and the response rates it achieved (without attempts to convert refusals). Reasons for refusal included inability to contact the facility director, the director’s or staff’s lack of time, lack of interest, or corporate refusal (some facilities are part of a chain). Harris-Kojetin described outreach activities including meetings with the Center for Excellence in Assisted Living (CEAL), an association of associations related to the assisted living industry. CEAL board members have provided insights into contact materials, and are promoting participation in the survey – for example, through a joint letter sent in advance, communicating industry support for the survey.

Harris-Kojetin described a number of other measures taken to gain cooperation. She also noted that the NSRCF takes, on average, about two and a half to three and a half hours to administer, depending on facility size. Once facilities agree to participate in the survey, few drop out; getting initial agreement is the challenge. The low dropout rate may be due in part to the pre-interview worksheet that is sent to help facilities prepare for the interviews. The worksheet ensures there are no major surprises in the interview process, and reduces the sense of respondent burden.

The upcoming NSRCF schedule is as follows:

January 29, 2010 OMB approval sought by this date
March-April, 2010 Training
March-October, 2010 Facility recruitment
April-October, 2010 Data Collection
Early 2011 Public use files, methods report, and release of first findings product
 

Who Creates Jobs? Small vs. Large vs. Young?
Ron Jarmin. U.S. Census Bureau.

Ron Jarmin noted the persistence of the debate over the extent to which small businesses are responsible for job creation, and reported research on the subject (completed in collaboration with John Haltiwanger of the University of Maryland and Javier Miranda of the Census Bureau). Their work is funded in part by the Kauffman Foundation.

There are two camps in the debate – those who contend that most new jobs are created by small businesses, and those who argue that this is not true. Jarmin suggested there is some truth in both positions, and he presented data from a longitudinal (1992-2005) database of private sector non-farm business establishments with firm identifiers. The data captured not just the size of businesses, but also their age. Size was constructed by aggregating establishment employment to firm totals, and firm age was defined as the age of the oldest establishment at the time of firm birth. Spin-offs from existing businesses were not treated as new, or startup, businesses.
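To make the size and age definitions concrete, here is a minimal Python sketch. It assumes a hypothetical establishment-level panel of (year, firm_id, est_id, employment, est_birth_year) tuples; the layout and function name are illustrative, not the Census Bureau’s actual files or code. Firm size is the sum of establishment employment, and firm age starts at the age of the firm’s oldest establishment in the firm’s first year, then advances one year per calendar year. A spin-off that begins life with older establishments therefore starts with age above zero and is not counted as a startup (age 0), and later acquisitions of older establishments do not change a firm’s age.

```python
from collections import defaultdict


def firm_size_and_age(rows, year):
    """Return {firm_id: (size, age)} for firms active in `year`.

    rows: iterable of (year, firm_id, est_id, employment, est_birth_year),
    a hypothetical establishment-level panel layout.
    """
    rows = list(rows)
    # Firm birth year: first year in which the firm identifier appears.
    birth = {}
    for y, firm, _, _, _ in rows:
        birth[firm] = min(y, birth.get(firm, y))
    # Age at birth: age of the firm's oldest establishment in the firm's birth year.
    age_at_birth = {}
    for y, firm, _, _, est_birth in rows:
        if y == birth[firm]:
            a = y - est_birth
            age_at_birth[firm] = max(a, age_at_birth.get(firm, a))
    # Firm size: establishment employment aggregated to the firm total.
    size = defaultdict(int)
    for y, firm, _, emp, _ in rows:
        if y == year:
            size[firm] += emp
    return {f: (size[f], age_at_birth[f] + (year - birth[f])) for f in size}


rows = [
    (2000, "A", "a1", 10, 2000),  # firm A starts up in 2000
    (2001, "A", "a1", 12, 2000),
    (2001, "A", "a2", 5, 1990),   # older establishment acquired later; A's age is unchanged
    (2001, "B", "b1", 30, 1995),  # new firm ID built on a 6-year-old establishment (spin-off)
]
print(firm_size_and_age(rows, 2001))  # {'A': (17, 1), 'B': (30, 6)}
```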

If one looks only at firm size, most new jobs are in small businesses, but Jarmin’s work stresses the contribution of firm births, or “startups,” to job growth, and the distinction between gross and net job creation. As he described it, there is a huge churn in firms and jobs, and an “up or out” dynamic for young businesses. Many young businesses fail, but those that survive contribute to dynamic growth.

A key to Jarmin’s analysis is the relationship between firm size and age. New or startup firms tend to be small, while larger firms tend to be older. When one controls for the age of firms, there is actually a positive relationship between firm size and job growth.

Looking at data for 2005, Jarmin noted that among the largest firms, the biggest job gains were from the oldest firms, while among the smallest firms, most growth was among the youngest. This makes sense since startups (businesses in their first year) cannot lose jobs relative to the previous year. And because there are so many startups in the course of a year, small firms account for a large number of new jobs. Among small firms, job growth was greatest during the startup year, with the number of jobs added dropping sharply in years two and beyond. Again, there is an “up or out” dynamic, with the failure of many new businesses resulting in the “destruction” of many of the new jobs they contributed, which is why gross and net job creation can differ so sharply.
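The gross versus net distinction can be shown with a minimal Python sketch of common job-flow accounting (assumed here for illustration; the presentation did not spell out the exact formulas). Gross creation sums employment increases across firms, gross destruction sums decreases, and net change is their difference. A firm absent in the prior year is treated as a startup with prior employment of zero, so it can only add jobs, while an exiting firm counts all of its prior jobs as destroyed. The firm names and numbers are made up.

```python
def job_flows(emp_prev, emp_curr):
    """Return (gross_creation, gross_destruction, net_change) between two years.

    emp_prev / emp_curr map firm IDs to employment. A firm missing from
    emp_prev is a startup (prior employment 0); a firm missing from
    emp_curr has exited (current employment 0).
    """
    creation = destruction = 0
    for firm in set(emp_prev) | set(emp_curr):
        change = emp_curr.get(firm, 0) - emp_prev.get(firm, 0)
        if change > 0:
            creation += change
        else:
            destruction -= change  # change <= 0, so this adds |change|
    return creation, destruction, creation - destruction


prev = {"old": 100, "young": 8}
curr = {"old": 95, "start1": 5, "start2": 3}  # "young" failed; two startups appeared
print(job_flows(prev, curr))  # (8, 13, -5)
```

Here the two startups account for all 8 jobs created, yet the failure of the young firm and the contraction of the old one leave net employment down by 5.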

Jarmin then described the challenge of “picking winners,” or establishing policies to promote job growth. While startups and surviving young firms contribute disproportionately to job growth, idiosyncratic factors seem to dominate in the determination of which ones survive – factors that are not observable or predictable for policy purposes.

Jarmin concluded with the suggestion that we need a more nuanced view of small businesses and their contribution to job creation. It is not just the size of businesses that matters, but also their age. Another issue is the quality of jobs, and Jarmin noted that we need to look beyond the simple counting of new jobs to the kinds of jobs that are being produced by younger firms, the kinds of workers in these jobs, and the long-term labor market outcomes.

Local Employment Dynamics: Synthetic Data for OnTheMap Version 4
Jeremy Wu. U.S. Census Bureau.

Jeremy Wu described OnTheMap as an online dynamic mapping and reporting tool for the Census Bureau’s Local Employment Dynamics (LED) data. He also described the integrated, synthetic data underlying the product.

The first OnTheMap release was in 2006, covering 14 states with data covering 2002-2003. The product has grown through successive releases, with the next scheduled for December 14, 2009, covering 47 states and data for 2002-2008. A December 2010 release will cover 47+ states with data for 2002-2009.

OnTheMap allows users to select workers by where they live or where they work, and to report characteristics such as age, earnings, and cross-state flows. The base unit is the census block, and the product features innovative disclosure protection. Wu showed a screenshot of a Las Vegas area map shaded by where construction/manufacturing workers are employed, and side-by-side maps, with one showing workers in blocks near “The Strip,” and the other expanding to show where those workers live. Data tables are presented with the maps, and while the examples illustrated census blocks, one could show data by other geographies, such as ZIP Codes or Traffic Analysis Zones.

Turning to the data, Wu noted that censuses date back to ancient Rome and China, but sampling was first discussed in 1895. Even then, the idea was not well received, even among statisticians, who clung to the notion that there was no substitute for a complete count. The denial and debate went on for decades, and it was not until 1937 that the Census Bureau developed sampling techniques to measure unemployment during the Depression. Sampling was then introduced to the decennial censuses – starting with 1940 – and other surveys, such as the Current Population Survey, and now the American Community Survey.

With the introduction of sampling, it became clear that a 5 percent random sample is better than a 5 percent non-random sample, and the field of mathematical statistics was born. But Wu noted that the field of sample surveys has not lived happily ever after, as computers and administrative records databases have released a flood of data, and surveys are increasingly hampered by declining response rates, increasing labor costs, and confidentiality concerns. As recently as the 1990s, there was concern that we could either have access to microdata or confidentiality protection, but not both.

However, Wu described an LED approach that provides both with a design that involves record linkages, noise infusion, imputation, synthetic data modeling, and measures of goodness and quality. A slide diagramming how the synthetic data are prepared conveys its complexity. And with the workplace/residence data comprising an origin/destination matrix for 8 million census blocks (8 million times 8 million), the underlying database is huge.
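The scale of the workplace/residence matrix follows from simple arithmetic, treating the 8 million block count as a round figure: every block can be both an origin (residence) and a destination (workplace), so the number of possible origin/destination pairs is the square of the block count.

```latex
(8 \times 10^{6}) \times (8 \times 10^{6}) = 6.4 \times 10^{13} \text{ possible block pairs (64 trillion)}
```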

Wu described OnTheMap and its data innovation as the latest development in an evolution dating back to the first discussions of sampling in 1895. It took decades for sampling to be accepted, and he expressed the hope that it will not take so long for this innovation to become accepted and widely used. Might this be the dawning of another new field of statistical knowledge?

Asked if these data have yet been accepted for academic research, Wu noted that academic researchers have been involved in their development, but the data are not yet widely used. There was agreement that more formal measures of goodness and quality are needed for the data to become more widely accepted in academic research.

Concerns From COPAFS Constituencies

No concerns were raised, and the meeting was adjourned.