Acting COPAFS Chair Ken Hodges started the meeting by introducing Ed Spar for his Executive Director’s report. 

Executive Director’s Report. Ed Spar 

Spar described a six-page OMB memo clamping down on the conference activities of federal agencies in the aftermath of the GSA conference scandal. With agencies now encouraged to organize conferences in-house, COPAFS conference activity is already being impacted. 

The budget numbers are looking pretty bad for some agencies, with many questions related to the economic census and the ACS. Spar described the $20 million cut from the economic census as basically a misunderstanding in Congress that is likely to be reversed. There are two bills on the ACS. One would eliminate it, and the other would make response to it voluntary. One line of thinking is that ACS funding will be restored, but that response will be made voluntary – thus leading to questions concerning the consequences of a voluntary ACS. Spar said there is more we don’t know than what we do know about the budgets at this point. In the past, Congress was intent on completing budgets before campaigning for re-election, but now closure on the budgets is not expected until after the elections. 

Spar also described the recently released 2010 census coverage measurement numbers, noting that the undercount and over-count offset almost exactly, so net undercount is about zero. He credited the Census Bureau for reporting the gross errors (undercount and over-count) for geographic detail, including large counties. Previous census evaluations had not reported such geographic detail.        

The next Federal Committee on Statistical Methodology Policy Seminar is scheduled for December 4-5, and the remaining COPAFS meetings are September 21 and December 7. 

A Review of Plans for the Bureau of Transportation Statistics 

Patricia Hu. Bureau of Transportation Statistics 

Hu is the new director of the Bureau of Transportation Statistics (BTS), which was established in 1992 under the Intermodal Surface Transportation Efficiency Act to administer transportation data collection, analysis and reporting, and ensure the cost-efficient use of resources in monitoring transportation’s contributions to the economy. Once a stand-alone agency within the Department of Transportation, BTS is now part of DOT’s Research and Innovative Technology Administration (RITA), and is advised by a 10-member advisory council appointed by the Secretary of Transportation.

BTS responsibilities include the provision of statistics and analysis to inform fiscally responsible investments. Objectives include reviewing the validity of data and methods, coordinating data collection, modernizing data programs, facilitating data standardization, making data accessible, and maintaining a National Transportation Library. The objectives reflect the view that transportation is a critical component of economic growth.      

Hu described some specific BTS programs. For example, a confidential close-call reporting system is a preventive effort to improve rail safety by allowing rail operators to report close calls confidentially and without disciplinary consequences. Reflecting the growing importance of freight, the Commodity Flow Survey (CFS) is a primary source of data on domestic freight shipments by commodity types, origins and destinations. Data collection from 100,000 shippers for the 2012 survey began in December 2011, and will last 12 months. A Trans Border Freight Data program supplements the CFS with data reflecting US trade with Canada and Mexico shipped by surface modes. Other programs report data on international freight and air freight.         

Hu also described programs on passenger travel, such as the collection of monthly data on airline enplanements, on-time performance, and ground delays. There is also a 2010 National Census of Ferry Operators, and a GIS-based Intermodal Passenger Connectivity Database that identifies, for example, if passengers have options other than car for getting from airports to final destinations. It is this program that has identified that Americans in rural counties are losing access to inter-city transportation options. 

BTS is engaged in efforts to modernize data programs, such as a web response option for CFS respondents, and the streamlined tracking of airline information. BTS also promotes data access, and is getting into the development of mobile apps, web engineering, and data visualization. Other initiatives include the re-introduction of the Journal of Transportation Statistics, the re-energizing of the American Statistical Association’s Transportation Statistics Interest Group, and the coordination of transportation statistics and definitions across North America through collaboration with partners in Canada and Mexico.           

Future plans call for a re-design of the omnibus household survey, enhancing the Transportation Safety Institute methodology to include the entirety of passenger travel, and an update to the 2004 study of transportation investment. 

Asked about the National Household Travel Survey (NHTS), Hu explained that it is spearheaded by the Federal Highway Administration, and not BTS. As Hu described it, BTS is not the only statistical agency within DOT. She mentioned that the NHTS faces budget challenges that might reduce its frequency, but that the next survey is planned for 2014. 

Measuring Sexual Identity in NCHS Surveys

Jennifer Madans. National Center for Health Statistics 

Madans stressed that while significant work has been done in this area, it is not finished, and she was presenting work in progress. She started by describing the need to better understand the health of sexual minority groups, as there is evidence of health disparities, and a need for data to help address them.

Collecting data on sexual identity is not as straightforward as it might seem. One challenge is the complexity of concepts such as sexual orientation, sexual attraction, sexual behavior, and sexual identity. Madans described sexual orientation as a generic, catch-all term. Sexual attraction relates to same versus opposite sex desire, but is elusive and intangible. Even the concept of sexual behavior is complicated, as what counts as “sex” varies across sub-groups, and behavior is not necessarily consistent with presentation of self. Sexual identity relates to the way people view themselves, and thus has parallels with self-identification with race. Further complicating matters is the fluidity of sexual identity, as it can change over time, and with context (who is asking). There also are issues with the varied use and comprehension of terms in the media and across subgroups.         

As Madans described it, different groups relate to terms differently. Sexual non-minorities tend to lack a strong sexual identity, and talk more about what they are not than what they are. For example, they might report that “I’m not gay.” They might also not know what terms like “heterosexual” and “bi-sexual” mean. In contrast, sexual identity tends to be highly salient to sexual minorities – such as lesbian, gay, bi-sexual and transgender persons. Madans noted that sexual identity is especially complex for transgender persons, and that the work at NCHS does not deal with that identity.

Madans then reviewed some of the ways sexual identity has been asked in the National Health and Nutrition Examination Survey (NHANES) and the National Survey of Family Growth (NSFG). In the NSFG, about six percent of respondents do not answer, which gives “missing” a higher frequency than some of the target groups. The data are said to be useful, but subject to considerable noise. Missing responses are less of a problem in the NHANES, but they are not random. For example, missing responses have been more common among Hispanics. 

The 2006-2008 NSFG includes improvements to wording, and allows people to write in what they mean by “something else.” The number of “missing” is sharply reduced overall, but remains high in some groups. And the write-in responses are an interesting assortment of existing response options, expressions of “none of your business,” some related to transgender, and others that Madans described as really complicated.             

The plan is to add questions on sexual identity to the National Health Interview Survey (NHIS) – a larger and interviewer-conducted survey. Goals for the new questions are to reduce misclassification (especially for non-minorities), reduce “something else” and “don’t know” responses, and to sort non-minority from minority cases. The revised question uses labels that respondents use in referring to themselves, and asks follow-up questions to get meaningful responses to “something else” and “don’t know.” There is some concern that the NHIS is often administered on the doorstep, which might reduce willingness to respond. In contrast, the NSFG respondents are paid, and the interview is often conducted inside the dwelling. Testing of the new NHIS questions will continue through 2012, and full implementation is targeted for January 2013.

Asked if there has been criticism of sexual identity questions as an invasion of privacy, Madans acknowledged it as a potential concern, but noted that refusals have been less of a problem than “missing” – which could be refusal in disguise. She also noted that NCHS asks questions that are much more probing than sexual identity. As Madans put it, “I read the NSFG questionnaire, and I blush.”       

How Good are the Annual Social and Economic Supplement Earnings Data? A Comparison to SSA Detailed Earnings Records

Joan Turek. Health and Human Services Administration

Fritz Scheuren. National Opinion Research Center 

Scheuren described research that matches records from the Current Population Survey’s 2006 Annual Social and Economic Supplement (ASEC) with Social Security’s 2005 Detailed Earnings Records (DER). The joint project of the Census Bureau and the Department of Health and Human Services compares the two sources to gauge their consistency on income earned in calendar year 2005.

Scheuren described the handling of income imputation in cases of CPS nonresponse – noting the difference between item imputes, where a specific question is unanswered, and whole imputes, where the entire supplement is imputed. Imputation rates for income have been increasing since 1988, when the CPS went from 12 to 55 income questions, and are now up to 35 percent. Item imputes have more than doubled in that period, but whole imputes have remained stable at about 10.5 percent.   

Imputation is a potential factor in the comparison, as poverty rates differ by imputation status. Persons with no imputes have the highest poverty rate, those with whole imputes are next, and those with item imputes have the lowest poverty rate. Because imputation has a strong effect on poverty rates, it is important to know how accurate the ASEC data are. 

In comparing ASEC and DER data, Scheuren noted that DER is not a gold standard, as it misses some persons and sources of earnings. DER and ASEC can differ in a number of other ways, such as health savings accounts and payments for dependent care. In other words, the ASEC-DER comparison is not a pure apples-to-apples comparison.

Results were tabulated for persons age 15 and above with earnings. Overall, 52 percent had ASEC and DER incomes within $10,000 of each other, and 79 percent were within $20,000. For records without imputation, 61 percent were within $10,000 and 88 percent were within $20,000. Records with whole imputes were least consistent, with only 24 percent within $10,000 and 72 percent within $20,000. Looking at wages, as opposed to earnings, those within $10,000 rose from 52 percent to 56 percent. For those with no imputes, the correspondence rose to 71 percent. 
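The "within $10,000" and "within $20,000" figures above are shares of matched records whose two earnings amounts differ by no more than a threshold. The following is a minimal sketch of that tabulation, not the authors' actual method. The function name, the inclusive treatment of the threshold, and the sample records are all assumptions made for illustration; none of the values come from the ASEC or DER files.

```python
def share_within(pairs, threshold):
    """Fraction of (asec, der) pairs whose absolute difference is <= threshold.

    Treating the threshold as inclusive is an assumption; the minutes do not say.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no matched records")
    close = sum(1 for asec, der in pairs if abs(asec - der) <= threshold)
    return close / len(pairs)


# Made-up matched records (ASEC earnings, DER earnings), for illustration only.
matched = [(30000, 32000), (45000, 60000), (12000, 11500), (80000, 55000)]

share_10k = share_within(matched, 10_000)  # differences: 2000, 15000, 500, 25000
share_20k = share_within(matched, 20_000)
```

Running the same function separately on records with no imputes, item imputes, and whole imputes would give a breakdown by imputation status like the one reported above.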

Another view is provided by the ratio of ASEC/DER, and again, the correspondence is strong. The correspondence also is strong when looking at the poverty population. When DER is substituted for ASEC, the majority of persons do not change poverty status, and that result holds across demographic groups. Scheuren and Turek were both surprised by the extent of consistency between ASEC and DER – especially across demographic groups. This consistency suggests that – apart from questions about poverty definitions – the poverty measure is working well in that the data would not be much different if measured with DER. When asked how many persons in poverty have positive income, Scheuren noted that it is about 82 percent. 

Still to be studied are persons with no ASEC-DER match, and those with highly dissimilar incomes. Scheuren and Turek also hope to investigate the extent to which the results hold for the supplemental poverty measures.  

Managing and Analyzing Longitudinal Data 

Patricia Ruggles. Orlin Research, Inc.   

Ruggles described Orlin Research, the company she runs with her sister Catherine and brother Steven. The company provides products to enhance the use of complex data in social science research. For example, longitudinal data have complex record linkages and structures that make them difficult to use. Analyses involve complex relationships across records, variables, and time. Analysts often shy away from using large longitudinal data sets such as the Survey of Income and Program Participation (SIPP), or they use such databases only for cross-sectional analyses.

Ruggles described three basic steps in the use of longitudinal analysis: 1) understanding the data, 2) preparing data for analysis, and 3) performing analyses. Using 2008 SIPP as an example, Ruggles expanded on these steps, and described how her company’s products can help with each step.   

Step 1: Understanding the Data

Many longitudinal datasets are large and not well documented. For example, the 2008 SIPP has 48 months of data with more than 1,000 variables on about 120,000 persons. Documentation exists in many places, and is “a bit patchy,” so it can be hard to determine all the complex linkages in the data. Ruggles showed Orlin’s welcome page with hyper-linking features that help users figure out what is in SIPP. The page has links to a metadata tab that takes one to a set of person/month records, and a search box in which one can narrow a variable search by typing a topic such as “employment.” One can click on a variable name to select it, or to get information on the variable – including its universe, summary statistics, and links to related variables.         

Step 2: Preparing Data for Analysis

Even after one finds the right variables, longitudinal data require substantial manipulation and recoding before analysis can begin. For example, one has to create data extracts that preserve necessary information on relationships between units of analysis, and going longitudinal means moving information across both record types and points in time. Further complicating matters are sample attrition, weighting problems, and inconsistencies in response across waves of the survey. Creating the necessary links can be difficult in packages such as SAS or SPSS – or highly inefficient, as one would have to track, for example, each person’s income for each month of the survey.   
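To make the bookkeeping concrete, the sketch below regroups person/month records into one month-ordered series per person, which is the kind of tracking described above. It is only an illustration under stated assumptions: the field names person_id, month, and income are hypothetical and are not SIPP variable names, the records are made up, and this is not how the Orlin system is implemented.

```python
from collections import defaultdict


def person_series(records, variable):
    """Group person/month records into a month-ordered series for each person.

    Returns {person_id: [(month, value), ...]}. Months missing for a person
    (for example through sample attrition) are simply absent from that series.
    """
    by_person = defaultdict(dict)
    for rec in records:
        by_person[rec["person_id"]][rec["month"]] = rec[variable]
    return {
        pid: sorted(months.items())  # sort each person's values by month
        for pid, months in by_person.items()
    }


# Made-up person/month records with hypothetical field names.
records = [
    {"person_id": 1, "month": 2, "income": 2100},
    {"person_id": 1, "month": 1, "income": 2000},
    {"person_id": 2, "month": 1, "income": 1500},
]

series = person_series(records, "income")
```

Even in this toy form, the gaps left by attrition have to be handled explicitly, which hints at why the full task is laborious across 48 months and more than 1,000 variables.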

To help users with this process, Orlin uses database technology to keep track of variables and their linkages across record types and time. It also allows for intelligent data transformations because records are linked internally in a database, and the system understands those links. Ruggles described an example involving employment status, and showed the options provided by the templates on the Orlin site. 

Step 3: Performing Analyses  

Pressing the “Analyze” button on the Orlin home page presents templates for analyses such as crosstabs and regressions. Analyses use the R statistical system, but results can also be exported for analysis in packages such as SAS and SPSS. The system is designed specifically for longitudinal analyses, and facilitates time-related analyses involving transitions (changes in state, such as from unemployed to employed), and spells (periods of time between two transitions, such as duration of unemployment). Ruggles described how the Orlin system handles such analyses (again with single page templates), and showed examples of output. 
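Transitions and spells, as defined above, can be illustrated with a small sketch that works on one person's month-ordered status values. This is not the Orlin system's code, which runs on R; the function names, the state codes "E" and "U", and the sample sequence are assumptions chosen for illustration.

```python
from itertools import groupby


def transitions(statuses):
    """List (month_index, from_state, to_state) for each change of state."""
    pairs = zip(statuses, statuses[1:])
    return [(i, a, b) for i, (a, b) in enumerate(pairs, start=1) if a != b]


def spells(statuses):
    """Collapse a month-ordered status list into (state, start_index, length) runs."""
    runs = []
    start = 0
    for state, run in groupby(statuses):
        length = len(list(run))
        runs.append((state, start, length))
        start += length
    return runs


# One hypothetical person: E = employed, U = unemployed, one value per month.
months = ["E", "E", "U", "U", "U", "E"]

changes = transitions(months)
runs = spells(months)
```

Here the unemployment spell is the middle run, three months long and bounded by a transition on each side. The first and last runs are cut off by the observation window, so their true durations are unknown and a real analysis would need to treat them as censored.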

Asked about the need to run data on small units of analysis but also account for data at higher levels, Ruggles clarified that the lowest level is where one makes the longitudinal link, but that all other hierarchical links are preserved – for example, the linkages between persons and households and months. In other words, the different levels of analysis can be worked with simultaneously, and the ability to do this with easy-to-use templates is one of the system’s key contributions.

Further information on the Orlin system can be found at

Concerns From COPAFS Constituencies

No concerns were raised, and the meeting was adjourned.