MINUTES September 21, 2012 COPAFS QUARTERLY MEETING
COPAFS Chair Felice Levine opened the meeting by expressing appreciation for the Search Committee, whose efforts led to the appointment of Kitty Smith as COPAFS Executive Director effective October 1. Smith made brief remarks expressing enthusiasm for the position and her interest in listening to COPAFS members. Levine announced the Board positions open for the coming election, and called on those interested to contact Judie Mopsik, chair of the Nominations Committee. An announcement about the Board positions will be posted on the COPAFS website.
Executive Director’s Report. Ed Spar
Spar started his final Executive Director’s report by noting some recent changes at the Census Bureau – including Tom Mesenbourg becoming Acting Director, Nancy Potok Deputy Director, and Enrique Lamas Associate Director for Demographic Programs. The head position at the Population Division is currently open.
As for budgets, Spar described the continuing resolution that will freeze spending at 2012 levels through March. Sequestration looms as a possibility, and Spar directed our attention to a preliminary OMB report indicating the cuts that the various agencies likely would face. As Spar put it, there are some “real disasters in the making.”
The next meeting is December 7. Noting that he would not be sitting at the head table for that meeting, Spar presented incoming Executive Director Kitty Smith with a stack of papers consisting of the agendas for all 78 quarterly meetings during his tenure at COPAFS.
EPA Statistical Initiatives
Barry Nussbaum. Environmental Protection Agency
Nussbaum explained that while EPA is an enforcement agency, rather than a statistical agency, they gather a lot of data – both through measurement and administration. Measurement data derive from environmental monitoring, and administrative data derive from programs involving things like permits and required submissions.
EPA never knows who is using their data, but they are often asked “are your data good?” Nussbaum argued that the answer depends on the use. For example, hazardous waste data with bad latitude/longitude coordinates are only bad if one needs the coordinates. To illustrate the elusive nature of good data, Nussbaum described the wide range of numbers one can find purporting to identify the total coastline (in miles) of the U.S.
Nussbaum then described EPA’s Toxic Release Inventory (TRI) – a mandatory reporting of toxics released and recycled. Data are reported annually by 20,000 reporting entities (with 10 or more employees) for 650 toxic chemicals. Reporting is mandatory, enforced with penalties, and there is no confidentiality. And the TRI data are widely used. For example, by mutual funds to assess “social responsibility,” by labor unions in contract negotiations, by IRS for tax on CFCs, and for internal EPA processing.
New EPA data activities include “Proving Tests Equivalent,” which Nussbaum described as a response to EPA’s need to show that new and less expensive tests provide measures equivalent to those they replace. The emphasis on demonstrating that measures are not different is the opposite of most statistical studies, which focus on detecting differences. “Non-Detects” work addresses the challenge of measuring emissions that fall below the levels tests can detect, but that still could be hazardous. Nussbaum also described EPA’s foray into social media. For example, they have set up a wiki called Statipedia and established a presence on Facebook. Through these initiatives, EPA is offering collaborative tools, but they are not yet sure how many people are using them. Through their “GeoPlatform,” Nussbaum said they are “mapping everything,” including water quality, air quality, and hazardous waste.
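Equivalence demonstrations of the kind Nussbaum described are commonly framed as two one-sided tests (TOST): rather than testing for a difference, one tests whether the mean difference between methods lies inside a pre-specified equivalence margin. The sketch below illustrates that logic with hypothetical paired data and a simple normal approximation; it is not EPA’s actual procedure.

```python
from statistics import NormalDist, mean, stdev
from math import sqrt

def tost_equivalence(diffs, margin, alpha=0.05):
    """Two one-sided tests (TOST): declare equivalence only if the mean
    difference is significantly above -margin AND significantly below
    +margin. Uses a normal (z) approximation for simplicity."""
    n = len(diffs)
    m = mean(diffs)
    se = stdev(diffs) / sqrt(n)
    z_lower = (m + margin) / se      # H0: true mean <= -margin
    z_upper = (margin - m) / se      # H0: true mean >= +margin
    p_lower = 1 - NormalDist().cdf(z_lower)
    p_upper = 1 - NormalDist().cdf(z_upper)
    # Equivalence is claimed only if BOTH one-sided tests reject.
    return max(p_lower, p_upper) < alpha

# Hypothetical paired differences (new test minus reference test), in ppm.
diffs = [0.2, -0.1, 0.05, 0.15, -0.2, 0.1, 0.0, -0.05, 0.1, 0.05]
print(tost_equivalence(diffs, margin=0.5))   # True: within a 0.5 ppm margin
print(tost_equivalence(diffs, margin=0.01))  # False: margin too tight to show
```

Note the inversion Nussbaum pointed to: a conventional test that fails to find a difference proves nothing, whereas TOST puts the burden of proof on demonstrating that any difference is small.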
Nussbaum then described some jurisdictional considerations. Citing the BP oil spill as an example, he noted that a spill falls under EPA’s jurisdiction if inland water is involved, but under the Coast Guard’s if it is coastal water. He then described some of EPA’s data collection methods related to air quality monitoring, water samples, and sediment samples, and briefly described some of the technical questions they get from the president (and others), and the challenges they face in responding quickly and accurately to these questions.
Prior to the next session, COPAFS Chair Felice Levine announced that on December 6, the evening before the December 7 quarterly meeting, a reception will be held to celebrate Ed Spar’s contributions as Executive Director of COPAFS. The reception will be 6:00 to 8:00 at the NAS Keck Center in Washington.
Cybersecurity and Confidentiality Protection
Paul Bugg. Office of Statistical and Science Policy, OMB
Bugg noted the importance of cybersecurity efforts to guard the statistical system from frequent malware and other attacks, but argued that there is a continuing need to preserve the pledges of confidentiality made to respondents who provide information to the system. The concern is that proposed cybersecurity legislation does not account for confidentiality concerns. As Bugg described it, none of the bills address statistical confidentiality, and in fact, they give the Department of Homeland Security authority to go into federal data sources “notwithstanding any other provisions of law.” In other words, the legislation would grant access to confidentiality-protected data for use in unspecified ways. For example, current law protects the confidentiality of mail-back information, but permits tracking down the respondent if the mail-back included anthrax. The proposed legislation, however, would permit a wider use of data in investigating other crimes.
The lack of confidentiality protection is such that OMB sees the need to become involved, and they have held meetings with DHS and the statistical agencies. Bugg stressed that OMB does not dispute the need for cybersecurity protections, including the sharing of information. He also explained that the intent of cybersecurity legislation is not to violate confidentiality pledges. The problem is that, as currently written, the proposed legislation does reduce protections, so amendments are needed to uphold them. The challenge is to protect against threats and protect confidentiality at the same time, and Bugg asserted that “we can have both” if the legislation is done correctly.
OMB is working to increase awareness of the confidentiality issue, and to find ways for agencies to cooperate with DHS on cybersecurity without compromising protected data. However, he said it has been an uphill battle because OMB was not involved in the drafting of the proposed legislation. Bugg said the data user community also needs to be aware of this situation. OMB is getting the word out, and appreciates the opportunity to do this at COPAFS. Bugg reiterated that they are not trying to prevent the passage of legislation – just trying to make sure it maintains the confidentiality of statistical information.
In response to a suggestion that publicity about the confidentiality issue could have a negative impact on response rates, Bugg said they are trying to get the amendments added without a lot of public attention.
New Perspectives on the Quality of Administrative Data
William Iwig. National Agricultural Statistics Service
Iwig described the Federal Committee on Statistical Methodology (FCSM) Subcommittee on the Statistical Uses of Administrative Data, created in 2007 to address questions about the quality of administrative data that are now so widely used for statistical purposes. As Iwig described it, the error properties of survey data are well known, but quality is a more elusive notion for administrative data, which are shaped by non-statistical considerations.
With so many administrative data sources being used (or considered for use) in statistical applications, the subcommittee has developed a Data Quality Assessment Tool to help users assess the “fitness for use” for their applications. Based on information provided by the tool, users might decide to use the administrative data as planned, alter their plans for use, accommodate quality weakness in the data, or not use the data at all.
The tool consists of a set of questions asked of administrative data providers related to dimensions of quality including relevance, accessibility, interpretability, coherence, accuracy (closeness to true values), and institutional environment. The questions cover three phases in the assessment process. First is the discovery phase, with 12 questions covering relevance, accessibility, and interpretability. Next is the initial acquisition phase, with 29 questions covering accessibility, interpretability, coherence, accuracy, and institutional environment. Finally, the repeated acquisition phase has 11 questions covering interpretability, coherence, accuracy, and institutional environment.
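The three-phase structure just described can be summarized in a small data sketch. The phase names, question counts, and quality dimensions below follow the description above; representing them as a Python dictionary is purely illustrative, not part of the tool itself.

```python
# Structure of the Data Quality Assessment Tool as described above.
# (The dictionary layout and field names are illustrative only.)
ASSESSMENT_PHASES = {
    "discovery": {
        "questions": 12,
        "dimensions": ["relevance", "accessibility", "interpretability"],
    },
    "initial acquisition": {
        "questions": 29,
        "dimensions": ["accessibility", "interpretability", "coherence",
                       "accuracy", "institutional environment"],
    },
    "repeated acquisition": {
        "questions": 11,
        "dimensions": ["interpretability", "coherence", "accuracy",
                       "institutional environment"],
    },
}

total_questions = sum(p["questions"] for p in ASSESSMENT_PHASES.values())
print(total_questions)  # 52 questions across the three phases
```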
The tool provides potential users with information on the quality characteristics of administrative files. However, it does not address the quality of linked data, nor does it offer quality improvement recommendations or guidance on the fitness for use decision. The tool also is not designed to address the quality of commercial data.
Tests of the tool are currently underway on two sources of child care data, and additional tests are pending approval. Comments so far suggest that the providing agencies find completing the tool’s questions burdensome, and some may be concerned that their data are being judged. But Iwig was quick to point out that judging is not the purpose of the tool, and that the data are assumed to be good for the administrative purpose for which they are collected.
Looking to next steps, Iwig said they will present on the tool at the FCSM policy conference in December (hosted by COPAFS), update the tool based on feedback, make it available to the public, and pursue opportunities to promote its use. In the discussion that followed, it was suggested that the FCSM subcommittee pay more than passing attention to quality standards for commercial data, and when asked whether timeliness is a dimension of quality, Iwig explained that the tool includes timeliness as part of relevance.
U.S. Census Bureau Geographic Support System Initiative
Tim Trainor. U.S. Census Bureau
Trainor started with a review of the Census Bureau’s major geographic support initiatives – TIGER for the 1990 census, the Master Address File for the 2000 census, and MAF/TIGER enhancements for the 2010 census. For the 2020 census, the Geographic Support System Initiative (GSS-I) is in progress. The initiative focuses on improvements to MAF and TIGER to support the objective of maintaining census quality while containing costs. A specific objective is to develop the 2020 census address list without the need for a costly full address canvass. Instead, it is hoped that canvassing can be efficiently targeted to areas with concentrations of hard-to-locate housing units.
Specific quality improvement initiatives include the acquisition of address data from more sources, crowdsourcing (user-reported errors in TIGER), and tools that enable users to update TIGER information. A community TIGER program would standardize the varied inputs to TIGER. Partnerships are key to the GSS-I, and Trainor described working groups and an Address Summit held to educate partners about the GSS-I and to learn how they are collecting, using, and maintaining address data. The participants were glad the Census Bureau is already engaged on address list development for 2020, but their comments also revealed the contrast between local governments’ focus on the public safety aspect of addresses and the Census Bureau’s interest in questionnaire delivery.
The Summit has led to five pilot projects related to objectives including address authority outreach, address standardization, coordination of address information (from varied levels of government), data sharing, and hard-to-locate housing units. Hard-to-locate units include those above garages, in basements, and in remote or hidden locations. The idea is not to include these units on the MAF, but to identify where they exist in numbers large enough to warrant canvassing. The pilot projects will guide future geographic partnership programs and help identify the best methods for the continual update of the MAF/TIGER system.
Trainor also described iSIMPLE – an evaluation that identifies areas where TIGER road features are inconsistent with web-based imagery, and thus where work may be needed. The GSS Lab Data Viewer is an online interactive mapping tool to facilitate the visualization of data, and is another way to identify areas in need of attention.
Trainor then described efforts to evaluate the current quality of the MAF/TIGER database. Working with census tracts as the geographic unit, addresses are evaluated in terms of consistency, mailability, deliverability, locatability, and geocode accuracy. Features are evaluated in terms of spatial accuracy, feature naming, address ranges, and feature classifications. The evaluation also notes tracts whose imagery shows rapid change due to recent development or demolition. The result of these and other evaluations is the identification of tracts where resources need to be allocated for MAF/TIGER updates.
Trainor stressed that the need for address canvassing is a continuum, and it is a challenge to set thresholds defining which areas would be targeted. He showed how some example tracts would be scored on that continuum based on factors including mail-back response rates, lack of revisions to census counts, stability of USPS address counts, and consistency between USPS address counts and 2010 census housing unit counts. The challenge is formidable, but the potential benefits are substantial, and Trainor emphasized that all of this is still very much a work in progress.
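The scoring idea can be illustrated with a toy composite score over the factors Trainor mentioned. The indicators below come from the talk, but the weights, the 0-to-1 scaling, and the example input values are entirely hypothetical and are not the Census Bureau’s actual methodology.

```python
def tract_canvass_score(mail_back_rate, count_revision_rate,
                        usps_count_stability, usps_census_consistency):
    """Toy composite score: each input is on a 0-1 scale, where 1 means
    'stable / consistent'. Lower scores suggest a tract is a stronger
    candidate for targeted address canvassing. Weights are hypothetical."""
    weights = {"mail": 0.3, "revision": 0.2,
               "stability": 0.25, "consistency": 0.25}
    return (weights["mail"] * mail_back_rate
            + weights["revision"] * (1 - count_revision_rate)
            + weights["stability"] * usps_count_stability
            + weights["consistency"] * usps_census_consistency)

# A stable tract vs. a rapidly changing one (hypothetical values).
stable = tract_canvass_score(0.85, 0.02, 0.95, 0.90)
changing = tract_canvass_score(0.55, 0.30, 0.60, 0.50)
print(stable > changing)  # True: the stable tract scores higher, so it is
                          # a weaker candidate for targeted canvassing
```

Setting the threshold on such a score that separates “canvass” from “do not canvass” tracts is exactly the difficulty Trainor described.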
Concerns From COPAFS Constituencies
No concerns were raised, and the meeting was adjourned.