The Needs of Researchers in Regard to Population Estimates

Prepared for
The Conference on “U. S. Census Bureau Population Estimates: Meeting User Needs”
July 19, 2006
Alexandria, Virginia
Hosted by
The Council of Professional Associations on Federal Statistics
Sabre Systems.

David A. Swanson
Department of Sociology and Anthropology
University of Mississippi
University, Mississippi 38677-1848

*I appreciate comments made by Steve Murdock, Stan Smith, Paula Walashek, and Kathy Wallman on earlier versions of this paper. I thank the Helsinki School of Economics for funds that supported my research on the statistical programs of Finland in 2005 through a grant provided by the School’s Mikkeli Campus under a research program developed by its Dean, Dr. David Atkinson. Also, I thank the following people for kindly meeting with me and explaining how population data are assembled and used in Finland: local government officials Jyrki Myllyvirta and Hannu Vesa; and Statistics Finland staff members, Riitta Harala and Pekka Myrskylä.

  • Introduction

    In writing this paper, I was tasked to explore the needs of researchers in regard to the population estimates program of the U. S. Census Bureau (the “Census Bureau”). I broaden this to simply “the needs of researchers in regard to population estimates.” I also note that my topic, by necessity, overlaps not only with the needs of federal, state and local users, but with research that could be done on the Census Bureau’s estimation program. Thus, this paper not only covers the needs of researchers in regard to population estimates in general, but also some of the research needs in the area of population estimation, as identified by the Census Bureau (2006) and others (Breidt, 2005; Swanson and Pol, 2003).

    Who are the researchers that need information in regard to population estimates? In answering this question, I use as a starting point the distinction between applied and basic demography offered by Swanson, Burch, and Tedrow (1996): (1) “applied demography” is primarily concerned with solving exogenously-defined problems by producing the information necessary to effect practical decision-making while minimizing the time and resources needed to produce this information; and (2) “basic demography” is primarily concerned with solving endogenously-defined problems by offering convincing explanations of demographic phenomena while viewing time and resources as barriers to surmount in order to maximize precision and explanatory power. By simply applying the distinction noted above to researchers that need information from the Census Bureau’s estimates program, we have the following: (1) applied researchers that need information in regard to population estimates are primarily concerned with solving exogenously-defined problems by producing the information necessary to effect practical decision-making while minimizing the time and resources needed to produce this information; and (2) basic researchers that need information in regard to population estimates are primarily concerned with solving endogenously-defined problems by offering convincing explanations of demographic phenomena while viewing time and resources as barriers to surmount in order to maximize precision and explanatory power.

    As a means of providing a context for this effort it is important to recall why estimates are done in the United States. As we know, the census is the most complete and reliable source of information on the number of people in the United States – as well as in Australia, Canada, England, and New Zealand, among other countries. In addition to actually conducting census counts, there are three other characteristics that link the United States with these other countries: (1) well-developed administrative records systems (e.g., vital events registration); (2) regular census counts; and (3) no population registration system, such as those found in the Nordic countries. As we know, a census is a time-consuming and costly endeavor. In the United States, a census of the population is done only once every ten years; in Australia, Canada, England and New Zealand, for example, it is once every five years. Because there is the potential for constant and sometimes quite rapid population change, especially at the sub-national level, census statistics for every tenth and even every fifth year are often inadequate for many purposes (Waldrop, 1995). To fill this gap, population estimates are used by government officials, market research analysts, public and private planners and others for determining national and sub-national fund allocations (Murdock and Ellis, 1991; Serow and Rives, 1995; Siegel, 2002), calculating denominators for vital rates and per capita time series, establishing survey controls, guiding administrative planning, developing marketing, and for descriptive and analytical studies (Long, 1993; Pol and Thomas, 2001: 93-95; Swanson and Pol, 2005). In the United States, the Census Bureau is not the only provider of population estimates (Bryan, 2004b: 524-526), but it is the ultimate source of estimates and the data needed to develop them.

    In order to meet the need for current population figures, many estimation methods have been developed, virtually all of which can be categorized into one or the other of two “traditions:” (1) demographic (Bryan, 2004b); and (2) statistical – that is, the methods used by those who do sample surveys (Kordos, 2000; Platek, Rao, Sarndal, and Singh, 1987; and Rao, 2003). Demographic methods are used to develop estimates of a total population as well as the ascribed characteristics – age, race, and sex – of a given population. Statistical methods are largely used to estimate the achieved characteristics of a population – educational attainment, employment status, income, and marital status, for example. As is the case in the national statistical agencies of other countries, the Census Bureau produces estimates using both of these traditions, demographic and statistical.

    In this paper, I focus my discussion on methods that fit within the demographic tradition and only briefly consider those in the statistical tradition. However, I identify links among selected methods in both traditions. This discussion provides a point of departure for my recommendations in regard to the needs of researchers.

    Before launching into the main body of my paper, I also want to note that my discussion primarily covers the definition of population used by the Census Bureau, which is based on place of “usual residence.” This also is known as the “de jure” population (Cook, 1996; Wilmoth, 2004). I also note that there is the concept of a “de facto” population (Cook, 1996; Wilmoth, 2004). Examples of de facto populations are many. They include vacationers (of interest, for example, to the casino industry in Las Vegas and the Hawaii Visitors Bureau), migratory workers (of interest, for example, to health care, school, and other social service providers), and the people who work in the central business district of a large city each day, but leave it largely vacant in the evenings (of interest to the San Francisco City Planning Office, for example). While estimates of de facto populations are of great interest, they are very difficult to make in the United States because of the lack of census-type benchmarks (Cook, 1996; Smith, 1994). I identify this issue as being important, but it is beyond the scope of my mandate to cover research needs for de facto populations in depth. I only suggest here that the Census Bureau is the logical agency to develop systematic and comprehensive estimates of de facto populations in the United States.

    The remainder of this paper consists of four sections, endnotes, references, and two appendices (one of which has its own references). The following section provides an overview of basic concepts, data sources, and methods used to estimate populations in the U. S. The third section discusses the needs of researchers, both applied and basic, while the fourth section offers a suggestion for meeting the needs of these researchers. The fifth section discusses obstacles associated with this suggestion and how they might be overcome.

    Appendix A is a reproduction of the principles underlying the Census Bureau’s estimates and projections programs. Appendix B is a report that compares the Finnish and U.S. systems of developing population information. This report is included because the Finnish system, the most advanced register system in the world, is used as a framework for the suggestions found herein.

  • Basic Concepts, Data Sources, and Methods

    In this section, my intention is not to cover concepts, data sources, and methods related to population estimates in depth. Rather, it is to generally describe them while providing citations to more detailed descriptions and discussions.

    Basic Concepts. 1. Following Smith, Tayman, and Swanson (2001: 16), I make the following distinctions among the terms “estimate,” “projection,” and “forecast.”

    Estimate – A calculation of a current or past population, typically based on symptomatic indicators of population change.

    Projection – The numerical outcome of a particular set of assumptions regarding future population trends.

    Forecast – The projection deemed most accurate for the purpose of predicting future population.

    In regard to an estimate, there also has been a tradition of distinguishing between “inter-censal” and “post-censal,” where the former refers to an estimate for a date between two censuses that takes the results of these censuses into account and the latter refers to an estimate for a date subsequent to the most recently available census (Bryan, 2004b: 523).1 These definitions and distinctions fall into the demographic tradition. Among survey statisticians, the demographer’s definition of an estimate is generally termed an “indirect estimate” because unlike a sample survey, the data used to construct a demographic estimate do not directly represent the phenomenon of interest (Swanson and Stephan, 2004: 758 and 763). In this paper I use the demographic tradition’s definitions and distinctions unless specifically noted.2

    Another useful set of concepts is the notion of “stocks and flows.” As defined by Popoff and Judson (2004: 603), “…stock data are the numbers of persons at a given date, classified by various characteristics…(and) are recorded from censuses….flow data are the collection of or summation of events. At the most basic level this includes births, deaths, and migration flows….” This distinction is useful for purposes of this paper because, as is discussed later in this section, there are population estimation methods that solely rely on “stock” data while others rely on a combination of “stocks” and “flows.”

    Finally, it is useful here to define micro data and aggregated data. I take micro data to mean records for individual persons. These records are often linked by relationships to form family and household records and I use the term “micro data” to refer to these linked records as well. The “Public Use Microdata Sample” (PUMS) is such a file (Swanson and Stephan, 2004: 772). Aggregated data are summations of records of individuals (families and households) such as one would find in a table. The aggregations are often done to specific geographic areas, but they can also be done for types of people across different geographies. The life table constructed by Kintner and Swanson (1994) for retirees of General Motors is an example of such an aggregation.

    Basic Data Sources. All estimates, including post-censal ones, rely on one or more censuses and on the administrative record systems that different estimation methods for census-defined populations use – vital events, tax returns, housing permits, assessor parcel files, utility hookups, licensed drivers, covered employment, K-12 enrollment, Medicare, and child support payments, among others (Bryan, 2004a; Bryan, 2004b). It is important to note that there is some variation in availability and quality of administrative records systems by state and by local jurisdictions in the U. S. as well as variation among countries. It also is important to note that the Census Bureau maintains as much consistency in data sources and methods as it can because, among other desirable features, it wants to have a consistent set of estimates for a given “vintage” year (see Appendix A of this report; U. S. Census Bureau, no date). I note here the emergence of an important resource directly collected by the Census Bureau – a Master Address File (MAF) constructed for the 2000 census that is updated and maintained until the next census. This is a new resource for the Census Bureau’s estimates program because in the previous “mail-out/mail-back” censuses, the MAF was constructed from scratch before each census. As observed nearly 25 years ago by Pittenger (1982) and more recently by Wang (1999), this housing unit inventory serves as a key resource in the Census Bureau’s ability to construct population estimates and I use their key ideas in discussing how the MAF can be so used later in this paper.3

    Methods. Although it is not used directly in any of the standard population estimation methods used at the sub-national level, the fundamental demographic identity known as the balancing equation forms the conceptual framework for most of these same methods. This identity is defined as Pt = P0 + I – O, where Pt is the given population at time 0 + t, P0 is the given population at time 0, I is the number of persons entering the population through birth and in-migration during the period from 0 to t, and O is the number of persons exiting the population through death and out-migration during the period from 0 to t (Swanson and Stephan, 2004: 753). This identity can be phrased in more detail to separately recognize births, deaths, in-migration, and out-migration. It is used as a point of departure to discuss in detail the concept of “stocks and flows” and the measurement thereof encompassed in the following methods.
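
    As an illustration only, the balancing equation in its more detailed form can be sketched in Python. The function name, parameter names, and all figures below are assumptions introduced for this example; they do not come from any Census Bureau data or program.

```python
def balancing_equation(p0, births, deaths, in_migrants, out_migrants):
    """Return Pt = P0 + I - O for the period from 0 to t.

    I (entries) = births + in-migrants; O (exits) = deaths + out-migrants.
    """
    entries = births + in_migrants
    exits = deaths + out_migrants
    return p0 + entries - exits


# Hypothetical county; every number here is invented for illustration.
p_t = balancing_equation(
    p0=100_000, births=1_300, deaths=900, in_migrants=5_200, out_migrants=4_100
)
print(p_t)  # 101500
```

    The stock (P0) comes from a census count, while the four flows would in practice have to be measured or estimated from administrative records, which is where the methods described next differ.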

    1. Simple Interpolation and Extrapolation Methods

      Although no longer widely used in their own right, interpolation methods (see, e.g., Judson and Popoff, 2004) and extrapolation methods (see, e.g., Smith, Tayman, and Swanson, 2001) represent ways to construct, respectively, inter-censal estimates and post-censal estimates. These methods range from being relatively simple (e.g., linear trending) to very complex (ARIMA models). Both interpolation and extrapolation are based on mathematical formulas that are applied to “stock” data to produce “flows” that, in turn, generate estimates. As such, the principles underlying these methods, particularly extrapolation, are often found in other estimation methods (e.g., regression methods).
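
      A minimal sketch of the simplest case, linear trending, may help fix the idea. It assumes two census counts ten years apart; the function names and the counts are hypothetical and chosen only for illustration.

```python
def linear_intercensal(p_start, p_end, years_between, years_elapsed):
    """Inter-censal estimate: interpolate linearly between two census counts."""
    annual_change = (p_end - p_start) / years_between  # implied annual "flow"
    return p_start + annual_change * years_elapsed


def linear_postcensal(p_prev, p_last, years_between, years_since_last):
    """Post-censal estimate: carry the average annual change forward."""
    annual_change = (p_last - p_prev) / years_between  # implied annual "flow"
    return p_last + annual_change * years_since_last


# Invented census counts for 1990 and 2000.
estimate_1995 = linear_intercensal(50_000, 60_000, 10, 5)
estimate_2006 = linear_postcensal(50_000, 60_000, 10, 6)
print(estimate_1995, estimate_2006)  # 55000.0 66000.0
```

      In both functions the only inputs are stock data (the two counts); the annual change is a flow derived from them, which is the sense in which these formulas turn stocks into flows.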

    2. Housing Unit Method

      The Housing Unit Method (HUM) is a “stock” method that describes a basic identity in the same way that the balancing equation does. In the case of the HUM, this identity is usually given as P = H*O*PPH + GQ, where P = population, H = housing units, O = proportion occupied, PPH = average number of persons per household, and GQ = the population residing in “group quarters” and the homeless (Bryan, 2004b). Like the balancing equation, the HUM equation can be expressed in less detail (i.e., P = HH*PPH + GQ, where HH = H*O; Smith and Cody, 2004: 2) or more detail – by structure type, for example (Swanson, Baker, and Van Patten, 1983). It also can be used in combination with sample data, which opens the door to developing measures of statistical uncertainty for the estimates so produced (Roe, Carlson, and Swanson, 1992). Because of how data are collected, the HUM had not been a method that could be used for all sub-national areas and the nation as a whole until recently. However, with the continuous MAF, it has now emerged as a method that can be used by the Census Bureau for all sub-national areas and the nation as a whole (Wang, 1999).
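
      The HUM identity lends itself to a short worked example. The sketch below is illustrative only: the function and parameter names are assumptions, and the input values are invented rather than taken from any actual area.

```python
def housing_unit_method(housing_units, occupancy_rate, pph, group_quarters):
    """Return P = H * O * PPH + GQ.

    housing_units  -- H, the count of housing units
    occupancy_rate -- O, the proportion of units occupied (0 to 1)
    pph            -- PPH, average persons per household
    group_quarters -- GQ, group quarters and homeless population
    """
    households = housing_units * occupancy_rate  # HH = H * O
    return households * pph + group_quarters


# Hypothetical area: 40,000 units, 92 percent occupied, 2.5 persons per
# household, and 1,200 people in group quarters.
estimate = housing_unit_method(40_000, 0.92, 2.5, 1_200)
print(round(estimate))  # 93200
```

      Because each input is a stock that can be observed or estimated separately, the accuracy of the result depends on the weakest of the three household inputs; the same function could be applied by structure type and the results summed.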

    3. Regression Methods

      Regression approaches to population estimation are basically “stock” methods in which measures of change in the ratios of indicators to population are used as “flow” estimates that are extrapolated to generate population estimates (Bryan, 2004b). The flow estimates serve as independent variables in these forms, which means that the dependent variable is a measure of population change. Measures of change can be in the form of ratios, lagged ratios, and differences (Bryan, 2004b). These regression methods require a nested set of geographies (e.g., the counties within a given state) and they are inherently embedded in statistical inference (Swanson, 2004). As observed by Prevost and Swanson (1985), the “ratio-correlation” form can be viewed as a regression-based version of the so-called “synthetic” method of estimation.4
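
      To make the ratio form concrete, the following sketch shows one common way a ratio-correlation model is set up, using a single symptomatic indicator. It is an assumption-laden illustration, not the specification in Bryan (2004b): the choice of school enrollment as the indicator, the four counties, the independent state total, and every number are invented.

```python
def share_ratios(values_start, values_end):
    """Change in each area's share of the parent-area total between two dates."""
    total_start, total_end = sum(values_start), sum(values_end)
    return [(e / total_end) / (s / total_start)
            for s, e in zip(values_start, values_end)]


def fit_ols(x, y):
    """Ordinary least squares for y = a + b*x with one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Invented data for four counties nested within one state.
pop_1990 = [10_000, 20_000, 30_000, 40_000]   # census counts
pop_2000 = [12_000, 21_000, 36_000, 41_000]   # census counts
enroll_1990 = [2_000, 4_100, 6_000, 8_200]    # symptomatic indicator
enroll_2000 = [2_500, 4_300, 7_100, 8_300]
enroll_2006 = [2_900, 4_400, 7_900, 8_400]

# Step 1: calibrate on the two censuses. The dependent variable is the change
# in population share; the independent variable is the change in indicator share.
a, b = fit_ols(share_ratios(enroll_1990, enroll_2000),
               share_ratios(pop_1990, pop_2000))

# Step 2: apply the fitted coefficients to post-censal indicator change.
x_now = share_ratios(enroll_2000, enroll_2006)
shares_2000 = [p / sum(pop_2000) for p in pop_2000]
raw_shares = [(a + b * x) * s for x, s in zip(x_now, shares_2000)]

# Step 3: control to an independently estimated state total (assumed here).
state_total_2006 = 118_000
estimates_2006 = [state_total_2006 * r / sum(raw_shares) for r in raw_shares]
print([round(e) for e in estimates_2006])
```

      The final step shows why a nested set of geographies is required: the regression yields shares of a parent area, so a separate estimate of the parent total is needed to convert them into population counts.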

    4. Component Methods

      Component methods are directly based on the fundamental demographic identity known as the balancing equation. As such, they are stock and flow methods. Included in this set are “Component Method II,” “Cohort-Component Method,” and the “Tax Return Method,” each of which is described by Bryan (2004b). The stock data consist of census counts in each of these methods, which use administrative records (e.g., vital events) to develop flow estimates.

    5. Administrative Records

      So-called direct estimates can be acquired from selected types of administrative records systems, namely the national population registration systems found in the Nordic countries (Bryan, 2004a: 31-33; Statistics Finland, 2004). Although the United States lacks a national population registration system, it has several national administrative record systems that serve as partial population registers, including those relating to social insurance and welfare and the payment of income taxes (Bryan, 2004a; Judson, 2000).6 It is worthwhile at this point to again bring up the MAF, which represents a national housing registration system that can be used to generate estimates using the Housing Unit Method.

    6. Other Methods

      Here, I include the economic-demographic models and urban systems models described by Smith, Tayman, and Swanson (2001: 185-237) as well as the iterative proportional fitting, log-linear, and multiregional methods described by Judson and Popoff (2004). To this list can be added the methods developed for statistically underdeveloped countries and those for estimating wildlife populations (briefly discussed in Endnote # 2) as well as the imputation methods used by the Census Bureau to compensate for missing data (see, e.g., Swanson and Stephan, 2004: 762).

      In concluding this brief overview of methods of population estimation, I note that it is often the case that various data adjustments must be made to effectively operate the preceding methods and that these adjustments serve as “other methods” in themselves (Wang, 1999). For example, the presence of non-household populations, such as found in prisons, school dormitories, and long-term care facilities, can affect the accuracy of virtually all of the methods just described, as can the presence of seasonal populations, undocumented aliens, and the occurrence of disasters, natural and otherwise.5

  • Researcher Needs

    In this section, I describe what I believe researchers need and note some of the benefits before I propose a solution to meet these needs (the next section). As stated earlier, I postpone a discussion of obstacles to my proposal until the final section.

    What I believe is needed by all researchers is a system that provides an historical set of sub-county estimates of populations and their characteristics that can be rolled up to all higher administrative and statistical geographies for a given vintage to produce a “one-number” set that is consistent with data both from decennial census counts and sample surveys regularly done by the Census Bureau. This is consistent with the principles underlying the Census Bureau’s estimates program (see Appendix A). Further, the ideal foundation of these estimates would, I believe, consist of individual data on persons that are linked to households and other living arrangements in specific locations. What I have just described, of course, is something that does not exist for the United States – a national population register, a system that contains micro level data that can be rolled up and linked both across time and with other data, such as the case found in Finland (Statistics Finland, 2004). For discussion purposes, I will refer to this as the U. S. national population file (the “population file”).

    Although the distinction is not clear-cut, most researchers, applied and basic, tend to use aggregated data. However, as noted later, some types of basic researchers would prefer to use micro data, if they were available. Moreover, while both are interested in historical data, most researchers, applied and basic, tend to use cross-sectional data while some types of basic researchers would likely lean toward truly longitudinal data. Both groups, applied and basic, would be served by a population file in regard to these types of data, however.

    Swanson and Pol (2004: 29-30) distinguish public sector and private sector interests in applied demography and observe that estimates and projections serve as an important link between the two sectors. Using a similar public and private sector dichotomy, Siegel (2002: 220-328) describes demographic applications for businesses on the one hand, and government and private nonprofit organizations on the other. Siegel’s (2002) list includes: market research, business site location, sales forecasting, legal and regulatory requirements associated with business activities, consumer research, demographic aspects of domestic and foreign investment, the labor force, organizational demography, housing needs, transportation planning, socio-economic and health characteristics, political applications (e.g., apportionment and re-districting), the costs and benefits of insurance, the U.S. social security system, and the allocation of public funds between children and the elderly. Clearly, a population file containing the number of people under consideration for a given study would be useful in all of these areas of applied work, as would information on the characteristics of these people.

    Basic researchers can largely be distinguished as being either mathematical demographers or “socio-economic” researchers – those that use demographic perspectives in answering questions in specific disciplines, such as economics, geography, and sociology (Burch, 1993; McNicoll, 1992; and Swanson and Stephan, 2004: 758). Often, those in the latter category are interested in the determinants and consequences of demographic changes.

    Like applied demographers, most basic researchers use aggregated data in their work and for most mathematical demographers, this type of data is preferable (Coale and Demeny, 1966; Dharmalingam, 2004; Li and Tuljapurkar, 2005; Pollard, 1973; Rogers, 1995; and Suchindran, 2004). While it is the case that those interested in using demographic perspectives to answer discipline-specific questions often use aggregated data (Clark, 1986; Rogers, Hummer, and Nam, 2000; Stockwell, Goza, and Balistreri, 2005; Treyz, Rickman, Hunt, and Greenwood, 1993), it is clear from discussions and analyses found in the literature that many would prefer to have micro data, if they were available. This is because many of these basic researchers are interested in hypotheses concerning individuals (Brandon and Hogan, 2004; Livingston, 2006; Mutchler and Baker, 2004; Ryan, Manlove, and Hofferth, 2006) and in using aggregated data to address their hypotheses about individuals, they have to deal with problems such as aggregation bias and the ecological fallacy (Freedman, 2004; and King, Rosen, and Tanner, 2005). As is the case for applied demography, I believe that a population file containing the number of people under consideration for a given study would be useful in both of these areas of basic research, as would information on the characteristics of these people.

    I do not believe that there are many who would argue against the utility of a national population file for applied and basic researchers. I believe that the situation would be similar for national, state and local data users. The issue here, of course, is that it is virtually a certainty that there will be no national population file in our lifetimes, if ever. American traditions and values are not in favor of such a system, given concerns about government intrusion into privacy. So, why have I bothered to discuss this ideal but unachievable data source? The reason is that there is an existing “register” in the Census Bureau that can yield something close to a national population register when coupled with the Bureau’s record matching, extant data collection, and other capabilities. This register is the Master Address File, or MAF, and to it I now turn.

  • A Suggestion for Meeting the Needs of Researchers

    Before I offer my suggestion regarding the MAF and its potential for meeting the needs of applied and basic researchers, it is important to note that others have thought along similar lines in regard to other “registers.” Here, I am thinking primarily of research into the development of an “administrative records census,” which has been going on for at least 20 years (Alvey and Scheuren, 1982; Kliss and Alvey, 1984; Scheuren, 1999). Initially, much of this work was done within the U.S. Internal Revenue Service, but this has broadened to include other federal agencies, including the Census Bureau (Prevost, 1996; Judson, 2000; Judson and Bauder, 2002). Research and other activities in the U. S. related to administrative records censuses have also been commented on by researchers outside of the country (Redfern, 1986). However, it is still the case that the U. S. Census Bureau has not attempted to conduct a full-blown administrative records census (Bryan, 2004a; Bryan, 2004b; Bryan and Heuser, 2004).

    I also note that my suggestion is largely based on a call by Wang (1999) for greater recognition of the utility of the MAF in regard to population estimates. Wang also provided specific suggestions on how to overcome the problems associated with maintaining and updating the MAF such that the data were of high quality.

    Wang’s (1999) suggestions, along with the ideas underlying an administrative records census, lead directly to the idea of viewing the MAF as the basis for a national housing register (the “housing register”). What primarily distinguishes the MAF from the housing register is the presence of population data. The first step in turning the MAF into a national housing register is to load it with both enumerated and estimated population and related data for individual housing units. How might this work?

    Initial estimates could be provided by matching selected Census 2000 short form data to individual housing units in the MAF. On a regular basis (e.g., once each year), the individual housing unit records could be updated using similar “short form” data from the American Community Survey (ACS) in conjunction with demographic methods (e.g., survivorship estimation), direct substitution in housing units appearing in the ACS sample for a given vintage (i.e., a given year), and imputation and related estimation methods for those in the same vintage and area that are not in the ACS. Individual housing unit data from the “old” version would be so identified and remain attached to each record so that measures of change could be computed for individual records (i.e., individual housing units). Thus, the system would be a housing register containing a combination of collected and estimated data centered on demographic characteristics (i.e., age, sex, race, household relationships) distinguished, as appropriate, by year. When a year ending in zero is reached, the data for the preceding decade could be archived and a new file started for the coming decade.
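
    The record structure and annual update cycle just described can be sketched as a small data structure. Everything in the sketch is hypothetical: the class names, field names, identifier, and geocode are invented, and the carry-forward rule stands in for the imputation and survivorship methods that an actual system would require.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class VintageData:
    occupied: bool
    persons: int
    source: str  # "census", "acs", or "imputed"


@dataclass
class HousingUnitRecord:
    maf_id: str    # hypothetical identifier
    geocode: str   # hypothetical location code
    vintages: Dict[int, VintageData] = field(default_factory=dict)

    def update(self, year: int, data: VintageData) -> None:
        # Earlier vintages stay attached so change can be measured later.
        self.vintages[year] = data

    def change_in_persons(self, year_from: int, year_to: int) -> int:
        return self.vintages[year_to].persons - self.vintages[year_from].persons


def annual_update(record: HousingUnitRecord, year: int,
                  acs_response: Optional[VintageData],
                  impute: Callable[[HousingUnitRecord, int], VintageData]) -> None:
    """Substitute ACS data directly when the unit is in sample; otherwise impute."""
    if acs_response is not None:
        record.update(year, acs_response)
    else:
        record.update(year, impute(record, year))


def carry_forward(record: HousingUnitRecord, year: int) -> VintageData:
    """Placeholder imputation (an assumption): repeat the latest known vintage."""
    last = record.vintages[max(record.vintages)]
    return VintageData(last.occupied, last.persons, "imputed")


unit = HousingUnitRecord("MAF-0001", "28-071-000100-1001")
unit.update(2000, VintageData(True, 3, "census"))              # census match
annual_update(unit, 2001, None, carry_forward)                 # not in ACS sample
annual_update(unit, 2002, VintageData(True, 4, "acs"), carry_forward)
print(unit.change_in_persons(2000, 2002))  # 1
```

    Keeping a source flag on every vintage is the design choice that lets users distinguish collected from estimated values, which matters for any measure of uncertainty attached to the rolled-up totals.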

    For many applied researchers as well as basic researchers interested in mathematical demography, this short form housing register would serve most of their needs. For others, the housing unit records would need to start each decade with both short and long form data and be updated accordingly with ACS short and long form data. Because the long form data were collected on a sample basis in the 2000 Census, this would mean that long form data would be imputed for individual housing units not in the sample. Once the annual update cycle starts, all long form data would be imputed for individual housing units not in the ACS sample.

    What are some of the specific benefits of a national housing register? Here are some examples. To begin, I believe it would assist the Census Bureau in solving four of the problems facing its estimates program identified by Habermann (2006). First, “short form” data from the housing register would serve well as the population controls for the ACS. This could be particularly important for small pieces of geography. Second, the combination of short and long form data in the housing register would serve to improve estimates of internal migration as well as emigration and immigration. Third, the housing register would allow bringing additional data sources into the sub-national population estimates beyond the ACS, to include administrative data sources on employment and taxes. And, fourth, the housing register would allow for research needed to improve methods to achieve integrated and consistent population estimates at different levels of geography. In this regard, Habermann (2006) observes that the current approach begins at the county level, with the estimates controlled only at the national level.

    As many, if not all, of you know, the Census Bureau has recently been confronted with the possibility of a reduction of more than $50 million in the budget proposed by the Executive Branch for its FY 2007 operations (Lowenthal, 2006). This is not a new phenomenon and much of the impetus for reduced and otherwise tight budgets comes from the high costs of collecting data. In this regard, I believe that the housing register would also be of benefit. For example, Statistics Finland (2004: 26) reports that it was pressured by the Ministry of Finance to move to a register-based system because of the recurring high costs associated with taking a census. After it made the change following its 1980 census, Statistics Finland (2004: 26) reports that in 2003 money terms the cost of its 2000 register-based census was less than one million euros while the traditional 1980 census costs were approximately 35 million euros. This evidence strongly suggests that a housing register would assist the Census Bureau in containing costs.

    A housing register would contribute toward having more timely, comprehensive, and internally consistent demographic, housing, and socio-economic data for the U. S. as a whole and its sub-areas. In regard to geography, I note that register-based data are extremely flexible in that they can be geo-coded to a specific location (as opposed to being assigned to an area defined by administrative or statistical boundaries). This also means that the housing register can be overlaid with other features using GIS capabilities. The TIGER street address file comes to mind first in this regard. This would lead to an entirely new way of looking at the concept of a “small area,” in that boundaries could be drawn that are much finer than those allowed by the census-defined block. This would allow much higher precision in defining areas for purposes such as marketing and site location. Once up and running, this would also allow for greater ease in producing a consistent time series for areas in which administrative boundaries changed over time.

    It is also worthwhile to note that if the housing register were assembled into a single register along with similarly geo-coded group quarters locations, commercial establishments, and public buildings (e.g., fire stations), the result would be a tremendous data source for applied researchers. Imagine being able to map not only existing, but also historical and potential “future” service areas and their populations using such a system. Here, it is useful to note that this is precisely the situation that exists currently in Finland (Statistics Finland, 2004: 41-44). I also note that this proposal is in line with recommendations made by the National Research Council’s Committee on the Human Dimensions of Global Change (National Research Council, 2005).

    To summarize, what I am proposing here is a register, the national housing register, in which the record for each individual housing unit contains not only existing MAF variables (e.g., geocode, address, and structure type), but also information on occupancy status; in addition, the record for each occupied housing unit would include variables that provide “short form” demographic characteristics and, if feasible, variables that provide some degree of “long form” socio-economic characteristics. Occupancy status and the demographic and socio-economic characteristics would be generated using a combination of decennial census and ACS data in conjunction with a combination of record matching and estimation methods, particularly imputation and related forms of modeling.

  • Discussion

    Turning now to the obstacles associated with my proposal for a national housing register, I begin with the issue of confidentiality. The National Research Council’s Panel on Data Access for Research Purposes (2005) has identified the lack of resources and structural incentives for making data more readily available as major contributors to the difficulty of reconciling access to data by researchers with the need to preserve confidentiality.7 The issue of confidentiality is not an insignificant problem. As the Census Bureau recently learned, even the perception of a breach of confidentiality can provoke a major outcry (Clemetson 2004a, 2004b, 2004c; Lipton, 2004). One can see that the development by the Census Bureau of any type of register containing information on individuals can run into public and political resistance due to confidentiality concerns. This was noted over twenty years ago by Pittenger (1982). However, I believe that this problem is not insurmountable in regard to my proposal for a national housing register. The National Research Council (2005) has issued recommendations to reconcile access and confidentiality, and the Census Bureau itself has appointed a Chief Privacy Officer and worked to put effective procedures in place regarding this reconciliation. There are recommendations for going even further (El-Badry and Swanson, 2005), as well as the ideas provided by the highly effective laws, rules, and procedures developed by Statistics Finland (2004) to effect the reconciliation of access to data by researchers and the preservation of confidentiality.8 Taken altogether, I believe that the Census Bureau is capable of creating a national housing register that would be useful to researchers while also being subject to strong confidentiality safeguards.

    What about the issue of privacy?7 What may be ideal from a researcher’s point of view may not be ideal from the perspective of others. For example, those concerned about the intrusion of the federal government into private lives would not be pleased at the prospect of what amounts to a national individual data base, even though no major outcry has been raised in regard to the three “lightly” regulated, non-mandated, de facto private sector registration systems maintained by Equifax, Experian, and TransUnion for purposes of determining credit worthiness. I believe that this may be a more difficult obstacle for the Census Bureau to overcome than that represented by concerns over confidentiality. Much of this has to do with privacy being intertwined with the mix of constitutional mandate, case law, executive orders, and general tradition that calls for an actual count of the population rather than the development of a register (Anderson, 1988; U. S. GAO, 2003; Swanson and Walashek, 2004; Weinjert, 2003). Thus, the Census Bureau and its allies would have to mount a dedicated effort to build public trust in the idea of a national housing register.

    Another obstacle is the financial cost of developing a national housing register. An idea of these costs is given by Redfern (1986) in his discussion of the cost of converting from a traditional census to an administrative records census. However, once developed (or converted, as the case may be), it appears that the costs for a national housing register could be less than those of the system currently being used in the U. S. for developing post-censal estimates and decennial census counts. I rely here on the information from Statistics Finland (2004: 26) discussed earlier in regard to the comparative costs of registries and censuses. It also is worth noting here that local officials in Finland update the country’s population and housing registries (Statistics Finland, 2004: 21). Thus, I see no major cost obstacle in following Wang’s (1999) suggestion that state and local governments be funded to assist in maintaining the MAF under the general supervision of the Census Bureau. Before such a major step is taken, however, it would be wise to research the various forms this could take. El-Badry and Swanson (2005) call for research on such a recommendation in terms of public involvement in administrative oversight of the Census Bureau.

    What about accuracy? Can the proposed housing register provide accurate data? In a recent report, the Government Accountability Office (U. S. GAO, 2006) identified MAF/TIGER problems that needed to be solved in order to have a good census in 2010. These problems include: (1) resolving address-related issues such as duplication, omission, deletion, and incorrect locations in the MAF; and (2) implementing GPS-based geo-coding of housing units. These same two problems represent sources of error in the proposed housing register. Consequently, if the Census Bureau solves these problems in regard to the 2010 census, it will essentially do so in regard to the proposed housing register.

    There are problems already known in regard to using the housing unit method of population estimation that would affect the MAF and therefore the accuracy of a housing register. Many of these are known to the Census Bureau staff already dealing with MAF updates (e.g., tracking new housing units, converted housing units, and deleted housing units). One problem worth mentioning here involves seasonal populations and seasonal housing. In areas with substantial seasonal changes in population, great care must be taken to get an estimate of the de jure population. With the implementation of the ACS, this problem will be compounded because of differences between the ACS and the decennial census in regard to what constitutes the de jure population (CACPA/PAA, 2005). As such, an accurate housing register will need to deal with the seasonal housing issue and the differences in the definition of the de jure population found in the ACS and the decennial census.

    Judson, Popoff, and Batutis (2001) have pointed out that there is a great deal of evidence to support the idea that administrative records systems have systematic biases, and they found support for this in an empirical study they conducted. This means that the MAF and, hence, the proposed housing register will be subject to systematic biases. Fortunately, however, Judson, Popoff, and Batutis (2001) also use their findings to make several recommendations regarding the reduction of these biases. Considering their research in conjunction with the experience being gained by the Census Bureau in regard to the MAF/TIGER system, I believe that the accuracy of a national housing register would be sufficient for purposes of resource allocation, research, and planning.

    Another obstacle is the need to have a set of unified identification codes in order to match and merge records from different systems using electronic processing. As noted by Statistics Finland (2004), if there is no unified system of identification codes then it is extremely difficult and laborious, if not impossible, to link records across different systems. In particular, a unique code will be needed for every dwelling in the register, including those in multi-unit structures. In this regard, I point out that Finland has developed such a coding system and that it includes all structures – commercial, residential, and seasonal (Statistics Finland, 2004: 58-60).
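
    To make the role of a unified code concrete, the following is a minimal sketch under stated assumptions. It does not reproduce the Finnish coding system or any MAF identifier. The dwelling codes, the field names, and the function link_records are hypothetical. It shows that when two record systems share one unique dwelling code, linking them reduces to an exact key lookup, and that records lacking a valid code are set aside for the laborious follow-up that the lack of such a code would otherwise require for every record.

```python
# Hypothetical housing register keyed by a unique dwelling code.
# Units in a multi-unit structure each receive their own code.
housing_register = {
    "D0001": {"address": "12 Elm St, Unit A", "structure_type": "multi-unit"},
    "D0002": {"address": "12 Elm St, Unit B", "structure_type": "multi-unit"},
    "D0003": {"address": "40 Oak Ave", "structure_type": "single-family"},
}

# Hypothetical records from a second system carrying the same code.
admin_records = [
    {"dwelling_code": "D0001", "persons": 2},
    {"dwelling_code": "D0003", "persons": 4},
    {"dwelling_code": "D9999", "persons": 1},  # code not in the register
]

def link_records(register, records, key="dwelling_code"):
    """Merge records into the register by exact match on the shared code.

    Returns (linked, unmatched): linked maps each matched code to the
    combined fields; unmatched lists records whose code was not found.
    """
    linked = {}
    unmatched = []
    for rec in records:
        code = rec[key]
        if code in register:
            merged = dict(register[code])  # copy so the register is unchanged
            merged.update({k: v for k, v in rec.items() if k != key})
            linked[code] = merged
        else:
            unmatched.append(rec)
    return linked, unmatched

linked, unmatched = link_records(housing_register, admin_records)
print(sorted(linked))   # codes that linked successfully
print(len(unmatched))   # records needing manual or probabilistic matching
```

    Register units with no incoming record (here, the second unit at the Elm Street address) simply remain unlinked, which is where occupancy status would have to be estimated by other means.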

    With the exception of the issues of confidentiality and privacy, all of the challenges facing the development of a national housing register are in the form of costs, technical problems, or a combination of both. I agree with Wang (1999) that the major technical tasks of the National Accounting of Address and Housing Inventory come down to two areas: address data collection and MAF/TIGER updates. I also agree with Wang (1999) that a feasible way to effect a solution to these problems is to enhance the federal-state-local cooperative programs already part of Census Bureau activities such that local entities are compensated for helping to maintain the system. This is how Statistics Finland (2004) maintains its register system, and there are data collection activities in the U. S. that already follow this model (Wang, 1999).

    The national housing register I am proposing goes beyond what was envisioned by Wang. As such, I believe that his suggestions are necessary but not sufficient for this purpose. As is suggested here, there are many political, administrative, and technical obstacles that would need to be overcome. How exactly would researcher access be reconciled with confidentiality and privacy? What would the housing register cost to build and maintain and what savings elsewhere would be gained, if any? How would ACS data be combined with individual housing units? Are they sufficient to provide the household level estimates that I am proposing (e.g., age, race, sex, household relationships, household size, vacancy rates, and socio-economic characteristics), or would that stretch imputation and related modeling techniques, as well as other capabilities, too far?9 If so, could the register function on, say, a block group level? If this were the case, then could the ACS provide block level results of the data? These are questions that only further thought and empirical testing are likely to resolve. The question that the Census Bureau needs to answer is whether my recommendation is sufficiently interesting to warrant giving it the “thought” test and then, before proceeding further, a small “empirical” test (e.g., similar to the Administrative Records Census Experiment reported by Judson and Bauder in 2002). To give the Census Bureau some food for thought as it considers my question, I offer a quote from Ching-Li Wang’s (1999: 15) paper on developing the MAF into a resource for making post-censal population estimates:

    “Is the development of the National Accounting of Addresses and Housing Inventory feasible? The ideas presented in the paper may cause many people to say that it is impossible because there are so many problems. This is exactly the same reaction we saw in the late 80s when the Census Bureau was developing the TIGER to digitize the nation’s geography from coast to coast. Now we can see how useful and powerful the TIGER is today.”

    In closing, I would like to believe that if Ching-Li were still alive, he would be willing to make a similar statement on behalf of the national housing register.