1. The sample size is simply not large enough to provide good estimates for very small areas. Many block groups (BGs) have fewer than 100 people, and the effective sampling rate is only about 1-in-11 over 5 years, so a 100-person BG contributes roughly 9 sampled persons to a 5-year estimate. This means a large degree of statistical uncertainty. Short of increasing the survey's sample size, not much can be done about this; we just need to be aware that the data for these very small areas will have very large MOEs (margins of error).
2. The way the Bureau handles weighting of the ACS data concerns us, at least to the extent that we understand how it works. What we do know is that weighting occurs in stages. The initial weighting is done the old-fashioned way: the sample households are pulled from a household universe on the MAF (the Bureau's Master Address File) and assigned a weight based on the sampling ratio. So if a sampling unit (we are not sure which; probably one based on a set of contiguous tracts) has 1000 housing units and 100 are sampled, each of these initially gets a weight of 10. But if only 70 of those 100 respond, the Bureau does what is called "non-response followup" to get data for the 30 non-responders. If after some time it gets data from only 15 of those 30, it goes with the data it has and adjusts the weights on the 15 late responders so that they represent the entire set of 30 (doubling each of their weights from 10 to 20). This is called "sampling for non-response" and is a cost-cutting measure that was never used in a decennial census. The next phase of the weighting process is the one that bothers us. In this phase the Bureau adjusts the weights on the sample surveys so that when the data are tabulated at the county level by age, race, sex and Hispanic origin, they will more or less match the estimates produced by the FSCPE (Federal-State Cooperative for Population Estimates) program. So if there are not enough Black females in an age cohort, the Bureau goes to the surveys for people in this group and "adjusts" their weights to make the data add up to these control totals. The problem is that while this may "enhance" the data when viewed at the county level, it could introduce significant inconsistencies when the data are tabulated for smaller, neighborhood-level areas.
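The three weighting stages can be sketched with toy numbers. This is a simplified illustration, not the Bureau's actual ACS procedure: the 1000/100/70/30/15 figures come from the example above, while the tract labels, cohort names and county control totals are hypothetical. The last stage shows how matching a county-level control shifts tract totals by amounts that depend only on each tract's cohort mix.

```python
from collections import defaultdict

# Toy sketch of staged weighting; NOT the Bureau's actual ACS procedure.
# Tract labels, cohort names and control totals are hypothetical.

def base_weight(universe: int, sampled: int) -> float:
    # Stage 1: each sampled unit stands for universe / sampled units.
    return universe / sampled

def followup_weight(weight: float, nonresponders: int, converted: int) -> float:
    # Stage 2: converted late responders also represent the non-responders
    # who never answered, so their weight is inflated by that ratio.
    return weight * nonresponders / converted

def control_adjust(records, controls):
    # Stage 3: scale every weight in a demographic cell so the weighted
    # cell total (over the whole county) equals the county control total.
    cell_totals = defaultdict(float)
    for r in records:
        cell_totals[r["cell"]] += r["weight"]
    return [
        {**r, "weight": r["weight"] * controls[r["cell"]] / cell_totals[r["cell"]]}
        for r in records
    ]

def tract_totals(records):
    # Estimated tract population = sum of the weights of persons in the tract.
    totals = defaultdict(float)
    for r in records:
        totals[r["tract"]] += r["weight"]
    return dict(totals)

w0 = base_weight(1000, 100)           # 10.0 for every sampled unit
w_late = followup_weight(w0, 30, 15)  # 20.0 for each late responder
# 70 * 10 + 15 * 20 = 1000: the responders still represent the whole universe.
assert 70 * w0 + 15 * w_late == 1000

# Hypothetical persons in two tracts of one county.
records = [
    {"tract": "A", "cell": "cohort1", "weight": 10.0},
    {"tract": "A", "cell": "cohort1", "weight": 10.0},
    {"tract": "A", "cell": "cohort2", "weight": 20.0},
    {"tract": "B", "cell": "cohort1", "weight": 20.0},
    {"tract": "B", "cell": "cohort2", "weight": 10.0},
]
controls = {"cohort1": 60.0, "cohort2": 30.0}  # hypothetical county controls

adjusted = control_adjust(records, controls)
print(tract_totals(records))   # before: A=40, B=30
print(tract_totals(adjusted))  # after:  A=50, B=40
```

Only cohort1 was short of its control, so its weights rose by half everywhere. Tract A grew by 25% and tract B by about 33%, purely because of their cohort mix and not because of anything known about either tract.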
Basically, these two issues relate to the shape and the size of a distribution, respectively. If we want to look at a table of persons by poverty ratio for a census tract, for example, the too-small sample size means that the proportions of persons in the various poverty-ratio intervals may not be reliable. We might estimate that 10% of the population is below 50% of the poverty level when the true figure is 5% or 20%. The weighting issue is more concerned with the size of the estimates, starting with the total population. We may report 5000 persons for the tract (a figure that comes from simply adding up the weights assigned to all surveys in the area) when the actual count should be 3500 or 6000. These under- or over-inflated weights then affect all of the counts within the tables. We can already see examples of this in the ACS data released so far: while data at the county level may match the official estimates, data at the place (city) level do not. The ACS total population figures for most cities differ from the official place-level estimates by fairly significant amounts. The Bureau is aware of this problem and claims to have come up with a new weighting scheme that will improve the situation. We have no idea how it works, but it sounds like a whack-a-mole kind of solution: they keep adjusting to make the data fit at one geographic level, but what does that do to other geographies that are not getting such statistical "fixes"?
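To put rough numbers on the sample-size problem, here is a sketch that computes a 90% margin of error for an estimated proportion under simple random sampling with a finite population correction. That is an assumption of convenience: the ACS actually samples addresses, applies multi-stage weighting, and derives its published MOEs from replicate weights, so these figures are illustrative only. The 1-in-11 rate and the 100-person block group come from the text; the 4,000-person tract and the 10% proportion are hypothetical.

```python
import math

# ACS margins of error are published at the 90% confidence level.
Z_90 = 1.645

def moe_proportion(p: float, n: int, sampling_fraction: float = 0.0) -> float:
    """Approximate 90% MOE for a proportion p estimated from n sampled persons.

    Assumes simple random sampling with a finite population correction;
    the real ACS design (address sample, clustering, weighting) differs.
    """
    fpc = 1.0 - sampling_fraction
    se = math.sqrt(p * (1.0 - p) / n * fpc)
    return Z_90 * se

rate = 1 / 11  # effective 5-year sampling rate quoted in the text
for label, pop in [("small block group", 100), ("hypothetical tract", 4000)]:
    n = round(pop * rate)  # expected number of sampled persons
    moe = moe_proportion(0.10, n, rate)
    print(f"{label}: n={n}, 10% +/- {moe * 100:.1f} points")
```

For the 100-person block group (about 9 sampled persons), the margin comes out near 16 percentage points. An estimate of 10% is therefore consistent with anything from 0% to about 26%, which easily covers both the 5% and the 20% scenarios. The hypothetical tract's margin is about 2.5 points.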
This means that when we get the basic population counts at the tract and block group level in December, we are not going to really trust them. The Bureau likes to say that the ACS data are about the characteristics of the population, not its size.
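The Bureau's distinction between characteristics and size can be illustrated with a toy calculation. A weighted share (a "characteristic") is unchanged when every weight in an area is scaled by the same factor, but the weighted count (the "size") is not. The records, the `poor` flag and the 1.5 inflation factor below are all hypothetical.

```python
# Hypothetical weighted records for one tract; "poor" flags persons below
# some poverty threshold. All numbers are made up for illustration.
records = [
    {"poor": True, "weight": 10.0},
    {"poor": False, "weight": 10.0},
    {"poor": False, "weight": 20.0},
]

def is_poor(r):
    return r["poor"]

def weighted_count(recs, pred=lambda r: True):
    # Estimated count = sum of the weights of records satisfying pred.
    return sum(r["weight"] for r in recs if pred(r))

def weighted_share(recs, pred):
    # Estimated proportion = weighted count / weighted total.
    return weighted_count(recs, pred) / weighted_count(recs)

# Inflate every weight in the tract by the same factor (e.g. an
# over-inflated total population): counts change, shares do not.
scaled = [{**r, "weight": r["weight"] * 1.5} for r in records]

print(weighted_count(records, is_poor), weighted_share(records, is_poor))
print(weighted_count(scaled, is_poor), weighted_share(scaled, is_poor))
```

This only holds when the adjustment is uniform within the area. An adjustment that falls unevenly on different groups, such as a county control applied to one demographic cohort, moves the shares as well.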
The data released this year will use 2000 census geography (blocks, block groups and tracts), but the data to be released in the next cycle (the 2006-2010 period estimates, presumably released late in 2011) will use the new 2010 census geography. More importantly, the Bureau plans to go back and re-weight all of its surveys to take into account the results of the 2010 census. I don't know the details of how this will work, but it will almost certainly result in some significant changes in the data we'll be seeing, starting with the vintage 2010 data released in 2011. What they could do, of course, is make weight adjustments down at the small neighborhood level (say, census tract) so that the population counts for smaller areas are more in sync with the 2010 census counts we'll be seeing next year. There won't be any county-level population estimates to worry about for 2010, since that is the year they get to use decennial census data; no need to make educated guesses.
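The tract-level adjustment speculated about here can be sketched as a simple ratio adjustment: scale the weights in each tract so they sum to that tract's census count. This is not the Bureau's announced method, and the records, cohort labels, controls and tract counts are hypothetical. The sketch also shows the whack-a-mole problem. The records start out matching county cohort controls, and once the tract totals are forced to match, the cohort totals drift away from those controls.

```python
from collections import defaultdict

def weighted_totals(records, key):
    # Sum of weights for each distinct value of records[key].
    out = defaultdict(float)
    for r in records:
        out[r[key]] += r["weight"]
    return dict(out)

def ratio_adjust(records, key, targets):
    # Scale weights so the weighted total of each `key` group hits its target.
    current = weighted_totals(records, key)
    return [
        {**r, "weight": r["weight"] * targets[r[key]] / current[r[key]]}
        for r in records
    ]

# Hypothetical records that already match the county cohort controls.
records = [
    {"tract": "A", "cell": "cohort1", "weight": 15.0},
    {"tract": "A", "cell": "cohort1", "weight": 15.0},
    {"tract": "A", "cell": "cohort2", "weight": 20.0},
    {"tract": "B", "cell": "cohort1", "weight": 30.0},
    {"tract": "B", "cell": "cohort2", "weight": 10.0},
]
cohort_controls = {"cohort1": 60.0, "cohort2": 30.0}  # hypothetical
tract_counts = {"A": 40.0, "B": 50.0}  # hypothetical 2010 census counts

assert weighted_totals(records, "cell") == cohort_controls
tract_fit = ratio_adjust(records, "tract", tract_counts)
print(weighted_totals(tract_fit, "tract"))  # now matches the tract counts
print(weighted_totals(tract_fit, "cell"))   # cohort totals have drifted
```

Alternating the two adjustments repeatedly is known as raking (iterative proportional fitting). It can bring both sets of margins into agreement, but only the margins explicitly fed into it; places, block groups, or any other geography left out of the loop get no such guarantee.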
We have talked to people at the Census Bureau about our concerns regarding these small-area data and have had mixed responses. We have had expressions of "concern" and assurances that "it's not going to be that bad". We can't wait to find out which it's going to be.