The Census Bureau will begin releasing Summary File 1 next week (the first two states will have their files released to the public next Thursday, June 16). We do not yet know which states those will be, but the Bureau indicated during a May 26 webinar on the SF1 release that we would be told each week which states to expect the following week. So we should know some time today (Thursday, June 9) which states to expect next week. [This just in: Alabama and Hawaii are the first two states!] The files will be released a few states at a time from June through August. This is similar to how the Bureau released the SF1 Demographic Profiles product back in May, only much slower.
Summary File 1 will be the most important product to be released based on the 2010 census. In previous decades that honor went to Summary File 3 (with many more tables on many more interesting topics such as income, poverty, education, occupation, etc). You might think that because the questionnaire for the 2010 census had only seven questions per person (one of which was the person's name, which doesn't really count as a data question) there would not be all that many different ways to summarize the data. But you'd be wrong. SF1 contains 331 data tables comprising 8,912 data cells. Not all of those tables/cells are available at all geographic levels. At the block and block group levels, for example, we have only 235 tables with only 3,346 cells. To disseminate all these data the Bureau is providing access to a web site where you can download a set of 47 comma-delimited files with the tabular data. Yet another file contains all the geographic identifier data. A logical record number key can be used to link all these files together into one large summary for a single geographic area. The Bureau has released a spreadsheet (xls file) that describes all these tables. We have downloaded a copy of this spreadsheet and stored it in our new sf12010 filetype directory, which can be accessed via Uexplore at http://mcdc.missouri.edu/cgi-bin/uexplore?/pub/data/sf12010. You'll see three files with the same name, sf1tables, and three different filetypes: xls, pdf and txt. The first is the original xls file and the last is a text file (a tab-delimited and slightly edited version of the xls file) which we read in order to generate our own metadata.
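To make the record-linking idea concrete, here is a minimal Python sketch of joining data files to the geographic identifier file on the logical record number. It is an illustration only: the header rows, the column names (LOGRECNO, SUMLEV, NAME, P0010001, H0010001), the sample values, and the helper functions are all assumptions made up for this example. The layout of the actual files should be checked against the Bureau's technical documentation before adapting anything like this.

```python
import csv
import io


def index_by_logrecno(text, key="LOGRECNO"):
    """Read delimited text that has a header row into {key value: row dict}."""
    return {row[key]: dict(row) for row in csv.DictReader(io.StringIO(text))}


def link_segments(geo_text, segment_texts, key="LOGRECNO"):
    """Attach the cells from every data segment to the matching geography row."""
    merged = index_by_logrecno(geo_text, key)
    for text in segment_texts:
        for logrecno, row in index_by_logrecno(text, key).items():
            # Only keep data rows that have a matching geographic record.
            if logrecno in merged:
                merged[logrecno].update(row)
    return merged


# Made-up sample content standing in for the geographic file and two data files.
GEO = "LOGRECNO,SUMLEV,NAME\n0000001,040,Example State\n0000002,050,Example County\n"
SEG1 = "LOGRECNO,P0010001\n0000001,1000\n0000002,250\n"
SEG2 = "LOGRECNO,H0010001\n0000001,400\n0000002,90\n"

linked = link_segments(GEO, [SEG1, SEG2])
print(linked["0000002"])
```

Each entry in `linked` is one wide record for a single geographic area, holding the geographic identifiers followed by the cells from every data file that shares its key.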
The Varlabs subdirectory (which can be accessed by clicking on its name from the Uexplore directory page) provides a series of codebook text files, which just happen to double as SAS source code used to assign variable labels to all the cell variables we'll be storing on our SAS data sets in this collection. We have also created a series of metadata data sets which will be very important to us, but not to you (unless you want to do your own software-generation thing). We have not yet written our SAS code to actually read the 47 csv files and generate the SAS data sets, but we have a model from last decade that we can use. We really cannot test any such code until we have some data. The Bureau has not seen fit to let us have access to any test files this decade, so we have to wait for the first "live" files, to which we get embargoed access next Tuesday. That is when the real fun begins for us.
We have not settled on a final strategy for how we shall be partitioning the data for a state into multiple SAS data sets. But here is the tentative plan. We create six data sets for each state:
a. Mopco: (Replace “Mo” with whatever state or “us”): Only the pco data tables are saved here. These tables will not be on any other data set. (pco is a new table-type for 2010: Person data tables available only at county level and "above").
b. Moblocks: Block level data. Only the P and H tables saved.
c. Mobgs: Block group level data. Only the P and H tables.
d. MoInventory: Inventory summary levels for census tract and above. Has P, H, PCT and HCT tables excluding those with an alpha suffix.
e. MoHierarchal: Hierarchical summary levels (070, 080, etc) for tract and above. Same tables as MoInventory.
f. MoRtables: The tables with alpha suffixes for all geographic levels. Kept separate because of their size.
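As a rough illustration of the tentative plan above, the routing of a table to a data set could be sketched in Python as follows. This is not our actual code (which will be in SAS). The function name, the `geo_kind` labels, and the table-ID pattern are hypothetical, and the sketch assumes that alpha-suffix tables go to the Rtables data set at every geographic level, including blocks and block groups, which the plan does not spell out.

```python
import re

# Hypothetical pattern: table family, table number, optional alpha suffix.
TABLE_RE = re.compile(r"^(PCO|PCT|HCT|P|H)(\d+)([A-Z]?)$")
GEO_KINDS = ("block", "blockgroup", "inventory", "hierarchical")


def assign_dataset(table_id, geo_kind, state="mo"):
    """Return the data set name for a table at a geographic level, or None."""
    match = TABLE_RE.match(table_id.upper())
    if match is None:
        raise ValueError("unrecognized table id: " + table_id)
    if geo_kind not in GEO_KINDS:
        raise ValueError("unrecognized geo kind: " + geo_kind)
    family, suffix = match.group(1), match.group(3)
    prefix = state.capitalize()
    small_area = geo_kind in ("block", "blockgroup")
    if family == "PCO":
        # pco tables exist only at county level and above.
        return None if small_area else prefix + "pco"
    if suffix:
        # Assumption: alpha-suffix tables are kept apart at every level.
        return prefix + "Rtables"
    if small_area:
        # Only the P and H tables are saved for blocks and block groups.
        if family not in ("P", "H"):
            return None
        return prefix + ("blocks" if geo_kind == "block" else "bgs")
    return prefix + ("Inventory" if geo_kind == "inventory" else "Hierarchal")


print(assign_dataset("P12A", "hierarchical"))
```

A return value of None means the table is not stored for that geographic level under this plan.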
We welcome comments and suggestions for alternate strategies.
