How do you know the tabulations are right?

This is a significant and sensible concern, so I’m going to suggest a few steps you can take to verify your tabulations.

  1. Apply common sense to make sure everything appears reasonable.  Suppose, for example, that you are looking at populations in CA and AK.  The CA numbers should obviously be much larger than the AK numbers.
  2. Use the Census Bureau’s pretabulated sources (they are free) to get a sense of what kind of numbers are reasonable.
  3. When following rule #1, do not automatically assume the tabulation is wrong if something seems strange.  Instead, track it down and find out why it looks bizarre.  The tabulation is probably correct, and you may find something fascinating.  For example, a few days ago I was running a tabulation that showed where millennials were migrating to, along with their average incomes.  I found that the most popular destination was a particular area of Texas, and that the people moving there had incomes below the poverty level.  On the surface this makes no sense, and it might appear to be an error.  Why would they flock to this area to be poor?  With a little effort, I found that the area is the home of Texas A&M University, and the migrants were college students.
  4. Spot check the data.  This is going to take some work on your part, but for really important tabulations (or to satisfy your boss) it can be worthwhile.  Here’s the recipe:
    1. Pick out a line (or a few lines) of the tabulation that you want to check.
    2. Tell me which line it is.  I’ll send you a list of the serial numbers of the households in the cell (or, if it’s a person-level tabulation, the serial numbers and person numbers).
    3. Download the appropriate PUMS file from the Census Bureau.
    4. Search the PUMS file and check that every Serial/Person Number I gave you belongs in the appropriate line of the tabulation (by looking at the characteristics of each record).
    5. Because this can be a tedious operation, I suggest choosing a line that has very few households or people and that applies to a single state (so you have to download and work with only one state file).  If it’s a small state file (e.g., AK), you can load it into a spreadsheet and work there.  Otherwise, you will need some other method to extract the records you need from the large state file before putting them in a spreadsheet.  (I use grep in bash, which is free and works great for this, but you need some background in how to use it.)
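
If you would rather not use grep, the extraction in step 5 can also be sketched in a few lines of Python.  This is only a rough illustration, not the method I use: it assumes the PUMS file is a CSV with a header row, and that the serial number and person number columns are named SERIALNO and SPORDER.  Check the data dictionary for the file you downloaded, since the names and layout may differ.  The function name and the sample rows are made up for the example.

```python
import csv

def extract_records(lines, serials=(), persons=(),
                    serial_col="SERIALNO", person_col="SPORDER"):
    """Return the rows that match the requested households or people.

    lines   -- an open CSV file, or any iterable of CSV text lines
    serials -- serial numbers (strings) for a household-level check
    persons -- (serial, person number) pairs for a person-level check
    The column names are assumptions; adjust them to your file.
    """
    serials = set(serials)
    persons = set(persons)
    matches = []
    for row in csv.DictReader(lines):
        # Serial numbers stay as strings so leading zeros are preserved.
        serial = row[serial_col].strip()
        if serial in serials:
            matches.append(row)
        elif persons and (serial, int(row[person_col])) in persons:
            matches.append(row)
    return matches

# Made-up sample rows standing in for a state file.
sample = [
    "SERIALNO,SPORDER,ST,AGEP",
    "0000101,1,02,34",
    "0000101,2,02,31",
    "0000207,1,02,58",
]

household_rows = extract_records(sample, serials={"0000101"})
person_rows = extract_records(sample, persons={("0000207", 1)})
print(len(household_rows), "household-level rows,", len(person_rows), "person-level row")
```

For a real file, pass an open file object instead of the sample list, for example `open("state_file.csv", newline="")` with whatever name your download has, then write the returned rows to a small CSV and open that in a spreadsheet to check each record's characteristics.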