Monthly Archives: August 2018

PUMAs: What they are, and why I use them so often

PUMAs are used to protect the privacy of the people who live in the U.S., while still allowing their data to be analyzed in a multitude of ways.

Let me explain.

A PUMA is a geographical area (defined by the Census Bureau) that is smaller than a state. PUMA stands for “Public Use Microdata Area”, and PUMAs are important for a couple of reasons.

When the Census Bureau distributes its data to people like me (who tabulate it themselves, instead of using a pre-made Census Bureau tabulation), it gives us the actual responses from each person who answered the Census questionnaire. This is the PUMS (Public Use Microdata Sample) file. In order to do that while still protecting the privacy of the respondents, the data must be ‘sanitized’ before it is handed out.

One of the things the Bureau does to sanitize the data is to ‘blur’ the geographical area that each person lives in, by grouping them with at least 100,000 of their neighbors. If the data were not sanitized, but kept at (for example) the block group level, anyone could look at a record in the data and, by knowing a few things about the household (e.g. its block group, number of people in the house, number of cars, number of bedrooms in the house, ages of occupants), identify the people in it. Once those people were identified, the PUMS file could be used to find out their mortgage payments, their earnings, their investment income, and so on.

To prevent that, while still allowing us to analyze data, the Census Bureau invented the PUMA.

The PUMA is the smallest geographical area in the PUMS file, and (along with other sanitizing features) makes it impossible to use the PUMS file to violate a person’s privacy. Using PUMAs allows us to analyze areas smaller than states.

In the case of larger cities, we can analyze fractions of the cities. For example, New York City is broken up into 55 PUMAs, so by grouping them all together, I can analyze New York City separately from New York State. Or I can analyze the five boroughs individually. Or I can regroup the PUMAs in any way that is most appropriate for the analysis.
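
To make the regrouping idea concrete, here is a minimal sketch in Python. It assumes each PUMS person record carries a PUMA code and a person weight (the ACS PUMS file calls these columns PUMA and PWGTP). The PUMA codes, the group names, and the helper function are all made up for illustration, not real Census codes or my actual tabulation code. PUMA codes are only unique within a state, so a real analysis would key on the state code as well.

```python
from collections import defaultdict

# Hypothetical mapping from PUMA code to an analyst-chosen group.
# These codes are placeholders, NOT real Census PUMA codes.
PUMA_TO_GROUP = {
    "00001": "Area A",
    "00002": "Area A",
    "00003": "Area B",
}


def weighted_population_by_group(records, puma_to_group):
    """Sum person weights per group; records outside the mapping are skipped."""
    totals = defaultdict(int)
    for rec in records:
        group = puma_to_group.get(rec["PUMA"])
        if group is not None:
            totals[group] += rec["PWGTP"]
    return dict(totals)


# Toy records standing in for PUMS person rows (made-up values).
sample = [
    {"PUMA": "00001", "PWGTP": 120},
    {"PUMA": "00002", "PWGTP": 80},
    {"PUMA": "00003", "PWGTP": 50},
    {"PUMA": "00099", "PWGTP": 999},  # not in any group, so ignored
]

totals = weighted_population_by_group(sample, PUMA_TO_GROUP)
print(totals)
```

Changing the analysis from "whole city" to "each borough" to some custom grouping is then just a matter of editing the mapping; the tabulation itself stays the same.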

In the case of the city that I live near (Altoona, PA), the city is too small to have even a single PUMA to itself. Instead, two counties have to be added together to make a single PUMA, and Altoona is a portion of them. The end result is that I can split NYC into 55 different areas and regroup them any way desirable, but I can’t analyze Altoona at all, unless I lump Blair and Huntingdon counties in with it.

In short, PUMAs strike a very good balance: they protect privacy while still allowing analysis of fairly small geographical areas.

U.S. Census Bureau page on PUMAs
U.S. Census Bureau page on PUMS file