Can MAST Really Answer Nearly Every Question (Part II – a bit more about scripts)?

It is well understood by most, though not all, that when I say MAST can answer nearly every question, I mean every question whose answer can possibly be derived from the data. Obviously, MAST cannot derive the next set of lottery numbers, or the distance to the moon, from census data if the information isn’t in census data to begin with. Similarly, it cannot produce mailing lists or analyze geographies smaller than PUMAs, because that information is not in the ACS PUMS file.

Where MAST excels is in deriving information that is either directly stored in, or can be derived from, the data. For example, yesterday a market research analyst needed to know the average household income of people who were 18-34 years old, broken out by state. Sounds simple, right? You would think that you could just google something that basic!

For MAST, that’s a pretty simple tabulation, but without MAST, it’s actually difficult to get. There are a few aspects of the request that make it non-trivial. First, the age breakout is specific, and therefore not something that is likely to be available online in a pre-tabulated form. A much bigger hurdle is that the request has both a household level component (household income) and a person level component (age). This is the point where most resources (either collections of pre-tabulated data or query tools) will fall flat. Here is the MAST script that can answer this question, followed by an explanation:

BEGIN VOLUME
Wgtp
Hincp_vwa
END VOLUME

BEGIN ENTITY CLASSIFICATION
Name=RefAge
BEGIN BUCKET01
Agep_v
_Relp,Labs/relp
Reference person
END BUCKET01
BEGIN RELATIONSHIP
END RELATIONSHIP
BEGIN BAND VALUES
0-17
18-34
35+
END BAND VALUES
END ENTITY CLASSIFICATION

BEGIN ENTITY CLASSIFICATION
Name=Hst
_Hst,Labs/hst
END ENTITY CLASSIFICATION

BEGIN ENTITY DISPLAY
RefAge
Hst
END ENTITY DISPLAY

Scripts are written in blocks, and we see four pretty simple blocks here. Each block is contained within BEGIN/END bookends.

The first block we see here is a VOLUME block. These are the numbers that we want to collect, sliced-and-diced by dimensions. In this case, we want an average household income. That means that we will need to accumulate the total weighted and adjusted volumetric portion of HINCP, which is in Hincp_vwa (details described in a previous post). We are also going to need total household counts (Wgtp) for each line in the tabulation, so that we can divide the total Hincp_vwa by Wgtp to get the required average.

The next block that we see is contained within BEGIN/END ENTITY CLASSIFICATION bookends. This type of block allows us to create household level information based on aggregations of person level data. This block creates a new household level variable, called RefAge. It categorizes every household based upon the age of the reference person within the household. The analyst only needed the 18-34 band, but I included the others just because it is often useful to see how much data is in the part of the tabulation that you aren’t interested in.
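To make the idea concrete outside of MAST, here is a minimal Python sketch of what a RefAge-style classification does: find the reference person in a household and place that person's age into one of the three bands. The record layout (`age`, `is_reference`) and the function names are assumptions made for illustration; they are not MAST's internals.

```python
def age_band(age):
    """Map an age in years to the bands used in the script: 0-17, 18-34, 35+."""
    if age <= 17:
        return "0-17"
    if age <= 34:
        return "18-34"
    return "35+"


def ref_age_band(persons):
    """Return the age band of the household's reference person.

    `persons` is a hypothetical list of dicts, one per person in the
    household, each with an 'age' and an 'is_reference' flag.
    Returns None if no reference person is present.
    """
    for person in persons:
        if person["is_reference"]:
            return age_band(person["age"])
    return None


# Hypothetical household: a 29-year-old reference person, a spouse, a child.
household = [
    {"age": 29, "is_reference": True},
    {"age": 36, "is_reference": False},
    {"age": 4, "is_reference": False},
]

print(ref_age_band(household))
```

Note that the household lands in the 18-34 band even though one member is 36, because only the reference person's age is used for the classification.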

Next, a household level classification called Hst is made, which is just State (at the household level). This is a direct usage of the Census Bureau’s “ST” field with no manipulation.

Finally (because there is no person section to this simple tabulation) there is a block contained within BEGIN/END ENTITY DISPLAY bookends. This tells MAST which household level dimensions are wanted in the tabulation, in this case, RefAge and Hst.

Running this script through MAST causes MAST to dimensionalize Wgtp and Hincp_vwa based on State and RefAge. MAST prints this information in a spreadsheet, and the user can then divide Hincp_vwa by Wgtp in the spreadsheet (a simple operation) and have their answer.
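For readers who want to see the arithmetic, the following Python sketch mimics the tabulate-then-divide step on a few made-up households. It assumes that each household's contribution to Hincp_vwa is its weight multiplied by its adjusted income, which is how I read "weighted and adjusted"; the field names and the sample figures are hypothetical and do not come from the ACS PUMS file.

```python
from collections import defaultdict

# Hypothetical household records; field names are illustrative only.
households = [
    {"state": "Ohio", "ref_age": "18-34", "wgtp": 100, "income": 50000.0},
    {"state": "Ohio", "ref_age": "18-34", "wgtp": 300, "income": 70000.0},
    {"state": "Ohio", "ref_age": "35+", "wgtp": 200, "income": 90000.0},
    {"state": "Utah", "ref_age": "18-34", "wgtp": 150, "income": 60000.0},
]


def tabulate(records):
    """Accumulate the two volumes for every (state, RefAge) cell."""
    totals = defaultdict(lambda: {"Wgtp": 0.0, "Hincp_vwa": 0.0})
    for rec in records:
        cell = totals[(rec["state"], rec["ref_age"])]
        cell["Wgtp"] += rec["wgtp"]
        # Assumption: the weighted volume is weight times adjusted income.
        cell["Hincp_vwa"] += rec["wgtp"] * rec["income"]
    return dict(totals)


def add_averages(table):
    """The spreadsheet step: divide Hincp_vwa by Wgtp in each cell."""
    return {
        key: {**cell, "average": cell["Hincp_vwa"] / cell["Wgtp"]}
        for key, cell in table.items()
        if cell["Wgtp"] > 0  # skip empty cells to avoid dividing by zero
    }


table = add_averages(tabulate(households))
for (state, band), cell in sorted(table.items()):
    print(state, band, round(cell["average"], 2))
```

The division happens only after the totals are accumulated, so each household counts in proportion to its weight rather than every record counting equally.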
