Hierarchical Regression in Stata: An Easy Method to Compare Model Results

by Jeff Meyer

by Jeff Meyer

An “estimation command” in Stata is a generic term used for a command that runs a statistical model. Examples are regress, ANOVA, Poisson, logit, and mixed.

Stata has more than 100 estimation commands.

Creating the “best” model requires trying alternative models.  There are a number of different model building approaches, but regardless of the strategy you take, you’re going to need to compare them.

Running all these models can generate a fair amount of output to compare and contrast. How can you view and keep track of all of the results?

You could scroll through the results window on your screen. But this method makes it difficult to compare differences.

You could copy and paste the results into a Word document or spreadsheet. Or better yet use the “esttab” command to output your results.  But both of these require a number of time consuming steps.

But Stata makes it easy: my suggestion is to use the post-estimation commandestimates”.

What is a post-estimation command?  A post-estimation command analyzes the stored results of an estimation command (regress, ANOVA, etc).

As long as you give each model a different name you can store countless results (Stata stores the results as temp files).  You can then use post-estimation commands to dig deeper into the results of that specific estimation.

Here is an example.  I will run four regression models to examine the impact several factors have on one’s mental health (Mental Composite Score). I will then store the results of each one.

regress MCS  weeks_unemployed   i.marital_status
estimates store model_1

regress MCS  weeks_unemployed   i.marital_status   kids_in_house
estimates store model_2

regress MCS  weeks_unemployed   i.marital_status   kids_in_house   religious_attend
estimates store model_3

regress MCS  weeks_unemployed   i.marital_status   kids_in_house  religious_attend    income
estimates store model_4

To view the results of the four models in one table my code can be as simple as:

estimates table model_1 model_2 model_3 model_4

But I want to format it so I use the following:

estimates table model_1 model_2 model_3 model_4, varlabel varwidth(25)  b(%6.3f) /// star(0.05 0.01 0.001) stats(N r2_a)

Here are my results:

image001

My base category for marital status was “widowed”. Is “widowed” the base category I want to use in my final analysis? I can easily re-run model 4, using a different reference group base category each time.

Putting the results into one table will make it easier for me to determine which category to use as the base.

image002

Note in table 1 the size of the samples have changed from model 2 (2,070) to model 3 (2,067) to model 4 (1,682). In the next article we will explore how to use post-estimation data to use the same sample for each model.
Jeff Meyer is a statistical consultant, instructor and writer for The Analysis Factor. Learn more about Jeff…

{ 12 comments… read them below or add one }

James Hirst

Hi Jeff – great article!

I am able to get the table, however the /// star(0.05 0.01 0.001) stats(N r2_a) section of code generates an error, so I cant see the significance stars or stats beneath. I see the following on my screen:

. estimates table TR TRS TRSC TRSCCS TRSCCSi, varlabel varwidth(25) b(%6.3f) /// star(0.05 0.01 0.001) stats(N r2_a)
option / not allowed
r(198);

How do I fix this?

Thanks so much!!

Reply

Jeff Meyer

Hi James,

Glad you found the article worthwhile. Is the text after the /// on the next line of your do-file? The three diagonal lines tells Stata that the code continues on the next line. If you want to have it all on the same line just remove the three diagonal lines.

Hope this solves your problem. I was able to run the code okay on my computer.
Jeff

Reply

AH

Hello Jeff,

I am running a “xtpcse” (linear regression with panel-corrected standard errors, I am correcting for heteroscedasticity and autocorrelation.
I have two models with the same DV, in the second regression I take one of the IV’s from regression1 and I break it down into components. I want to check if there is a significant improvement in the model when the IV is broken down. I usually use the Vuong test (1989) for thsi but it doesn;t work after xtpcse, any suggestions?
Thanks

Reply

Jeff Meyer

Hi,

Stata has removed the option for the Vuong test because it can give incorrect results.

Could you explain what you mean by breaking an IV into components.

Jeff

Reply

Ahmad

I ran three sets of multiple regression equations. The IVs are the same while the DVs are broken into quantity, quality and combined (quantity_quality using Principal Component Analysis). The R=squared and F-statistics of each of the three models are the same and p-values for each of the IVs in all the models are the same. However, the coefficients significantly differ across models for each variable though the direction of association (negative or positive) is the same across the models. My question is whether such estimation could be considered valid. Also, if I want to compare between the results, what do I do? Thank you.

Reply

Jeff Meyer

Hi Ahmad, if you have different dependent variables for each model you don’t want (and can’t) compare results between models. Anytime you have different dependent variables you have a different research question. If you are interested in all three models you will need to discuss the impact that each predictor has on each outcome variable. The results from one model are independent from the results of a different model.

Reply

kk

Dear Jeff,

I noticed some say that in logistic regressions, it’s possible to compare across models ( same set of independent variables, diff dependent variables as in different types/groups) by using predicted probability.

Reply

Jeff Meyer

Hi, if you have two models and they have different outcome (dependent) variables you have different research questions and are testing different hypotheses. Doing a statistical analysis you begin with your research question and identify your outcome variable. The next step is to select your predictors. We can determine which models’ predictors are a better fit by running -2LL, AIC and BIC if they have the same outcome variable.

If in your analysis you have a few research questions you are attempting to solve you could look at each model’s ROC curves and compare them. But this is not generally the way that research is done.

Reply

Balta

Hi Jeff
Thanks for the explanation. what about the different tests and statistics (such as AR1, AR2, Sargan tests, R2, etc) of each estimations? how to add them all in the final table?
Thanks

Reply

Jeff Meyer

Hi Balta,

The tests and statistics that you mention are from post estimation commands in Stata. The results shown in this article are extracting the results from the linear regression model.

My suggestion is, after running the post estimation command, use the command “ereturn list” to see what scalars are held in short term memory for that post estimation command. You might then be able to extract those by using the combination “estimates store” and “estimates table”. I have never tried doing that with post estimation commands.

Jeff

Reply

Marita

Excelent!, I have a 5 logistic models and need compare corrected classification between its, there is possible with this command?

Reply

Jeff Meyer

Yes and no, you would use the “estimates” command as shown above but since you are using a logistic model (which uses maximum likelihood method) you would need to run a likelihood-ratio test to see which model is best. The command in Stata for the likelihood-ratio test is “lrtest”. The example above gives an adjusted r-square because it is a linear model.

Jeff Meyer

Reply

Leave a Comment

Please note that, due to the large number of comments submitted, any comments on problems related to a personal study/project will not be answered. We suggest joining Statistically Speaking, where you have access to a private forum and more resources 24/7.

Previous post:

Next post: