Converting Panel Data into Percentiles to Observe Trends in Stata (Part 1)

Panel data provides us with observations over several time periods per subject. In this first of two blog posts, I’ll walk you through the process of converting that data into percentiles. (Stick with me here. In Part 2, I’ll show you the graph, I promise.)

The challenge is that some of these data sets are massive. For example, if we’ve collected data on 100,000 individuals over 15 time periods, then that means we have 1.5 million cells of information.

So how can we look through this massive amount of data and observe trends over the time periods that we have tracked? (more…)
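To make the idea concrete, here is a minimal sketch of summarizing a panel by percentiles within each time period. It is written in Python rather than Stata and is not the code from the full post; the tiny data set, the variable names, and the choice of the 25th, 50th, and 75th percentiles are all assumptions made for illustration.

```python
from statistics import quantiles

# Hypothetical long-format panel: one (subject_id, period, value) row per observation.
panel = [
    (1, 2001, 20.0), (2, 2001, 35.0), (3, 2001, 50.0), (4, 2001, 80.0), (5, 2001, 40.0),
    (1, 2002, 22.0), (2, 2002, 36.0), (3, 2002, 55.0), (4, 2002, 90.0), (5, 2002, 41.0),
]

def percentiles_by_period(rows, cuts=(25, 50, 75)):
    """Collapse the panel to a few percentiles per time period."""
    by_period = {}
    for _subject, period, value in rows:
        by_period.setdefault(period, []).append(value)

    summary = {}
    for period, values in sorted(by_period.items()):
        # quantiles(n=100) returns the 99 cut points p1..p99;
        # it needs at least two observations per period.
        pct = quantiles(values, n=100, method="inclusive")
        summary[period] = {p: pct[p - 1] for p in cuts}
    return summary

summary = percentiles_by_period(panel)
for period, row in summary.items():
    print(period, row)
```

Each period is reduced from one row per subject to a handful of numbers, so a panel with 15 periods collapses to 15 short rows that can be compared across time.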

Understanding Interaction Between Dummy Coded Categorical Variables in Linear Regression

The concept of a statistical interaction is one of those things that seems very abstract. Abstruse definitions, like this one from Wikipedia, don’t help:

In statistics, an interaction may arise when considering the relationship among three or more variables, and describes a situation in which the simultaneous influence of two variables on a third is not additive. Most commonly, interactions are considered in the context of regression analyses.

First, we know this is true because we read it on the internet! Second, are you more confused now about interactions than you were before you read that definition? (more…)
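A small numeric sketch may help with the phrase "not additive." The Python snippet below uses four invented cell means for two hypothetical dummy variables, `a` and `b`, and recovers the coefficients of a regression that includes both dummies and their product. The numbers and names are assumptions for illustration, not data from the post.

```python
# Hypothetical mean outcome in each combination of two dummy variables (a, b).
cell_means = {
    (0, 0): 10.0,  # reference group
    (1, 0): 14.0,  # a = 1 only
    (0, 1): 13.0,  # b = 1 only
    (1, 1): 22.0,  # both a = 1 and b = 1
}

def saturated_coefficients(means):
    """Coefficients of y = b0 + b_a*a + b_b*b + b_ab*a*b implied by the cell means."""
    b0 = means[(0, 0)]
    b_a = means[(1, 0)] - b0                     # effect of a when b = 0
    b_b = means[(0, 1)] - b0                     # effect of b when a = 0
    b_ab = means[(1, 1)] - (b0 + b_a + b_b)      # departure from additivity
    return b0, b_a, b_b, b_ab

b0, b_a, b_b, b_ab = saturated_coefficients(cell_means)
additive_prediction = b0 + b_a + b_b

print("additive prediction for a=1, b=1:", additive_prediction)
print("observed mean for a=1, b=1:", cell_means[(1, 1)])
print("interaction coefficient:", b_ab)
```

If the two effects simply added up, the group with both dummies equal to 1 would average 17. The observed mean of 22 is 5 higher, and that gap is the interaction coefficient.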

Incorporating Graphs in Regression Diagnostics with Stata

You put a lot of work into preparing and cleaning your data. Running the model is the moment of excitement.

You are about to look at your tables and interpret the results. But first you remember that one or more variables had a few outliers. Did these outliers impact your results? (more…)

Free May Craft of Statistical Analysis Webinar: Unlocking the Power of Stata’s Macros and Loops

There are many steps to analyzing a dataset. One of the first steps is to create tables and graphs of your variables in order to understand what is behind the thousands of numbers on your screen. But the type of table and graph you create depends upon the type of variable you are looking at.

There certainly isn’t much point in running a frequency table for a continuous variable with hundreds of unique values. Creating a boxplot to look for outliers doesn’t make much sense if the variable is categorical. Creating a histogram for a dummy variable would be senseless as well.

How should you start this process? Should you create a spreadsheet listing the names of all the variables and the type of each? Should you paste the names into a Word document?

In this free webinar with Stata expert Jeff Meyer, you will discover the code to quickly determine the type of every variable in a dataset. By simply pressing the execute button on a do-file, you will observe Stata placing each variable in a group (the macro) based on its type.

You will watch Stata use loops to create the proper table and graph for each type of variable in a matter of minutes and output the results to a PDF file for future viewing. You will also receive the code to recreate and practice what you’ve learned.
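The general pattern (sort variables into groups by type, then loop over each group) can be sketched outside Stata as well. The Python snippet below is only an illustration of that idea, not the webinar's do-file: the dataset, the ten-category threshold, and the summary chosen for each type are assumptions.

```python
# Hypothetical dataset stored as columns; None marks a missing value.
dataset = {
    "smoker": [0, 1, None] * 4,
    "region": ["N", "S", "W"] * 4,
    "income": [20.5 + 3 * i for i in range(12)],
}

# Assumed mapping from variable type to a sensible first look at the variable.
SUMMARY_FOR_TYPE = {
    "dummy": "frequency table",
    "categorical": "frequency table and bar chart",
    "continuous": "histogram and boxplot",
}

def classify(values, max_categories=10):
    """Guess a variable's type from its observed (non-missing) values."""
    observed = [v for v in values if v is not None]
    distinct = set(observed)
    if not distinct:
        return "empty"
    if distinct <= {0, 1}:
        return "dummy"
    if any(isinstance(v, str) for v in observed) or len(distinct) <= max_categories:
        return "categorical"
    return "continuous"

# Step 1: place each variable name in a group, playing the role of a macro.
groups = {}
for name, values in dataset.items():
    groups.setdefault(classify(values), []).append(name)

# Step 2: loop over each group and apply the summary suited to that type.
for var_type, names in groups.items():
    for name in names:
        print(f"{name}: {SUMMARY_FOR_TYPE.get(var_type, 'skip')}")
```

Keeping the classification step separate from the reporting step means the list of variables never has to be typed by hand, which is the point of storing the groups in macros.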


Title: Improving Your Productivity by Unlocking the Power of Stata's Macros and Loops
Date: Thurs, May 26, 2016
Time: 1-2 pm EDT
Presenter: Jeff Meyer

Linear Regression in Stata: Missing Data and the Stories it Might Tell

In a previous post, Using the Same Sample for Different Models in Stata, we examined how to use the same sample when comparing regression models. Using different samples in our models could lead to erroneous conclusions when interpreting results.

But excluding observations can also lead to inaccurate results.

The coefficient for the variable “frequent religious attendance” was negative 58 in model 3 (more…)

Mixed Models: Can you specify a predictor as both fixed and random?

One of the most confusing things about mixed models arises from the way they’re coded in most statistical software. Of the ones I’ve used, only HLM sets it up differently, so this doesn’t apply to it.

But for the rest of them (SPSS, SAS, R’s lme and lmer, and Stata), the basic syntax requires the same pieces of information.

1. The dependent variable

2. The predictor variables for which to calculate fixed effects and whether those (more…)