Stata: Multiple Regression and Partial and Semipartial Correlations
This post will:
- Show how to extend bivariate regression to include multiple predictor variables.
- Show how to manually create partial and semipartial correlations using residuals from a regression model.
- Show how to use the
pcorrcommand to obtain partial and semipartial correlations.
We will again use the auto dataset.
sysuse auto, clear
Suppose we want to regress price on three variables — mpg, weight, and foreign. To add three predictors, we simply add the variables we want after the dependent variable.
regress price mpg weight foreign
This produces the following output.
Partial and Semipartial Correlations – Manual Method
Recall that a partial correlation is the relationship between x and y once the shared variance between x and x2 has been removed from x and once the shared variance between y and x2 has been removed from y. A semipartial correlation is similar except that we only remove the shared variance between x and x2 (i.e., y remains untouched). Note: Although I’ve only referenced x2, we can in principle include many control variables as our example will show.
Suppose we want to obtain the partial correlation between price and mpg controlling for weight and foreign. To do this we need the part of price that is independent of weight and foreign. We also need the part of mpg that is independent of weight and foreign. We can get this information with residuals.
The part of price independent of weight and foreign
To obtain the part of price independent of weight and foreign we regress price on weight and foreign.
regress price weight foreign
We then save the residuals for price. We’ll call this priceres
predict priceres, residuals
We now have a new variable in our dataset called priceres.
summarize priceres, detail
The part of mpg that is independent of weight and foreign
At this point, we need to repeat what we did about except substitute mpg for price. We regress mpg on weight and foreign.
regress mpg weight foreign
We save the residuals for mpg. We’ll call this mpgres (not very original, I know).
predict mpgres, residuals
We now have a new variable in our dataset called mpgres.
summarize mpgres, detail
We know have everything we need for the partial correlation between price and mpg controlling for weight and foreign. Specifically, we have priceres — which is the part of price that is independent of weight and foreign. We also have mpgres — which is the part of mpg that is independent of weight and foreign. Therefore, to obtain the partial correlation we simply need to correlate priceres and mpgres.
corr priceres mpgres
We also already have everything we need for the semipartial correlation. Recall that for the semipartial correlation we only remove the shared relationship between x and the x2 (or set of covariates). We don’t do anything with y. In our example price is our y variable. So to compute the semipartial correlation we correlate price (i.e., the “untouched” y variable) and mpgres (i.e., the part of mpg that is independent of weight and foreign).
corr price mpgres
Partial and Semipartial Correlations –
Thankfully Stata has a built in command for computing partial and semipartial correlations — pcorr. To obtain the partial and semipartial correlations, we type:
pcorr price mpg weight foreign
Note that the first variable listed is considered the y variable. All other variables are variables are considered x variables. Stata reports as many partial and semipartial correlations as there are x variables. Additionally, Stata reports the squared partial and squared semipartial correlations. These are interpreted as the proportion of shared variance between y and x controlling for the other x variables. The partial and semipartial correlations listed for mpg are the same as what we found above.