# Stata: Transparency in Graphs

Stata 15 includes the ability to add transparency in graphs. What’s transparency you ask? Transparency is relevant when you have graphical elements that overlap. Without any transparency the element that is in front will completely obscure any elements behind it. Transparency solves that problem.

Suppose you want to plot two normal distributions. You can use the twoway function graph type to accomplish this (see Stata Transparency). If the distributions overlap at all, it will be difficult to fully appreciate how much they overlap because the distribution in front will obscure the distribution in back. Here’s an example:

twoway function y = normalden(x), range(-4 4) ///
color(eltgreen) recast(area) ///
|| function y = normalden(x+.5), range(-4 4) ///
color(ebblue) recast(area) ///
scheme(burd) legend(off)


￼ Note, I add the option scheme(burd) to use the burd plotting scheme (see burd). This isn’t necessary to use transparency. I just like burd better than the default graphing scheme. Type ssc install scheme-burd, replace if you dont have the burd scheme.

As you can see, the the blue distribution obscures the green. To fix this, we add transparency to the blue distribution. This is done by changing the color to color(ebblue%40). This makes the blue distribution 40% opaque.

twoway function y = normalden(x), range(-4 4) ///
color(eltgreen) recast(area) ///
|| function y = normalden(x+.5), range(-4 4) ///
color(ebblue%40) recast(area) ///
scheme(burd) legend(off)


￼ Trying fiddling around with the percentage to see how it affects things.

# LKJ Priors

## Visualizing the LKJ Correlation Distribution

I use multilevel models a lot. I’ve transitioned from using software like lme4 in R or mixed in Stata, which use maximum likelihood methods, to using Bayesian software like Stan or JAGS to estimate multilevel models as Bayesian hierarchical models. When mulitlevel models have 2 or more random effects, the prior for the random effects is usually a covariance matrix. Typically we want to estimate the parameters in the covariance matrix and thus we place a (hyper)prior on the covariance matrix.

Consider a simple growth-curve model for longitudinal data.

Where $b_0$ is the overall intercept, $b_1$ is the effect of time, $u_{0j}$ is a person-specific intercept, $u_{1j}$ is a person-specific effect of time. The $u$’s are the random effects, which we typically assume follow a multivariate normal distribution:

Unfortunately, choosing a prior for the covariance matrix can be difficult and, frankly, occassionally maddening. Usually I have to use an inverse-Wishart distribution, which seems easy enough. However, they can be tricky to specify once the covariance matrix gets pretty big. Further, in software like JAGS, the multivariate normal distribution is parameterized with a precision matrix, so you have to use the Wishart distribution. Finally, it is downright hard to choose a sensible prior for a covariance matrix.

Wouldn’t it be great if you could choose a prior using a correlation matrix? I, and I suspsect most folks, are a little more comfortable thinking about correlations than covariances. Fortunately, the developers of the open-source Bayesian modeling program Stan have made choosing a prior for a correlation matrix reasonably straightforward by using the LKJ Correlation Density.

It turns out it isn’t too difficult to choose a prior for this type of model because there is just a single covariance to deal with. However, suppose we extend the model above to include a random intercept and two random slopes and we estimate the covariances between all three. A possible way to set up this model in Stan is to choose whatever prior you’d like for the variances and then use the LKJ Correlation Distribution to to provide a prior for the correlations. You would then obtain the covariances by using the appropriate variances and correlation. I’ll do another post that provides the specifics.

There is only a single parameter for the LKJ Distribution - $\eta$. When I first started using the LKJ Correlation Distribution, I didn’t know what to set $\eta$ to and the Stan documentation didn’t initially make a lot of sense to me. Fortunately, Stan provides a random number generator for the LKJ Correlation Distribution, so it is easy to visualize the distribution using random draws for various values of $\eta$. The primary purpose of this post is to show how different values of $\eta$ affect the probability of different correlations.

## What the Stan Manual Says

The Stan Manual provides the following guidance regarding $\eta$ (page 385, Stan Manual 2.5):

The shape parameter $\eta$ can be interpreted like the shape parameter of a symmetric beta distribution.

• if $\eta$ = 1, then the density is uniform over all correlation matrices of a given order;
• if $\eta$ > 1, the identity matrix is the modal correlation matrix, with sharper peaks in the density around the identity matrix for larger $\eta$; and
• if 0 < $\eta$ < 1, the density has a trough at the identity matrix.

## Using Simulation to Visualize What the Manual Means

To see what this looks like we are going to simulate correlation matrices for $\eta$ = .5, 1, 2, 5, 10, and 50. We’ll then plot the density of one correlation from each of the simulated matrices to see how they look. Stan requires that you input some data, so I simulated some data out of poisson distribution to fit in Stan. We don’t care about those and they could be anything.

First, I load the required libraries for this post. Then I set the seed and create the data needed to pass into Stan. $x$ is 30 random draws from a poisson with a rate parameter = 5, $N$ is the sample size for the poisson data, and $R$ is the number of correlations we are dealing with (recall we have the correlations among the random intercept and 2 random slopes).

### Stan Model

Next I write the Stan model. The key part of this code is generated quantities block. This block creates 6 correlation matrices named Omega. The number after Omega describes the size of $\eta$ in the LKJ Distribution. This block also takes random draws from the LKJ Distribution with $\eta$ values equal to .9, 1, 2, 5, 10, or 50. We can then use these random draws to visualize the shape of the priors for a given correlation.

### Using rstan to fit the model

Next, I fit the model in R through the rstan package.

### Processing the MCMC draws and getting them prepped for ggplot

Finally, I process the data and transform it so that I can plot the density in ggplot2.

### Creating the Plot

Here’s the code for the plot.

### Visualizing the Density

The moment you’ve been waiting for – a plot of the density.

These plots are consistent with what the Stan manual says:

The shape parameter $\eta$ can be interpreted like the shape parameter of a symmetric beta distribution.

• if $\eta$ = 1, then the density is uniform over all correlation matrices of a given order;
• if $\eta$ > 1, the identity matrix is the modal correlation matrix, with sharper peaks in the density around the identity matrix for larger $\eta$; and
• if 0 < $\eta$ < 1, the density has a trough at the identity matrix.

I find this helps me make better decisions about the value of $\eta$.

This page provides Mplus input and output files, as well as data files, for the examples I use in the Structural Equation Modeling (SEM) workshop. I also provide a link for the lecture slides. The Mplus files and data are stored on GitHub. For any who are interested, feel free to clone the directory and improve it. I will merge changes as needed. You can download all the training files in .zip file here: SEM Workshop Example Files. The slides for the presentation can be downloaded here: SEM_ABCT_2013_morning and SEM_ABCT_2013_afternoon

## Power

For power there are a ton of examples. I am going to link to a zip file with all the example in them. Please get that here: Power Examples Zip File.

# Predicted Scores and Residuals in Stata

## Predicted Scores in Stata

As we discussed in class, the predicted value of the outcome variable can be created using the regression model. For example, we can use the auto dataset from Stata to look at the relationship between miles per gallon and weight across various cars. We estimate the follow equation

sysuse auto
regress mpg weight
Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  1,    72) =  134.62
Model |   1591.9902     1   1591.9902           Prob > F      =  0.0000
Residual |  851.469256    72  11.8259619           R-squared     =  0.6515
Total |  2443.45946    73  33.4720474           Root MSE      =  3.4389
------------------------------------------------------------------------------
mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
weight |  -.0060087   .0005179   -11.60   0.000    -.0070411   -.0049763
_cons |   39.44028   1.614003    24.44   0.000     36.22283    42.65774
------------------------------------------------------------------------------


Thus, we see a negative relationship between weight and mpg. For every 1 unit increase in weight, mpg goes down by -.006. We can obtain the predicted scores for the observations in our dataset with the following command:

predict mpg_pred


This creates a new variable called mpg_pred with the predicted mpg for all the weight values in our dataset. Here’s 20 of the actual mpg values and 20 of the predicted values.

list mpg mpg_pred in 1/20
+----------------+
| mpg   mpg_pred |
|----------------|
1. |  22   21.83483 |
2. |  17   19.31118 |
3. |  22   23.57735 |
4. |  20   19.91205 |
5. |  15   14.92484 |
|----------------|
6. |  18    17.3884 |
7. |  26   26.04091 |
8. |  20   19.73179 |
9. |  16   16.12658 |
10. |  19   19.01075 |
|----------------|
11. |  14   13.42267 |
12. |  14    16.0064 |
13. |  21   13.66302 |
14. |  29   26.76196 |
15. |  16   17.26823 |
|----------------|
16. |  22   20.33266 |
17. |  22   20.09231 |
18. |  24    22.9164 |
19. |  19   18.83049 |
20. |  30   26.70187 |
+----------------+


## Residuals in Stata

Recall the a residual in regression is defined as the difference between the actual value of $Y$ and the predicted value of $Y$ (or $Y^\prime$):

Thus, to compute residuals we can just subtract mpg_pred from mpg. Stata will do this for us using the predict command:

predict mpg_res, residuals


Here’s 20 of the actual mpg values, 20 of the predicted values, and 20 of the residuals.

 list mpg mpg_pred mpg_res in 1/20
+----------------------------+
| mpg   mpg_pred     mpg_res |
|----------------------------|
1. |  22   21.83483    .1651688 |
2. |  17   19.31118   -2.311183 |
3. |  22   23.57735    -1.57735 |
4. |  20   19.91205    .0879486 |
5. |  15   14.92484    .0751587 |
|----------------------------|
6. |  18    17.3884    .6115971 |
7. |  26   26.04091   -.0409119 |
8. |  20   19.73179    .2682092 |
9. |  16   16.12658   -.1265787 |
10. |  19   19.01075   -.0107484 |
|----------------------------|
11. |  14   13.42267    .5773304 |
12. |  14    16.0064   -2.006405 |
13. |  21   13.66302    7.336983 |
14. |  29   26.76196    2.238046 |
15. |  16   17.26823   -1.268229 |
|----------------------------|
16. |  22   20.33266    1.667341 |
17. |  22   20.09231    1.907688 |
18. |  24    22.9164    1.083605 |
19. |  19   18.83049    .1695122 |
20. |  30   26.70187    3.298132 |
+----------------------------+


We can see that the residuals are intact the difference between the first two columns. Given that the residuals are the part of the mpg that is unrelated to weight, mpg_res should be uncorrelated with weight. Let’s check:

. corr weight mpg_res
(obs=74)
|   weight  mpg_res
-------------+------------------
weight |   1.0000
mpg_res |   0.0000   1.0000


Magic!

# References for Power in Structural Equation Modeling

Kaplan, D. (1995). Statistical power in structural equation modeling. In R. H. Hoyle (Ed.), Structural Equation Modeling: Concepts, Issues, and Applications (pp. 100-117). Thousand Oaks, CA: Sage. Kaplan, D. & Wegner, R. N. (1993). Asymptotic independence and separability in covariance structure models: Implications for specification error, power, and model modification. Multivariate Behavioral Research, 28, 467-482. Loehlin, J. C. (2004). Latent variable models: An introduction to factor, path, and structural equation analysis (4 ed.). Mahwah, NJ: Lawrence Erlbaum. – See Chapter 2 and Appendix Maccallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and determination of sample size for covariance structure modeling. Psychological Methods, 1, 130-149. Muthen, L., & Muthen, B. (2002). How to use a Monte Carlo study to decide on sample size and determine power. Structural Equation Modeling, 9, 599-620. Saris, W. E., & Satorra, A. (1993). Power evaluations in structural equation models. In K. A. Bollen & J. S. Long (Eds.), Testing Structural Equation Models (pp. 181-204). Newbury Park, CA: Sage. Satorra, A. & Saris, W. E. (1985). Power of the likelihood ratio test in covariance structure analysis. Psychometrika, 50, 83-90.