LKJ Priors

Visualizing the LKJ Correlation Distribution

I use multilevel models a lot. I’ve transitioned from software like lme4 in R or mixed in Stata, which use maximum likelihood methods, to Bayesian software like Stan or JAGS, which estimate multilevel models as Bayesian hierarchical models. When multilevel models have 2 or more random effects, the random effects are assumed to follow a multivariate distribution governed by a covariance matrix. Typically we want to estimate the parameters of that covariance matrix, and thus we place a (hyper)prior on it.

Consider a simple growth-curve model for longitudinal data:

$y_{ti} = \beta_0 + \beta_1 \mathrm{time}_{ti} + u_{0i} + u_{1i} \mathrm{time}_{ti} + e_{ti}$

where $\beta_0$ is the overall intercept, $\beta_1$ is the effect of time, $u_{0i}$ is a person-specific intercept, and $u_{1i}$ is a person-specific effect of time. The $u$’s are the random effects, which we typically assume follow a multivariate normal distribution:

$(u_{0i}, u_{1i})' \sim N(0, \Sigma)$

Unfortunately, choosing a prior for the covariance matrix can be difficult and, frankly, occasionally maddening. The usual choice is an inverse-Wishart distribution, which seems easy enough. However, it can be tricky to specify once the covariance matrix gets big. Further, in software like JAGS, the multivariate normal distribution is parameterized with a precision matrix, so you have to use the Wishart distribution instead. Finally, it is downright hard to choose a sensible prior for a covariance matrix.

Wouldn’t it be great if you could choose a prior using a correlation matrix? I suspect most folks are, like me, a little more comfortable thinking about correlations than covariances. Fortunately, the developers of the open-source Bayesian modeling program Stan have made choosing a prior for a correlation matrix reasonably straightforward by using the LKJ Correlation Density.

It turns out it isn’t too difficult to choose a prior for the model above because there is just a single covariance to deal with. However, suppose we extend the model to include a random intercept and two random slopes, and we estimate the covariances among all three. A possible way to set up this model in Stan is to choose whatever prior you’d like for the variances and then use the LKJ Correlation Distribution to provide a prior for the correlations. You would then obtain the covariances by combining the appropriate variances and correlations. I’ll do another post that provides the specifics.
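To make the "combine variances and correlations" step concrete: the covariance matrix is rebuilt as $\Sigma = \mathrm{diag}(\sigma)\,\Omega\,\mathrm{diag}(\sigma)$, where $\sigma$ holds the standard deviations and $\Omega$ is the correlation matrix. A minimal numeric sketch in Python, with made-up values for the 3×3 case:

```python
import numpy as np

# Made-up standard deviations for a random intercept and two random slopes
sd = np.array([1.0, 0.5, 2.0])

# Made-up correlation matrix, the part an LKJ prior would govern
Omega = np.array([[ 1.0,  0.3,  0.1],
                  [ 0.3,  1.0, -0.2],
                  [ 0.1, -0.2,  1.0]])

# Rescale the correlations by the standard deviations:
# Sigma = diag(sd) @ Omega @ diag(sd)
Sigma = np.diag(sd) @ Omega @ np.diag(sd)

print(np.diag(Sigma))  # the diagonal recovers the variances, sd**2
```

Each covariance is just the correlation times the two standard deviations; for example, the (1, 2) entry is 0.3 × 1.0 × 0.5 = 0.15.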

The LKJ Distribution has only a single parameter, η (eta). When I first started using the LKJ Correlation Distribution, I didn’t know what to set η to, and the Stan documentation didn’t initially make a lot of sense to me. Fortunately, Stan provides a random number generator for the LKJ Correlation Distribution, so it is easy to visualize the distribution using random draws for various values of η. The primary purpose of this post is to show how different values of η affect the probability of different correlations.

What the Stan Manual Says

The Stan Manual provides the following guidance regarding η (page 385, Stan Manual 2.5):

The shape parameter η can be interpreted like the shape parameter of a symmetric beta distribution.

  • if η = 1, then the density is uniform over all correlation matrices of a given order;
  • if η > 1, the identity matrix is the modal correlation matrix, with sharper peaks in the density around the identity matrix for larger η; and
  • if 0 < η < 1, the density has a trough at the identity matrix.
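The beta analogy is exact in the two-dimensional case: for a 2×2 matrix the single correlation r has density proportional to (1 − r²)^(η−1), which means (r + 1)/2 follows a Beta(η, η) distribution. A quick Python sketch of that marginal (this mirrors, but is separate from, the Stan simulation below):

```python
import numpy as np

def lkj_corr_2x2(eta, size, rng):
    """Draw the off-diagonal correlation of a 2x2 LKJ(eta) matrix,
    using the fact that (r + 1) / 2 ~ Beta(eta, eta)."""
    return 2.0 * rng.beta(eta, eta, size) - 1.0

rng = np.random.default_rng(0)
for eta in [0.9, 1, 2, 5, 10, 50]:
    r = lkj_corr_2x2(eta, 20_000, rng)
    print(f"eta = {eta:>4}: mean = {r.mean():+.3f}, sd = {r.std():.3f}")
```

Larger η concentrates the draws around zero; analytically, Var(r) = 1/(2η + 1), so η = 1 gives a correlation uniform on (−1, 1) (variance 1/3), while η = 50 is tightly peaked at 0.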

Using Simulation to Visualize What the Manual Means

To see what this looks like, we are going to simulate correlation matrices for η = .9, 1, 2, 5, 10, and 50. We’ll then plot the density of one correlation from each set of simulated matrices to see how they look. Stan requires that you input some data, so I simulated some data from a Poisson distribution to fit in Stan. We don’t care about those values and they could be anything.

sim_data <- list(x = rpois(30, 5), N = 30, R = 3) 

First, I load the required libraries for this post. Then I set the seed and create the data needed to pass into Stan. x is 30 random draws from a Poisson distribution with rate parameter λ = 5, N is the sample size for the Poisson data, and R is the order of the correlation matrices (recall we have the correlations among a random intercept and 2 random slopes, so R = 3).

Stan Model

Next I write the Stan model. The key part of this code is the generated quantities block. This block declares 6 correlation matrices named Omega; the number after Omega indicates the value of η used for that matrix (Omega0 is the exception: it uses η = .9). The block then fills each matrix with random draws from the LKJ Distribution with η equal to .9, 1, 2, 5, 10, or 50. We can then use these random draws to visualize the shape of the priors for a given correlation.

sim_stan <- "
data {
  int<lower=0> N; // number of observations
  int<lower=0> x[N]; // outcome variable
  int R; // order of the correlation matrices
}
parameters {
  real lambda;
}
model {
  x ~ poisson_log(lambda);
}
generated quantities {
  corr_matrix[R] Omega0;
  corr_matrix[R] Omega1;
  corr_matrix[R] Omega2;
  corr_matrix[R] Omega5;
  corr_matrix[R] Omega10;
  corr_matrix[R] Omega50;
  Omega0 <- lkj_corr_rng(R, .9);
  Omega1 <- lkj_corr_rng(R, 1);
  Omega2 <- lkj_corr_rng(R, 2);
  Omega5 <- lkj_corr_rng(R, 5);
  Omega10 <- lkj_corr_rng(R, 10);
  Omega50 <- lkj_corr_rng(R, 50);
}
"

Using rstan to fit the model

Next, I fit the model in R through the rstan package.

sim_parms <- c('Omega0', 'Omega1', 'Omega2',
			   'Omega5','Omega10', 'Omega50')
fit_sim <- stan(model_code = sim_stan, pars = sim_parms,
                data = sim_data, chains = 1, iter = 10000)

Processing the MCMC draws and getting them prepped for ggplot

Finally, I process the data and transform it so that I can plot the density in ggplot2.

res_sim <- as.data.frame(fit_sim) # posterior draws as a data frame
names(res_sim) <- sub("\\[", "_", names(res_sim))
names(res_sim) <- sub("\\,", "_", names(res_sim))
names(res_sim) <- sub("\\]", "", names(res_sim))
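The three sub() calls simply turn rstan’s bracketed names, e.g. Omega0[3,1], into plain ones like Omega0_3_1. The same cleanup, sketched in Python for comparison:

```python
import re

name = "Omega0[3,1]"

# Replace "[" and "," with "_" and drop the trailing "]"
clean = re.sub(r"[\[,]", "_", name).rstrip("]")
print(clean)  # Omega0_3_1
```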

smalldata <- res_sim[, c("Omega50_3_1", "Omega10_3_1", "Omega5_3_1",
                         "Omega2_3_1", "Omega1_3_1", "Omega0_3_1")]

smalldata$draw <- seq_len(nrow(smalldata))

plotdata <- melt(smalldata, id = c("draw"))

plotdata$eta <- factor(plotdata$variable, 
		levels = levels(plotdata$variable), 
		labels = c("eta = 50", "eta = 10", 
			  "eta = 5", "eta = 2", 
			  "eta = 1", "eta = .9"))
my.labs <- list(bquote(eta == 50), bquote(eta == 10),
		bquote(eta == 5), bquote(eta == 2),
		bquote(eta == 1), bquote(eta == .9))

Creating the Plot

Here’s the code for the plot.

p <- ggplot(plotdata, aes(x = value, colour = eta))
p + geom_density() +
  scale_colour_manual(values = 1:6, breaks = levels(plotdata$eta),
                      labels = my.labs, name = "Shape") +
  xlab("Correlation Value") +
  ylab("Density") +
  ggtitle("Visualization of a correlation from the lkj_corr density in Stan\nfor various values of the shape parameter")

Visualizing the Density

The moment you’ve been waiting for – a plot of the density.

LKJ Density

These plots are consistent with what the Stan manual says:

The shape parameter η can be interpreted like the shape parameter of a symmetric beta distribution.

  • if η = 1, then the density is uniform over all correlation matrices of a given order;
  • if η > 1, the identity matrix is the modal correlation matrix, with sharper peaks in the density around the identity matrix for larger η; and
  • if 0 < η < 1, the density has a trough at the identity matrix.

I find this helps me make better decisions about the value of η.

Beginning and Advanced SEM

This page provides Mplus input and output files, as well as data files, for the examples I use in the Structural Equation Modeling (SEM) workshop. I also provide a link to the lecture slides. The Mplus files and data are stored on GitHub. For anyone who is interested, feel free to clone the directory and improve it; I will merge changes as needed. You can download all the training files in a .zip file here: SEM Workshop Example Files. The slides for the presentation can be downloaded here: SEM_ABCT_2013_morning and SEM_ABCT_2013_afternoon.


For power there are a ton of examples. I am going to link to a zip file with all the examples in it. Please get that here: Power Examples Zip File.

Predicted Scores and Residuals in Stata

Predicted Scores in Stata

As we discussed in class, the predicted value of the outcome variable can be created using the regression model. For example, we can use the auto dataset from Stata to look at the relationship between miles per gallon and weight across various cars. We estimate the following equation:

$\mathrm{mpg}_i = \beta_0 + \beta_1 \mathrm{weight}_i + e_i$

sysuse auto
regress mpg weight
      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  1,    72) =  134.62
       Model |   1591.9902     1   1591.9902           Prob > F      =  0.0000
    Residual |  851.469256    72  11.8259619           R-squared     =  0.6515
-------------+------------------------------           Adj R-squared =  0.6467
       Total |  2443.45946    73  33.4720474           Root MSE      =  3.4389
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      weight |  -.0060087   .0005179   -11.60   0.000    -.0070411   -.0049763
       _cons |   39.44028   1.614003    24.44   0.000     36.22283    42.65774

Thus, we see a negative relationship between weight and mpg: for every 1-pound increase in weight, mpg goes down by about .006. We can obtain the predicted scores for the observations in our dataset with the following command:

predict mpg_pred		

This creates a new variable called mpg_pred with the predicted mpg for all the weight values in our dataset. Here’s 20 of the actual mpg values and 20 of the predicted values.

list mpg mpg_pred in 1/20
     | mpg   mpg_pred |
  1. |  22   21.83483 |
  2. |  17   19.31118 |
  3. |  22   23.57735 |
  4. |  20   19.91205 |
  5. |  15   14.92484 |
  6. |  18    17.3884 |
  7. |  26   26.04091 |
  8. |  20   19.73179 |
  9. |  16   16.12658 |
 10. |  19   19.01075 |
 11. |  14   13.42267 |
 12. |  14    16.0064 |
 13. |  21   13.66302 |
 14. |  29   26.76196 |
 15. |  16   17.26823 |
 16. |  22   20.33266 |
 17. |  22   20.09231 |
 18. |  24    22.9164 |
 19. |  19   18.83049 |
 20. |  30   26.70187 |
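To see where these numbers come from, you can reproduce a predicted value by hand from the coefficients in the regression output. The first car in the auto data (the AMC Concord) weighs 2,930 lbs; a quick Python check of the arithmetic:

```python
# Coefficients from the regression output above
b_cons = 39.44028
b_weight = -0.0060087

# Weight of the first observation in the auto data (AMC Concord)
weight = 2930

mpg_pred = b_cons + b_weight * weight
print(round(mpg_pred, 5))  # 21.83479, matching mpg_pred for observation 1
```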

Residuals in Stata

Recall that a residual in regression is defined as the difference between the actual value of $y$ and the predicted value of $y$ (or $\hat{y}$):

$e_i = y_i - \hat{y}_i$

Thus, to compute residuals we can just subtract mpg_pred from mpg. Stata will do this for us using the predict command:

predict mpg_res, residuals

Here’s 20 of the actual mpg values, 20 of the predicted values, and 20 of the residuals.

 list mpg mpg_pred mpg_res in 1/20
     | mpg   mpg_pred     mpg_res |
  1. |  22   21.83483    .1651688 |
  2. |  17   19.31118   -2.311183 |
  3. |  22   23.57735    -1.57735 |
  4. |  20   19.91205    .0879486 |
  5. |  15   14.92484    .0751587 |
  6. |  18    17.3884    .6115971 |
  7. |  26   26.04091   -.0409119 |
  8. |  20   19.73179    .2682092 |
  9. |  16   16.12658   -.1265787 |
 10. |  19   19.01075   -.0107484 |
 11. |  14   13.42267    .5773304 |
 12. |  14    16.0064   -2.006405 |
 13. |  21   13.66302    7.336983 |
 14. |  29   26.76196    2.238046 |
 15. |  16   17.26823   -1.268229 |
 16. |  22   20.33266    1.667341 |
 17. |  22   20.09231    1.907688 |
 18. |  24    22.9164    1.083605 |
 19. |  19   18.83049    .1695122 |
 20. |  30   26.70187    3.298132 |

We can see that the residuals are in fact the difference between the first two columns. Given that the residuals are the part of mpg that is unrelated to weight, mpg_res should be uncorrelated with weight. Let’s check:

. corr weight mpg_res
             |   weight  mpg_res
      weight |   1.0000
     mpg_res |   0.0000   1.0000
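That zero correlation is a general property of least squares, not a quirk of this dataset. A quick Python sketch with made-up data standing in for weight and mpg:

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up data standing in for weight and mpg
x = rng.uniform(2000, 5000, size=74)
y = 40 - 0.006 * x + rng.normal(0, 3, size=74)

# Fit y = b0 + b1 * x by least squares (polyfit returns slope first)
b1, b0 = np.polyfit(x, y, 1)

residuals = y - (b0 + b1 * x)

# The residuals are uncorrelated with the predictor (up to floating point)
print(np.corrcoef(x, residuals)[0, 1])
```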



Resources for Power in SEM

References for Power in Structural Equation Modeling

Kaplan, D. (1995). Statistical power in structural equation modeling. In R. H. Hoyle (Ed.), Structural Equation Modeling: Concepts, Issues, and Applications (pp. 100-117). Thousand Oaks, CA: Sage.
Kaplan, D., & Wegner, R. N. (1993). Asymptotic independence and separability in covariance structure models: Implications for specification error, power, and model modification. Multivariate Behavioral Research, 28, 467-482.
Loehlin, J. C. (2004). Latent variable models: An introduction to factor, path, and structural equation analysis (4th ed.). Mahwah, NJ: Lawrence Erlbaum. (See Chapter 2 and the Appendix.)
MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and determination of sample size for covariance structure modeling. Psychological Methods, 1, 130-149.
Muthen, L., & Muthen, B. (2002). How to use a Monte Carlo study to decide on sample size and determine power. Structural Equation Modeling, 9, 599-620.
Saris, W. E., & Satorra, A. (1993). Power evaluations in structural equation models. In K. A. Bollen & J. S. Long (Eds.), Testing Structural Equation Models (pp. 181-204). Newbury Park, CA: Sage.
Satorra, A., & Saris, W. E. (1985). Power of the likelihood ratio test in covariance structure analysis. Psychometrika, 50, 83-90.

Stata: Dummy Coding


This post will illustrate how to:

  1. Use the generate and replace commands to create dummy variables.
  2. Use a simpler form of the generate command to create dummy variables.
  3. Use the generate option of tabulate to create dummy variables.

Dummy coding is used when you have nominal categories, meaning the groups are assigned a value for coding purposes, but the values don’t represent more or less of anything. For example, I might have three categories of race, coded as follows: Caucasian = 1, African American = 2, Hispanic = 3. The numbers are for coding purposes only; 3 is not actually any bigger or more than 1. But if we use these values in a regression (or any other) analysis, the numbers will be treated as continuous, not categorical. So we need to create dummy variables. Generally, we create k − 1 new variables, where k is the total number of groups; the remaining group serves as the reference group, i.e., the group we want to compare the other groups to.

Method 1: Using generate and replace

We’ll use the built-in Stata dataset systolic.


The drug variable has 4 levels.


Consequently, we’ll need to make 3 dummy coded indicator variables to represent drug. We’ll use level 4 as the reference category. We’ll use a series of generate and replace commands to create the variables. This is definitely the brute-force way to make the variables, but it makes the logic behind creating dummy variables clear.

generate drug1=1 if drug==1
replace drug1=0 if drug!=1
generate drug2=1 if drug==2
replace drug2=0 if drug!=2
generate drug3=1 if drug==3
replace drug3=0 if drug!=3

Tabulating each of the dummy variables – drug1 - drug3 – we see they match our original tabulation.


Method 2: Use generate only

We can also use a feature of the generate command to create a new variable that takes on the values 1 and 0. So, for example, if we want to create drug1, where drug1 is equal to 1 when the drug condition equals 1, we say:

generate drug1=drug==1

This creates a variable that is 1 when drug equals 1 (recall that == is a logical evaluator) and 0 any other time. If we want to use the fourth condition as the reference category, we repeat the generate command for drug2 and drug3.

generate drug2=drug==2
generate drug3=drug==3
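The assign-a-logical-comparison trick in generate drug1=drug==1 translates directly to most languages. A Python/NumPy sketch with a made-up drug vector:

```python
import numpy as np

# Hypothetical treatment variable with four levels, like drug
drug = np.array([1, 2, 3, 4, 1, 2, 3, 4])

# == yields True/False; casting to int gives the 1/0 dummy variable
drug1 = (drug == 1).astype(int)
drug2 = (drug == 2).astype(int)
drug3 = (drug == 3).astype(int)

print(drug1)  # [1 0 0 0 1 0 0 0]
```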

Method 3: Use the generate option of tabulate

The tabulate command has an option called generate. The generate option takes one argument, stubname, which is the stub of the new variable names created by the option. In our examples so far, the stub has been drug. Unlike the earlier methods, this one creates as many dummy variables as there are levels of the categorical variable (4 in our case). In our example, we type:

tabulate drug, generate(drug)

We now have four new variables in our dataset – drug1-drug4.
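For comparison, pandas’ get_dummies does the same one-dummy-per-level expansion as tabulate’s generate() option (a hypothetical sketch; pass drop_first=True if you instead want the k − 1 coding from Method 1):

```python
import pandas as pd

# Hypothetical drug variable with four levels
drug = pd.Series([1, 2, 3, 4, 1, 2, 3, 4])

# One dummy variable per level, like: tabulate drug, generate(drug)
dummies = pd.get_dummies(drug, prefix="drug")
print(dummies.columns.tolist())  # ['drug_1', 'drug_2', 'drug_3', 'drug_4']
```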