This will illustrate fitting a three level model (where we have time variation and then clusters) and extracting the various components from it. This is a rough document, but hopefully will be useful. This example is based on a dataset used in Rabe-Hesketh and Skrondal, chapter 8.10, but you don’t really need that text.
40.1 Load the data
We first load the data. Shoving a lot of things under the rug, we have five measurements on a collection of kids in Kenya across time. We are interested in the impact of improved nutrition. The children are clustered in schools. This gives a three-level structure. The schools were treated with different nutrition programs.
In the following we load the data and look at the first few lines. Lots of variables! The main ones are id (the identifier of the kid), treatment (the kind of treatment given to the school), schoolid (the identifier of the school), gender (the gender of the kid), and rn (the time variable). Our outcome is ravens (Raven’s colored progressive matrices asssessment).
kenya =read.dta( "data/kenya.dta" )# look at first 9 variableshead( kenya[1:9], 3 )
Level 1: We have for individual \(i\) in school \(j\) at time \(t\): [ Y_{ijt} = +* (t-L) + _{ijt} ]
Level 2: Each individual has their own growth curve. Their curve’s slope and intercepts varies around the school means: [ =* + gender{ij} + u_{0ij} ] [ = + gender*{ij} + u_{1ij} ] We also have that \((u_{0ij}, u_{1ij})\) are normally distributed with some 2x2 covariance matrix. We are forcing the impact of gender to be constant across schools.
Level 3: Finally our school mean slope and intercepts are [ =* + w_{0i} ] [ =* + meat_j +* milk_j + calorie_j + w*{1i} ] For the rate of growth at a school we allow different slopes for different treatments (compared to baseline). The milk, meat, and calorie are the three different treatments applied. We also have that \((w_{0j}, w_{1j})\) are normally distributed with some 2x2 covariance matrix: [
\[\begin{pmatrix}w_{j0}\\
w_{j1}
\end{pmatrix}\]
N((
\[\begin{array}{c}
0 \\
0
\end{array}\]
), ) = N((
\[\begin{array}{c}
0 \\
0
\end{array}\]
), _{sch} ) ]
The \(\mu_0\) and \(\mu_1\) are the slope and intercept for the overall population growth (this is what defines our marginal model).
We will use \(L = 1\) to center the data at the first time point (so our intercept is expected ravens score at onset of the study).
Conceptual question: Why do we not have treatment in the intercept for school? What would changing \(L\) do to our model and this reasoning?
\(\mu_{00} = 17.41\) and \(\mu_{11} = 0.59\). The initial score for boys is 17.4, on average, with an average gain of 0.59 per year for control schools.
\(\gamma_{01} = -0.30\) and \(\gamma_{11} = -0.14\), giving estimates that girls score lower and gain slower than boys.
The school-level variation in initial expected Raven scores is 0.45 (this is the standard deviation of \(w_{0i}\)), relatively small compared to the individual variation of 1.40 (this is the standard deviation of \(u_{0ij}\)).
The correlation of the \(u_{0ij}\) and \(u_{1ij}\) is basically zero (estimated at -0.09).
The random effects for school has a covariance matrix \(\Sigma_{sch}\) of [ _{sch} = ] The very negative correlation suggests an extrapolation effect, and that perhaps we could drop the random slope for schools.
The treatment effects are estimated as \(\mu_{11}=0.17, \mu_{12}=-0.13\), and \(\mu_{13}=-0.02\).
P-values for these will not be small, however, as the standard errors are all 0.09.
We could try to look at uncertainty on our parameters using the confint( M1 ) command, but it turns out that it crashes for this model. This can happen, and our -0.99 correlation gives a hint as to why. Let’s first drop the random slope and then try:
We then have to puzzle out which confidence interval goes with what. The .sig01 is the variance of the kid (id:schoolid), which we can tell by the range it covers. Then the next must be correlation, and then the slope. This tells us we have no confidence the school random intercept is away from 0 (.sig04).
Generally individual curves are estimated to have positive slopes. The schools visually look quite similar; any school variation is small compared to individual variation.