Applied Analysis Note 1

Computing model-implied or expected scores in a growth modeling context

When estimating a growth curve model (e.g., i s | y1@0 y2@1 y3@2 y4@3; i s on x;), one may invoke the Mplus SAVEDATA/FSCORES output command and option and save individual level estimates of level (i) and slope (s) parameters. These can be used to estimate model-implied or expected values for y for each person at each time point. The general formula is

Predicted_y = i + s*t

where t corresponds to the time steps (in this example: 0,1,2,3).

The general formula applies regardless of whether or how many x's are in the model, what their scale is (e.g., if they are centered), or what their regression estimates are. As long as the x's influence latent variables only, and not the y's directly, the factor scores are indifferent to the presence of x's in the Mplus model. That implies that the relationship between the latent variables (i, s) and the x's viewed in the Mplus output will be mostly preserved when looking at the factor score estimates. The relationships will not be exact, however, due to factor score indeterminancy, skewness in the observed variables, and missing data.

Below is a STATA program that calls runmplus that shows this is true.

First, the results are summarized -------------------------------------
NO CENTERING
-----------------------------------------------------------------
          |                           t (TIME)                          
        x |         0          1          2          3      Total
----------+------------------------------------------------------
 Group  0 | -0.002417   0.197629   0.389425   0.592540   0.294294   mean observed y
          | -0.002323   0.195289   0.392901   0.590514   0.294095   mean model-implied y (py=i+s*t)
          | -0.000094   0.002340  -0.003477   0.002027   0.000199   mean residual (y-py)
          | -0.197706   0.002340   0.194136   0.397251   0.099005   mean residual for wrong way++

++ py2=(ci+`g11'*cx)+(cs+`g21'*cx)

          | 
 Group  1 |  0.016746   2.194302   3.419218   4.599193   2.557365   mean observed y
          |  0.325799   1.818557   3.311315   4.804073   2.564936  mean model-implied y (py=i+s*t)
          | -0.309052   0.375745   0.107903  -0.204881  -0.007571   mean residual (y-py)
          |  -3.4e+00   -1.2e+00  -0.022339   1.157636  -0.884192   mean residual for wrong way ++

++ py2=(ci+`g11'*cx)+(cs+`g21'*cx)
          | 
    Total |  0.007165   1.195965   1.904321   2.595866   1.425829   mean observed y
          |  0.161738   1.006923   1.852108   2.697294   1.429516   mean model-implied y (py=i+s*t)
          | -0.154573   0.189042   0.052213  -0.101427  -0.003686   mean residual (y-py)
          |  -1.8e+00  -0.622458   0.085898   0.777444  -0.392594   mean residual for wrong way ++

++ py2=(ci+`g11'*cx)+(cs+`g21'*cx)



-----------------------------------------------------------------
WITH CENTERING on X
-----------------------------------------------------------------
          |                           t                          
       cx |         0          1          2          3      Total
----------+------------------------------------------------------
 Group-.5 | -0.002417   0.197629   0.389425   0.592540   0.294294   mean observed y
          | -0.002323   0.195290   0.392902   0.590515   0.294096  mean model-implied y (py=i+s*t)
          | -0.000095   0.002339  -0.003478   0.002025   0.000198  mean residual (y-py)
          |  0.613793   0.813839   1.005635   1.208750   0.910504  mean residual for wrong way py2=(ci+`g11'*cx)+(cs+`g21'*cx)
          | 
Group  .5 |  0.016746   2.194302   3.419218   4.599193   2.557365  mean observed y
          |  0.325799   1.818558   3.311317   4.804076   2.564937  mean model-implied y (py=i+s*t)
          | -0.309052   0.375744   0.107901  -0.204884  -0.007573  mean residual (y-py)
          |  -2.6e+00  -0.435756   0.789160   1.969135  -0.072693  mean residual for wrong way py2=(ci+`g11'*cx)+(cs+`g21'*cx)
          | 
    Total |  0.007165   1.195965   1.904321   2.595866   1.425829  mean observed y
          |  0.161738   1.006924   1.852110   2.697296   1.429517  mean model-implied y (py=i+s*t)
          | -0.154573   0.189042   0.052212  -0.101429  -0.003687  mean residual (y-py)
          | -0.999759   0.189042   0.897398   1.588943   0.418906  mean residual for wrong way py2=(ci+`g11'*cx)+(cs+`g21'*cx)
-----------------------------------------------------------------
---------------------------------------------------------------------
*** Experiments with LGM using MPLUS AND STATA
*** BUILT OFF THE lgm power calculation mplus/stata PROGRAM 
*** ESTIMATE LGM MODELS FOR TWO GROUPS FOLLOWING TRUE MODELS LISTED BELOW
local model1 "i s | y1@0 y2@1 y3@2 y4@3; [i@0 s@.2]; i@.5; s@.1; i with s@0; y1-y4@.5;"
local model2 "i s | y1@0 y2@1 y3@2 y4@3; [i@1 s@1.2]; i@.5; s@.1; i with s@0; y1-y4@.5;"
* The first step is to get a means and covariance matrix for a sample with the above
* growth models as the true model. The procedure for doing so is based on the LGM power 
* calculation procedure described on STATMODEL.COM Then we use STATA to generate
* data from the implied means and covariance matrix, separately for each group (corresponding
* to true population models described above). The we run a growth model with the two groups 
* combined, with raw and centered covariate x (corresponding to true population model above), 
* and then calculate predicted y's various ways. We calculate predicted y with and without taking
* into consideration the observed x value and regression weight of i or s on x, from models with
* x centered and x not centered.
* The result is the formula above (py=i+s*t) produces the right result (lowest residual) 
* and the residual is essentially the same for the raw-x and centered-x models.
tempname null 
capture file close `null'
file open `null' using c:/trash/null.dat , write replace
#d ;
file write `null'
   "0 0 0 0 "   _n
   "1"   _n
   "0 1"   _n
   "0 0 1"   _n
   "0 0 0 1"   _n
   ""   _n ;
#d cr
file close `null'
set more off
tempfile f1 f2
foreach i of numlist 1/2 {
runmplus , ///
data(file = c:\trash\null.dat ; type = means covariance; nobservations = 10000;) ///
variable(names are y1-y4;) ///
model(`model`i'') ///
standardized residual savelog(c:\trash\step1) log(off)
* pull out model implied means
qui infix str line 1-90 using c:\trash\step1.out , clear
format line %90s
gen linenum=_n
gen keep=.
replace keep=1 if trim(line)=="Model Estimated Means/Intercepts/Thresholds"
su linenum if keep==1
drop if _n<=`r(min)'
replace keep=1 if trim(line)=="Residuals for Means/Intercepts/Thresholds"
replace linenum=_n
su linenum if keep==1
drop if _n>=`r(min)'
keep line
replace line=substr(line,15,75)
strparse line , g(m)
keep m*
gen i=_n
reshape long m , i(i) j(j)
drop i j
gen M=real(m)
drop m
drop if M==.
tempname true
capture file close `true'
file open `true' using c:/trash/trash1.do, write replace
file write `true' "drawnorm y1 y2 y3 y4 , clear n(10000) cstorage(lower) means("
while _N>0 {
local thud=M
file write `true' "`thud' "
drop if _n==1
}
file write `true' ") cov("
* pull out model implied correlation coefficients
qui infix str line 1-90 using c:\trash\step1.out , clear
format line %90s
gen linenum=_n
gen keep=.
replace keep=1 if trim(line)=="Model Estimated Covariances/Correlations/Residual Correlations"
su linenum if keep==1
drop if _n<=`r(min)'
replace keep=1 if trim(line)=="Residuals for Covariances/Correlations/Residual Correlations"
replace linenum=_n
su linenum if keep==1
drop if _n>=`r(min)'
keep line
strparse line , g(c)
keep c*
gen i=_n
reshape long c , i(i) j(j)
drop i j
gen C=real(c)
drop c
drop if C==.
while _N>0 {
local thud=C
file write `true' "`thud' " 
drop if _n==1
}
file write `true' ") " _n
file close `true'
do c:/trash/trash1.do
save `f`i'' , replace
}
use `f1' , clear
gen x=0
append using `f2'
replace x=1 if x==.
gen id=_n
qui {
runmplus y1-y4 x id , ///
  idvariable(id) ///
  model(i s | y1@0 y2@1 y3@2 y4@3 ; i s on x ; ) ///
  savedata(save=fscores; file=c:\trash\trash.dat) savelogfile(c:\trash\trash)
mat B=r(estimate)
mat g11=B["i_on_x",1]
mat g21=B["s_on_x",1]
local g11=g11[1,1]
local g21=g21[1,1]
preserve
runmplus_load_savedata , out(c:/trash/trash.out) clear
reshape long y , i(id) j(obs)
gen t=obs-1
        * this is the right way
gen py=i+s*t
        * this is the wrong way
gen py2=(i+`g11'*x)+(s+`g21'*x)
gen resid=y-py
gen resid2=y-py2
noisily di "NO CENTERING"
noisily table x t , c(mean y mean py mean resid mean resid2) f(%8.6f)
sort id t
save `f1' , replace
restore
* with centering
gen cx=x-.5
runmplus y1-y4 cx id , ///
  idvariable(id) ///
  model(ci cs | y1@0 y2@1 y3@2 y4@3 ; ci cs on cx ; ) ///
  savedata(save=fscores; file=c:\trash\trash.dat) savelogfile(c:\trash\trash)
   mat B=r(estimate)
   mat g11=B["ci_on_cx",1]
mat g21=B["cs_on_cx",1]
local g11=g11[1,1]
local g21=g21[1,1]
runmplus_load_savedata , out(c:/trash/trash.out) clear
reshape long y , i(id) j(obs)
gen t=obs-1
        * this is the right way
gen cpy=ci+cs*t
        * this is the wrong way
gen cpy2=(ci+`g11'*cx)+(cs+`g21'*cx)
gen cresid=y-cpy
gen cresid2=y-cpy2
noisily di "WITH CENTERING on X"
noisily table cx t , c(mean y mean cpy mean cresid mean cresid2) f(%8.6f)
keep id t cpy cresid cpy2 cresid2 cx ci cs
sort id t
save `f2' , replace
use `f1' , clear
merge id t using `f2'
}