Applied Analysis Note 1
Computing model-implied or expected scores in a growth modeling context
When estimating a growth curve model (e.g., i s | y1@0 y2@1 y3@2 y4@3; i s on x;), one may invoke the Mplus SAVEDATA/FSCORES output command and option and save individual level estimates of level (i) and slope (s) parameters. These can be used to estimate model-implied or expected values for y for each person at each time point. The general formula is
Predicted_y = i + s*t
where t corresponds to the time steps (in this example: 0,1,2,3).
The general formula applies regardless of whether or how many x's are in the model, what their scale is (e.g., if they are centered), or what their regression estimates are. As long as the x's influence latent variables only, and not the y's directly, the factor scores are indifferent to the presence of x's in the Mplus model. That implies that the relationship between the latent variables (i, s) and the x's viewed in the Mplus output will be mostly preserved when looking at the factor score estimates. The relationships will not be exact, however, due to factor score indeterminancy, skewness in the observed variables, and missing data.
Below is a STATA program that calls runmplus that shows this is true.
First, the results are summarized -------------------------------------
NO CENTERING
-----------------------------------------------------------------
| t (TIME)
x | 0 1 2 3 Total
----------+------------------------------------------------------
Group 0 | -0.002417 0.197629 0.389425 0.592540 0.294294 mean observed y
| -0.002323 0.195289 0.392901 0.590514 0.294095 mean model-implied y (py=i+s*t)
| -0.000094 0.002340 -0.003477 0.002027 0.000199 mean residual (y-py)
| -0.197706 0.002340 0.194136 0.397251 0.099005 mean residual for wrong way++
++ py2=(ci+`g11'*cx)+(cs+`g21'*cx)
|
Group 1 | 0.016746 2.194302 3.419218 4.599193 2.557365 mean observed y
| 0.325799 1.818557 3.311315 4.804073 2.564936 mean model-implied y (py=i+s*t)
| -0.309052 0.375745 0.107903 -0.204881 -0.007571 mean residual (y-py)
| -3.4e+00 -1.2e+00 -0.022339 1.157636 -0.884192 mean residual for wrong way ++
++ py2=(ci+`g11'*cx)+(cs+`g21'*cx)
|
Total | 0.007165 1.195965 1.904321 2.595866 1.425829 mean observed y
| 0.161738 1.006923 1.852108 2.697294 1.429516 mean model-implied y (py=i+s*t)
| -0.154573 0.189042 0.052213 -0.101427 -0.003686 mean residual (y-py)
| -1.8e+00 -0.622458 0.085898 0.777444 -0.392594 mean residual for wrong way ++
++ py2=(ci+`g11'*cx)+(cs+`g21'*cx)
-----------------------------------------------------------------
WITH CENTERING on X
-----------------------------------------------------------------
| t
cx | 0 1 2 3 Total
----------+------------------------------------------------------
Group-.5 | -0.002417 0.197629 0.389425 0.592540 0.294294 mean observed y
| -0.002323 0.195290 0.392902 0.590515 0.294096 mean model-implied y (py=i+s*t)
| -0.000095 0.002339 -0.003478 0.002025 0.000198 mean residual (y-py)
| 0.613793 0.813839 1.005635 1.208750 0.910504 mean residual for wrong way py2=(ci+`g11'*cx)+(cs+`g21'*cx)
|
Group .5 | 0.016746 2.194302 3.419218 4.599193 2.557365 mean observed y
| 0.325799 1.818558 3.311317 4.804076 2.564937 mean model-implied y (py=i+s*t)
| -0.309052 0.375744 0.107901 -0.204884 -0.007573 mean residual (y-py)
| -2.6e+00 -0.435756 0.789160 1.969135 -0.072693 mean residual for wrong way py2=(ci+`g11'*cx)+(cs+`g21'*cx)
|
Total | 0.007165 1.195965 1.904321 2.595866 1.425829 mean observed y
| 0.161738 1.006924 1.852110 2.697296 1.429517 mean model-implied y (py=i+s*t)
| -0.154573 0.189042 0.052212 -0.101429 -0.003687 mean residual (y-py)
| -0.999759 0.189042 0.897398 1.588943 0.418906 mean residual for wrong way py2=(ci+`g11'*cx)+(cs+`g21'*cx)
-----------------------------------------------------------------
---------------------------------------------------------------------
*** Experiments with LGM using MPLUS AND STATA
*** BUILT OFF THE lgm power calculation mplus/stata PROGRAM
*** ESTIMATE LGM MODELS FOR TWO GROUPS FOLLOWING TRUE MODELS LISTED BELOW
local model1 "i s | y1@0 y2@1 y3@2 y4@3; [i@0 s@.2]; i@.5; s@.1; i with s@0; y1-y4@.5;"
local model2 "i s | y1@0 y2@1 y3@2 y4@3; [i@1 s@1.2]; i@.5; s@.1; i with s@0; y1-y4@.5;"
* The first step is to get a means and covariance matrix for a sample with the above
* growth models as the true model. The procedure for doing so is based on the LGM power
* calculation procedure described on STATMODEL.COM Then we use STATA to generate
* data from the implied means and covariance matrix, separately for each group (corresponding
* to true population models described above). The we run a growth model with the two groups
* combined, with raw and centered covariate x (corresponding to true population model above),
* and then calculate predicted y's various ways. We calculate predicted y with and without taking
* into consideration the observed x value and regression weight of i or s on x, from models with
* x centered and x not centered.
* The result is the formula above (py=i+s*t) produces the right result (lowest residual)
* and the residual is essentially the same for the raw-x and centered-x models.
tempname null
capture file close `null'
file open `null' using c:/trash/null.dat , write replace
#d ;
file write `null'
"0 0 0 0 " _n
"1" _n
"0 1" _n
"0 0 1" _n
"0 0 0 1" _n
"" _n ;
#d cr
file close `null'
set more off
tempfile f1 f2
foreach i of numlist 1/2 {
runmplus , ///
data(file = c:\trash\null.dat ; type = means covariance; nobservations = 10000;) ///
variable(names are y1-y4;) ///
model(`model`i'') ///
standardized residual savelog(c:\trash\step1) log(off)
* pull out model implied means
qui infix str line 1-90 using c:\trash\step1.out , clear
format line %90s
gen linenum=_n
gen keep=.
replace keep=1 if trim(line)=="Model Estimated Means/Intercepts/Thresholds"
su linenum if keep==1
drop if _n<=`r(min)'
replace keep=1 if trim(line)=="Residuals for Means/Intercepts/Thresholds"
replace linenum=_n
su linenum if keep==1
drop if _n>=`r(min)'
keep line
replace line=substr(line,15,75)
strparse line , g(m)
keep m*
gen i=_n
reshape long m , i(i) j(j)
drop i j
gen M=real(m)
drop m
drop if M==.
tempname true
capture file close `true'
file open `true' using c:/trash/trash1.do, write replace
file write `true' "drawnorm y1 y2 y3 y4 , clear n(10000) cstorage(lower) means("
while _N>0 {
local thud=M
file write `true' "`thud' "
drop if _n==1
}
file write `true' ") cov("
* pull out model implied correlation coefficients
qui infix str line 1-90 using c:\trash\step1.out , clear
format line %90s
gen linenum=_n
gen keep=.
replace keep=1 if trim(line)=="Model Estimated Covariances/Correlations/Residual Correlations"
su linenum if keep==1
drop if _n<=`r(min)'
replace keep=1 if trim(line)=="Residuals for Covariances/Correlations/Residual Correlations"
replace linenum=_n
su linenum if keep==1
drop if _n>=`r(min)'
keep line
strparse line , g(c)
keep c*
gen i=_n
reshape long c , i(i) j(j)
drop i j
gen C=real(c)
drop c
drop if C==.
while _N>0 {
local thud=C
file write `true' "`thud' "
drop if _n==1
}
file write `true' ") " _n
file close `true'
do c:/trash/trash1.do
save `f`i'' , replace
}
use `f1' , clear
gen x=0
append using `f2'
replace x=1 if x==.
gen id=_n
qui {
runmplus y1-y4 x id , ///
idvariable(id) ///
model(i s | y1@0 y2@1 y3@2 y4@3 ; i s on x ; ) ///
savedata(save=fscores; file=c:\trash\trash.dat) savelogfile(c:\trash\trash)
mat B=r(estimate)
mat g11=B["i_on_x",1]
mat g21=B["s_on_x",1]
local g11=g11[1,1]
local g21=g21[1,1]
preserve
runmplus_load_savedata , out(c:/trash/trash.out) clear
reshape long y , i(id) j(obs)
gen t=obs-1
* this is the right way
gen py=i+s*t
* this is the wrong way
gen py2=(i+`g11'*x)+(s+`g21'*x)
gen resid=y-py
gen resid2=y-py2
noisily di "NO CENTERING"
noisily table x t , c(mean y mean py mean resid mean resid2) f(%8.6f)
sort id t
save `f1' , replace
restore
* with centering
gen cx=x-.5
runmplus y1-y4 cx id , ///
idvariable(id) ///
model(ci cs | y1@0 y2@1 y3@2 y4@3 ; ci cs on cx ; ) ///
savedata(save=fscores; file=c:\trash\trash.dat) savelogfile(c:\trash\trash)
mat B=r(estimate)
mat g11=B["ci_on_cx",1]
mat g21=B["cs_on_cx",1]
local g11=g11[1,1]
local g21=g21[1,1]
runmplus_load_savedata , out(c:/trash/trash.out) clear
reshape long y , i(id) j(obs)
gen t=obs-1
* this is the right way
gen cpy=ci+cs*t
* this is the wrong way
gen cpy2=(ci+`g11'*cx)+(cs+`g21'*cx)
gen cresid=y-cpy
gen cresid2=y-cpy2
noisily di "WITH CENTERING on X"
noisily table cx t , c(mean y mean cpy mean cresid mean cresid2) f(%8.6f)
keep id t cpy cresid cpy2 cresid2 cx ci cs
sort id t
save `f2' , replace
use `f1' , clear
merge id t using `f2'
}