-
Title
-
Longitudinal Data Analysis
-
Author
-
Little, Todd D.
-
Deboeck, Pascal
-
Wu, Wei
-
Research Area
-
Methods of Research
-
Topic
-
Statistical Methods
-
Abstract
-
In this essay we review some of the emerging trends in modeling repeated measures data. Three general forms of longitudinal models are discussed: panel model designs, growth curve models, and intensive within‐person assessments. Each section discusses design elements that should be considered when using each of these types of longitudinal models, and introduces some emerging trends. In the section on panel designs, continuous time models and planned missing data models are introduced; these ideas will revolutionize the modeling and collection of panel data. In the section on growth curve models, the necessity of separately evaluating mean and covariance model fit is discussed. This section also introduces methods being used to carefully consider the time of measurements in temporal designs. Finally, the budding analysis of intensive within individual observations is considered, including recent work from mathematics that limits the generalizability of interindividual studies to individual outcomes.
-
Related Essays
-
Social Epigenetics: Incorporating Epigenetic Effects as Social Cause and Consequence (Sociology), Douglas L. Anderton and Kathleen F. Arcaro
-
To Flop Is Human: Inventing Better Scientific Approaches to Anticipating Failure (Methods), Robert Boruch and Alan Ruby
-
Repeated Cross‐Sections in Survey Data (Methods), Henry E. Brady and Richard Johnston
-
Ambulatory Assessment: Methods for Studying Everyday Life (Methods), Tamlin S. Conner and Matthias R. Mehl
-
Models of Nonlinear Growth (Methods), Patrick Coulombe and James P. Selig
-
Quantile Regression Methods (Methods), Bernd Fitzenberger and Ralf Andreas Wilke
-
The Evidence‐Based Practice Movement (Sociology), Edward W. Gondolf
-
Meta‐Analysis (Methods), Larry V. Hedges and Martyna Citkowicz
-
The Use of Geophysical Survey in Archaeology (Methods), Timothy J. Horsley
-
Network Research Experiments (Methods), Allen L. Linton and Betsy Sinclair
-
Structural Equation Modeling and Latent Variable Approaches (Methods), Alex Liu
-
Data Mining (Methods), Gregg R. Murray and Anthony Scime
-
Remote Sensing with Satellite Technology (Archaeology), Sarah Parcak
-
Quasi‐Experiments (Methods), Charles S. Reichardt
-
Digital Methods for Web Research (Methods), Richard Rogers
-
Virtual Worlds as Laboratories (Methods), Travis L. Ross et al.
-
Modeling Life Course Structure: The Triple Helix (Sociology), Tom Schuller
-
Content Analysis (Methods), Steven E. Stemler
-
Person‐Centered Analysis (Methods), Alexander von Eye and Wolfgang Wiedermann
-
Translational Sociology (Sociology), Elaine Wethington
-
Identifier
-
etrds0208
-
extracted text
-
Longitudinal Data Analysis
TODD D. LITTLE, PASCAL DEBOECK, and WEI WU
INTRODUCTION
Longitudinal data analysis refers to any form of repeated assessments on the
same person(s). Three general categories of longitudinal model exist. The first
is the panel model where two or more assessment occasions are administered to a sample of persons. The panel model, as we detail later, focuses on
the individual differences across a sample of persons (or entities). The types
of model that can be fit to multioccasion data include discrete time models,
such as the cross-lag panel model, and continuous time models. A second category of longitudinal model is the latent growth curve model, which focuses
on intraindividual differences in change across multiple occasions. The third
type of model is the intensive within-person assessment approach. These
models, which are sometimes called p-technique, dynamic p-technique, or
dynamical systems models, focus on within-person changes across a very
large number of observations; these models also can be specified as either
discrete or continuous time forms. In the following, we delve into the pros
and cons of each approach. We then discuss a number of emerging trends surrounding design and measurement issues that are associated with these longitudinal models. We close with a discussion of future directions.

Emerging Trends in the Social and Behavioral Sciences. Edited by Robert Scott and Stephen Kosslyn. © 2015 John Wiley & Sons, Inc. ISBN 978-1-118-90077-2.
Longitudinal models can be analyzed using traditional manifest-variable
analyses such as repeated measures ANOVA, ANCOVA, and MANOVA; as
cross-lagged regression models, as manifest variable path models, or as latent
variable models. We focus on the latent-variable approaches to modeling longitudinal data because of the attendant benefits they have over any manifest
variable approach (Bollen, 1989; Little, 2013). The benefits of latent variable
approaches include specifying and estimating a measurement model that
corrects for measurement error, has relaxed assumptions on the nature of the
measured items (e.g., only congenerity is assumed), and allows testing and
enforcing factorial invariance. In addition, latent variable models can thoroughly assess model fit, powerfully compare competing models, and
efficiently test complex statistical hypotheses related to mediation, moderation, and the like (see Little, 2013, for more on these benefits). Latent variable
models also can incorporate planned missing data designs as well as efficient
recovery of unplanned missing data.
Emerging trends in the area of longitudinal data analyses, therefore,
include relying on latent variables, incorporating modern missing data
treatments, and advancing the models that can be fit to data (e.g., multilevel structural equation models, continuous time models, and mixture
distribution models). Other emerging trends include critical design elements that are often ignored or underutilized. The trends that we highlight
are not comprehensive because many areas of longitudinal research are
experiencing developments that enhance the rigor and applicability of the
techniques (e.g., longitudinal social network modeling, latent transition
modeling, integrative data analysis, etc.). We focus on core issues related
to the “traditional” longitudinal models that involve latent variables (i.e.,
constructs represented by multiple indicators).
FOUNDATIONAL RESEARCH
INDIVIDUAL DIFFERENCES PANEL DESIGNS
The foundational longitudinal model is the individual differences panel
design. Traditionally, three or more measurement occasions are used. Panel
models attempt to model change and predictors of change. Most panel
models are fit as discrete time models. The primary characteristic of discrete
time models is that they do not explicitly account for the time interval or lag
between subsequent observations (Voelkle, Oud, Davidov, & Schmidt, 2012).
Figure 1 presents a three-wave model for discrete time points with the key parameters labeled (from Little, 2013). This prototypical model shows three indicators per construct with strong factorial invariance (factor loadings and intercepts are equal across time) specified. The key parameters of this model are the auto-regressive paths, which estimate the stability or prior levels of a construct, and the cross-lagged paths, which estimate the predictability of the change variance between constructs. Other parameters could be estimated, including additional (higher order) auto-regressive paths and additional cross-lagged paths between Times 1 and 3.

Figure 1 A three-wave cross-lag panel model with parameter labels (auto-regressive and cross-lag paths, zero-order and residual correlations). Source: Little, T. D. (2013). Longitudinal structural equation modeling. New York, NY: Guilford.
With two-occasion data, difference scores or gain scores are sometimes used
to represent change. Gain scores, however, are not sensitive to predictors of
change because they are essentially a model for the mean structures and use
the individuals’ scores to characterize the nature of the distribution around the mean change (Schoemann, Gallagher, & Little, 2015). Residual difference
scores, on the other hand, are optimally sensitive to identifying individual
difference predictors of the relative changes (Schoemann, Gallagher, & Little, 2015). In this regard, the emerging recommendations are to focus on the
residual difference score model to identify predictors of change.
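The contrast between gain scores and residual difference scores can be sketched numerically. The following is illustrative only: the data are invented, and plain ordinary least squares stands in for the latent variable models discussed above. The residual difference score is the Time 2 score minus the score predicted from Time 1.

```python
# Illustrative sketch (invented data): gain scores versus residual
# difference scores for two-occasion data.

def ols_slope_intercept(x, y):
    """Ordinary least squares slope and intercept for y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / \
            sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx

def gain_scores(t1, t2):
    """Raw change: Time 2 minus Time 1."""
    return [b - a for a, b in zip(t1, t2)]

def residual_difference_scores(t1, t2):
    """Change relative to what Time 1 predicts (residualized change)."""
    slope, intercept = ols_slope_intercept(t1, t2)
    return [b - (intercept + slope * a) for a, b in zip(t1, t2)]

t1 = [2.0, 4.0, 6.0, 8.0]
t2 = [3.0, 5.0, 9.0, 9.0]
print(gain_scores(t1, t2))
print(residual_difference_scores(t1, t2))
```

The residual scores sum to zero by construction, so they index who changed more or less than expected given their starting point, which is what a predictor-of-change analysis targets.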
CONTINUOUS TIME MODELS
Continuous-time models are emerging as powerful alternatives to the
traditional discrete-time cross-lagged panel model. Not explicitly modeling
the lag between observations, as is the case with discrete time models, has
several impactful consequences. Foremost, the estimated parameters will
depend on the selected lag (Gollob & Reichardt, 1987, 1991). Not only does
the magnitude of the estimates change with lag but the relative contributions of constructs to an outcome also change depending on lag. Even
the sign of parameters (i.e., positive versus negative), statistical inference,
and model fit can change depending on lag (Bergstrom, 1990; McCrorie
& Chambers, 2006; Oud, 2007). Continuous-time models, which explicitly model the lag between observations, can overcome the overwhelming
dependence of results on the lag selected by the researcher (Bergstrom, 1990).
Continuous-time models may also provide additional advantages such as
the ability to handle unequal lags, even those that differ across individuals.
Figure 2 presents an example of how parameters from a continuous-time
model differ from those of the discrete time model. This figure is based
on a model using a first-order stochastic differential equation (Oud &
Jansen, 2000; Voelkle et al., 2012). The circles at lags of 2 and 6 illustrate
the case of two researchers measuring the same construct at differing
lags. Both researchers produce three parameter estimates, but these estimates are not directly comparable because each is dependent on the lag
at which the observations were collected. The parameter estimates differ
in magnitude and, moreover, the conclusions regarding the effect of X on
Y would differ—one researcher suggests a positive effect and the other a
negative.

Figure 2 Continuous time estimates of the magnitudes of three effects (X→M, M→Y, X→Y) in a mediation model: discrete time cross-lag parameters plotted against lag in months.

Continuous-time model parameters are independent of time, but can be used to solve for the expected discrete time parameters for many
possible lags. Had both researchers fit a continuous-time model to their
data, they could produce the lines in the figure. These lines represent the
expected cross-lag effects in a cross-lag panel model; they are calculated
from the continuous-time model parameters. In this particular case, a
continuous-time model that produces the same fit to the data as the discrete
cross-lag panel model was selected using the same number of estimated
parameters. Consequently, the continuous-time model does not produce
results that differ from the discrete time results; instead, it replicates both
sets of results. The discrete time models are similar to a microscope focusing
in on the results for a particular lag, while continuous-time models offer the
potential of understanding how relations between variables change across a
range of lags (Deboeck & Preacher, in press).
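The lag dependence of discrete time parameters can be made concrete with a small sketch. Under a first-order continuous-time model, the discrete-time auto- and cross-lag matrix implied at lag Δt is the matrix exponential of the drift matrix times Δt. The drift values below are invented for illustration, and `expm` here is a simple truncated Taylor series rather than a production routine.

```python
# Sketch: one continuous-time drift matrix implies different discrete-time
# (cross-lag) parameters at every lag. Invented drift values; the Taylor
# series matrix exponential is adequate only for small, well-scaled matrices.

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def mat_scale(a, s):
    return [[x * s for x in row] for row in a]

def expm(a, terms=40):
    """Matrix exponential exp(a) via a truncated Taylor series."""
    n = len(a)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = mat_scale(mat_mul(term, a), 1.0 / k)
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

# Hypothetical drift matrix for constructs X and Y (entry [i][j]: effect of j on i).
A = [[-0.7, 0.0],
     [0.5, -0.4]]

B2 = expm(mat_scale(A, 2.0))  # implied auto/cross-lag matrix at a lag of 2
B6 = expm(mat_scale(A, 6.0))  # the same model evaluated at a lag of 6
print(B2[1][0], B6[1][0])     # the X -> Y cross-lag differs by lag
```

Two researchers sampling this same process at lags of 2 and 6 would report noticeably different cross-lag estimates, even though a single continuous-time model generated both.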
PLANNED MISSING DATA DESIGNS
Nearly all longitudinal research studies can benefit from the use of
planned missing data designs. Various designs are available, including
the two-method and the multiform designs. These designs rely on modern
techniques for treating missing data, including multiple imputation (MI) and
full information maximum likelihood (FIML) estimation. Planned missing
data designs have numerous advantages when properly implemented. One
key advantage of these designs is that the data are missing completely at
random, which means there is no bias in the parameter estimates. Depending on a number of factors, the relative efficiency of these designs can be
degraded relative to an equivalent complete case data collection design.
The cost of increasing the sample size to restore power and relative efficiency is usually offset by the cost savings of using a planned missing element.
The two-method design is an exception to the efficiency/cost balance. It is
a design that allows an increase in power relative to the analogous complete
case scenario. The two-method design is predicated on the idea that a measure exists that is unbiased but also is expensive to collect (e.g., classroom
observations, cortisol assays, Wechsler assessments, clinical assessments,
etc.). The design is also predicated on the existence of a cheaper measure
of the same construct but this measure is biased (e.g., teacher-report of
classroom behavior, self-report of stress, a multiple-choice tool of intellective
functioning, self-report of any clinical symptomology). When both measures
are given, only a random subsample of participants is given the expensive
measure (all participants receive the cheaper measure). The missing values
on the expensive measure are imputed using MI (or FIML estimation is
used). Then a bifactor SEM model is fit such that the indicators of the
cheap and expensive measure load onto the focal construct of interest and
a bias factor is specified to extract the bias from the indicators of the cheap
measure. The two-method design, thereby, yields maximum validity in the
measurement of the focal construct while increasing the size and power of
the overall sample. Longitudinal applications of the two-method design can
offer further benefits (Garnier-Villarreal, Rhemtulla, & Little, 2014).
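A back-of-the-envelope sketch of the cost logic behind the two-method design follows; the budget, unit costs, and 30% subsample fraction are all invented for illustration.

```python
# Cost sketch for a two-method design (all numbers invented).

COST_CHEAP = 5.0       # per participant, the inexpensive (but biased) measure
COST_EXPENSIVE = 50.0  # per participant, the unbiased (but expensive) measure

def n_affordable(budget, expensive_fraction):
    """Participants affordable when only a fraction receive the expensive measure."""
    per_person = COST_CHEAP + expensive_fraction * COST_EXPENSIVE
    return int(budget // per_person)

budget = 10_000.0
complete_case_n = n_affordable(budget, 1.0)  # everyone receives both measures
two_method_n = n_affordable(budget, 0.3)     # a random 30% receive the expensive one
print(complete_case_n, two_method_n)
```

Under these invented numbers the same budget funds a sample almost three times larger, which is the source of the power gain the design trades on.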
The multiform designs can come in many different variations. The simplest
and most common is the three-form planned missing design. As the name
implies, three different forms of a questionnaire protocol are created such that
a significant proportion of the items are missing from a given form. The key
to this design is to create four different sets of variables, which are referred
to as the X, A, B, and C blocks or sets, respectively. The X block contains
key items that are administered to all participants. The A, B, and C blocks
are paired into AB, AC, and BC groupings to create the three forms of item
sets: XAB, XAC, and XBC. Participants are randomly assigned to a form. The
multiform designs provide cost savings and validity enhancements over a
corresponding complete case approach.
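The block-and-form construction above can be sketched in a few lines; the item labels and block sizes below are invented, and real protocols distribute many more items.

```python
import random

# Sketch of a three-form planned missing design (invented item labels).
X = ["x1", "x2"]  # core items administered to every participant
A = ["a1", "a2"]
B = ["b1", "b2"]
C = ["c1", "c2"]

FORMS = {
    "XAB": X + A + B,  # each form omits exactly one of the A, B, C blocks
    "XAC": X + A + C,
    "XBC": X + B + C,
}

def assign_forms(participant_ids, seed=0):
    """Randomly assign each participant to one of the three forms."""
    rng = random.Random(seed)
    return {pid: rng.choice(sorted(FORMS)) for pid in participant_ids}

assignment = assign_forms(range(9))
for pid, form in assignment.items():
    # Every participant sees the X block; the omitted block is missing by design.
    assert set(X) <= set(FORMS[form])
```

Because form assignment is random, the omitted block is missing completely at random, which is what licenses the MI or FIML treatment described earlier.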
In terms of cross-sectional applications of the multiform design, two key design elements must be kept in mind. First, the efficiency of these designs
is enhanced by placing items into sets such that the between-set correlations among the items are as high as the item content will allow. For example, a
seven-item measure of positive affect would have one item assigned to the
X block and two items each would be assigned to the A, B, and C blocks,
respectively. Placing all of a scale’s items into a single set would tend to reduce the between-set correlations and reduce the efficiency of the design
relative to a complete case approach. The cost saving of a shorter protocol
can often be used to adjust the sample size to have the same or equivalent
power as a complete case approach. A second design trade-off to consider is
the gain in validity that is associated with the multiform designs. Because
fatigue and burden are reduced, constructs are more validly measured than in complete case approaches. In addition, exposure reactivity is reduced,
thereby increasing the validity of item responses. In terms of longitudinal
applications of these designs, using different forms at different time points
provides the greatest validity increases by reducing test-retest effects
(Jorgensen et al., 2014).
GROWTH CURVE MODELS
Latent growth curve models are quite ubiquitous as an emerging model for longitudinal data. Their popularity, however, is not without problems. Growth
curve models reflect restricted, parsimonious models of the mean–structures
information in a longitudinal data set. Although numerous models can be
specified and estimated, they all address questions regarding the interindividual differences of intraindividual change (Nesselroade, 1984, 1991).
This class of model, however, may not be appropriate for a given research
question. Questions regarding predictors of change, for example, are not
well examined in the context of growth curve models. When specifying a
growth curve model, key considerations include the spacing of measurement occasions, the location of the intercept (centering), and the functional form of the slope construct. Other considerations include model fit evaluation, residual stationarity, and factorial invariance (Little,
2013; Preacher, Wichman, MacCallum, & Biggs, 2008). Results from fitting
a multivariate growth curve model are depicted in Figure 3 and will be used to guide the discussion of these issues (see Little, 2013, for sample and construct details). Here, a measurement model is specified with factorial
invariance constraints in place and a linear growth curve is fit to these data.
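The mechanics of a linear growth curve can be sketched with per-person regressions on fixed basis loadings (0, 1, 2, 3). This is only a heuristic stand-in for the latent variable model (it ignores measurement error), and the data are invented.

```python
# Heuristic sketch of a linear growth curve via per-person OLS on the
# basis loadings; invented data, no measurement model.

BASIS = [0.0, 1.0, 2.0, 3.0]  # slope loadings; the intercept is centered at wave 1

def person_growth(scores):
    """OLS intercept and slope for one person's four repeated scores."""
    n = len(BASIS)
    mx = sum(BASIS) / n
    my = sum(scores) / n
    slope = sum((t - mx) * (s - my) for t, s in zip(BASIS, scores)) / \
            sum((t - mx) ** 2 for t in BASIS)
    return my - slope * mx, slope

sample = [
    [2.0, 2.5, 3.1, 3.4],
    [1.8, 2.1, 2.2, 2.9],
    [2.4, 2.9, 3.5, 4.1],
]
fits = [person_growth(person) for person in sample]
mean_intercept = sum(i for i, _ in fits) / len(fits)  # analogue of the latent intercept mean
mean_slope = sum(s for _, s in fits) / len(fits)      # analogue of the latent slope mean
```

The spread of the per-person intercepts and slopes around these means is the interindividual-differences-in-intraindividual-change information that the latent growth model captures as factor variances.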
Figure 3 A growth curve model fit across four waves with factorial invariance constraints and effects-coded scaling constraints to retain the meaningful metric of the observed indicators. Source: Little, T. D. (2013). Longitudinal structural equation modeling. New York, NY: Guilford.

As mentioned, the timing of measurements is perhaps the most neglected, yet critical, element of a longitudinal design. Here, we emphasize that all longitudinal models are only as good as the data collection design will allow. If the design does not adequately reflect the nature of a change process, the statistical analysis tool will be compromised in its ability to detect and reveal any information about change or growth. Too often, data collection intervals are selected on the basis of convenience or tradition rather than any depth of theoretical consideration (Selig, Preacher, & Little, 2012). Pilot work is an essential element of a high-quality longitudinal design because it can reveal the pace with which a construct changes. Such information allows a more refined measurement design that permits the statistical models to more readily characterize the magnitude and strength of any change effect under scrutiny.
Another important consideration is model fit evaluation. Complexity arises
as growth curve models involve both mean and covariance structures. The
misfit in the covariance structures can potentially mask the misfit in the mean
structures. To better identify the sources of misfit, Wu and West (2010) highlight the need to test model fit separately for the mean structures versus the
covariance structures. They recommended specifying the covariance structures as saturated, which would remove misfit from the covariance structures, to isolate the misfit in the mean structures. In addition, there are two
types of mean structures in growth curve models: one for individual change
trajectories and the other for average change trajectories (Wu, West, & Taylor, 2009). Wu and West (2013) showed that the traditional fit indices are poor
in detecting misfit in the mean structures for individual change trajectories
and suggested using some new indices such as concordance correlations and
residual plots, similar to those used in regression analysis, to detect misspecifications in mean structures.
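One of the suggested indices, the concordance correlation, can be computed directly from observed and model-implied means. The sketch below uses Lin's concordance correlation coefficient with invented means, contrasting a close linear mean structure with a flat (no-growth) one; it is an illustration of the idea, not the full procedure.

```python
# Sketch of Lin's concordance correlation coefficient for mean-structure
# fit; the observed and implied means below are invented.

def concordance_cc(observed, implied):
    n = len(observed)
    mo = sum(observed) / n
    mi = sum(implied) / n
    var_o = sum((x - mo) ** 2 for x in observed) / n
    var_i = sum((x - mi) ** 2 for x in implied) / n
    cov = sum((x - mo) * (y - mi) for x, y in zip(observed, implied)) / n
    return 2 * cov / (var_o + var_i + (mo - mi) ** 2)

observed_means = [2.1, 2.5, 3.0, 3.4]
linear_implied = [2.05, 2.5, 2.95, 3.4]  # tracks the observed growth closely
flat_implied = [2.75, 2.75, 2.75, 2.75]  # reproduces the grand mean but no growth
print(concordance_cc(observed_means, linear_implied))
print(concordance_cc(observed_means, flat_implied))
```

Unlike a Pearson correlation, the concordance coefficient penalizes both location and scale disagreement, so a flat implied trajectory scores near zero even though its grand mean is correct.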
One key assumption of latent growth curve models is that the construct
under scrutiny is factorially invariant across time (i.e., no evidence of
differential item functioning) and that the indicators are tau-equivalent
(equal loading magnitudes on the construct). Unfortunately, most applications of latent growth curve models are specified using the manifest variable representation of the construct (as in Figure 4). This simplified growth curve
model suffers from the same problems as manifest-variable ANOVA or
ANCOVA regarding the tenability of the assumptions. On the other hand, a
latent variable approach as reflected in Figure 3 directly assesses the factorial
invariance assumption and makes no assumption on the nature of the items
representing a construct. Thus, an emerging trend is the use of a rigorous
measurement model to undergird the application of latent variable growth
curve models. The effects-coded method of scaling and identification (Little,
Slegers, & Card, 2006) allows the mean and variance of the growth curve
factors to be estimated in the metric of the measurement scale, thereby
producing accurate population estimates of the intercept and slope values
in a nonarbitrary metric.
A common specification dilemma is whether to constrain the residual
variances of the indicators of the latent growth curve constructs to be equal
across time (i.e., the stationarity assumption) or whether the residual information should be allowed to vary across the measurement time points. The
(weak) stationarity assumption, as defined in classical time series analysis,
also assumes constant means over time and constant correlations between
observations over a given lag. These assumptions are often not tested in
structural equation modeling. Fundamentally, the stationarity assumption
is an assumption that the process underlying one’s data is not changing as it
is being measured. If this is the case, changes in the distributional properties
within and between observations are not expected to occur. As Little (2013)
explains, when a growth curve model is specified using manifest constructs
at each time point (as in Figure 4), the stationarity assumption may not be
tenable because the residual variances are the sum of the random error and
the time-specific variance. When the latent growth curve model is specified
using the latent variable approach (as in Figure 3), the time-specific variance
of the construct at each time point is separated from the error information.
Testing the stationarity of this time specific residual is, thereby, a reasonable
and unconfounded test (Figure 4) (Little, 2013).
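An informal first check on weak stationarity is simply to compare the mean and variance of the residuals across waves; the residuals below are invented, and a formal test would be carried out within the structural equation model itself.

```python
# Sketch of an informal weak-stationarity check: per-wave mean and
# (population) variance of residuals. Numbers are invented.

def wave_stats(residuals_by_wave):
    """(mean, variance) of the residuals at each wave."""
    stats = []
    for wave in residuals_by_wave:
        n = len(wave)
        m = sum(wave) / n
        v = sum((x - m) ** 2 for x in wave) / n
        stats.append((m, v))
    return stats

residuals = [
    [0.1, -0.2, 0.0, 0.1],   # wave 1
    [0.0, -0.1, 0.2, -0.1],  # wave 2
    [0.3, -0.4, 0.5, -0.4],  # wave 3: noticeably larger spread
]
for wave, (m, v) in enumerate(wave_stats(residuals), start=1):
    print(f"wave {wave}: mean={m:.3f} variance={v:.3f}")
```

A drifting mean or an expanding variance, as in wave 3 here, is the kind of pattern that makes equality constraints on the residual variances untenable.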
Figure 4 A simple growth curve model with parameter labels. Note: The parameters denoted with the Greek letter lambda are fixed to specific values to provide the basis loadings that are used to define the meaning of the two factors. Source: Little, T. D. (2013). Longitudinal structural equation modeling. New York, NY: Guilford.

Some hybrid models that attempt to combine elements of a panel model and a latent growth curve model have been introduced in the literature, but the utility of these designs has been seriously questioned; thus, their application is trending down (Little, 2013). The primary problem with
such models is that they attempt to fit a model for the mean structures,
which uses the variance–covariance information to estimate variability
in the mean–structure parameters (the growth curve model). When the
covariance–structures information is simultaneously used to estimate
stability and change relationships among the measured constructs the
mean–structures information appears to become biased. That is, in some
cases the mean–structures model simultaneously competes for information
that the panel model needs to accurately reflect the parameters of that
model. This issue is complicated by the fact that the latent parameters
labeled “intercept” and “slope” are not in fact latent intercepts and slopes,
but rather a combination of parameters including the autoregressive effect
(Hamaker, 2005), and do not lend themselves to the interpretations expected.
In summary, considerable future research is still needed to provide clearer
guidance on using a latent growth curve model for addressing a longitudinal
research question. Future directions include improvements in model specification guidelines and model fit evaluation.
CUTTING-EDGE RESEARCH
TIMING OF MEASUREMENTS AND TEMPORAL DESIGN
An emerging trend is the treatment of measurement intervals and how time
can be represented in various longitudinal models. For example, Selig et al.
(2012) recently introduced the lag as moderator model to capture the fact that
measurement intervals often vary at the level of the individual and this variance in interval can have implications for understanding the nature of a longitudinal association. Very simply, the lag as moderator model is applicable
when the interval between measurement occasions is not precisely the same
for all individuals. Quite often, a target interval of, say, 6 months is stated. The
actual intervals, however, can vary by days, weeks, and sometimes months
given all the logistical issues in capturing an exact 6-month interval between
measurement occasions. The differences in lag can be coded and included as
a measured predictor in a longitudinal model in order to assess the degree
that variability in interval lag influences the estimated strength of an association between any two variables in a longitudinal model. The newly coded
lag variable is simply included as part of an interaction term to examine the
moderation of an association by the lag in measurements (for details, see
Selig et al., 2012; similar information is available in continuous time models,
Deboeck & Preacher, in press).
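Coding lag as a moderator amounts to adding the person-specific lag and its product with the predictor to the model. The sketch below uses plain ordinary least squares on invented, noise-free data (so the coefficients used to generate y are recovered exactly); in practice the interaction term would be added within the longitudinal model itself.

```python
# Sketch: a lag-as-moderator design matrix fit by OLS. The data are
# invented and noise-free, generated from coefficients (1.0, 0.5, 0.2, 0.1).

def solve(a, b):
    """Gauss-Jordan elimination with partial pivoting (small systems only)."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(rows, y):
    """Least squares via the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]         # Time 1 predictor
lag = [5.5, 6.0, 6.5, 5.0, 7.0, 6.2]       # actual months between occasions
y = [3.15, 4.40, 5.75, 6.00, 8.40, 8.96]   # 1 + 0.5x + 0.2lag + 0.1(x*lag)

# Columns: intercept, predictor, lag, and the lag-by-predictor moderation term.
rows = [[1.0, xi, li, xi * li] for xi, li in zip(x, lag)]
b0, b1, b2, b3 = ols(rows, y)
# b3 estimates how the effect of x on y changes per additional month of lag.
```

A nonzero b3 is the moderation signal: the cross-variable effect is not a single number but a function of the interval at which people were actually measured.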
Lag information can also be used to refine the estimates of growth in
a growth curve modeling framework. The software package Mplus, for example, has “t” scores, which function as definition variables. Here, the time lag around a target data collection date can be added to or subtracted from the target date for each individual. This score, which represents the true time interval for a given person, is assigned to a given basis weight loading. This assigned value allows the true variation in the interval of measurement to be represented as a fixed value for each person. Because the fixed value of the basis weight changes for each individual, the true change in the scores is estimated more accurately.
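The person-specific loadings can be computed directly from assessment dates. The sketch below converts each person's actual dates into months since their first assessment; the dates and the average-days-per-month constant (30.44) are illustrative, and the definition-variable handling itself is software specific.

```python
from datetime import date

# Sketch: turning each person's actual assessment dates into person-specific
# slope loadings (months since that person's first assessment). Dates invented.

def individual_loadings(dates):
    """Months elapsed since the first assessment, one loading per wave."""
    start = dates[0]
    return [round((d - start).days / 30.44, 2) for d in dates]

target = [date(2015, 1, 15), date(2015, 7, 15), date(2016, 1, 15)]  # 6-month plan
actual = [date(2015, 1, 20), date(2015, 8, 3), date(2016, 1, 4)]    # what happened

print(individual_loadings(target))
print(individual_loadings(actual))
```

The gap between the planned and the realized loadings is exactly the information that fixed basis weights of 0, 6, and 12 months would throw away.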
Often, the interval of measurement for each person is not controlled by the
investigator. This selective influence needs to be considered when the effect
of lag is interpreted. For example, the lag in measurement could be related
to a factor such as income in that more affluent persons participate later than
less affluent persons, particularly when monetary incentives for each measurement occasion are provided. Or, the lag in measurement could reflect the conscientiousness of the participant, wherein highly conscientious persons participate earlier than scheduled relative to less conscientious persons. If the measurement lag is more happenstance, related more to the researcher than to the participant’s discretion, it is more likely to be unconfounded with alternative interpretations. To the degree possible, reasons for the deviation from the target lag can be coded and used as potential
time-varying covariates.
The lag in measurement can become a key design element for a given study.
Here, the lag for a measurement interval would be randomly assigned, which
would allow the investigator to code lag as mentioned earlier and use lag to
examine the potential moderation of an effect between any two variables in
a longitudinal model.
INTENSIVE WITHIN-PERSON MODELING
A final emerging trend is in the area
of intensive modeling of a given individual. Although such modeling
approaches have a long history (Cattell, 1952), their recent emergence has
been spurred by both methodological advances and theoretical refinements
that are concerned with dynamic growth and change. Originally discussed
under the rubric of p-technique factor analysis and dynamic p-technique,
these models are multivariate time-series techniques that utilize latent variable modeling as applied to a person’s or set of persons’ intensive
repeated measurements. These models are also covered under the general
rubric of state-space modeling, and applying differential equation models
(i.e., continuous time models) is also expanding (e.g., Boker, Leibenluft,
Deboeck, Virk, & Postolache, 2008, Deboeck, 2011; Deboeck & Bergeman,
2013).
A key motivation behind the advances in intensive per person models is
the issue of ergodic generalizability (Molenaar, 2004; Molenaar & Campbell,
2009). Ergodic theory, an area of mathematics, focuses on the degree to which
relationships among variables that are identified in the population generalize to the level of the person. The conditions in which such generalizability
is warranted are extremely limited. Population parameter estimates derived
from a sample of persons will generalize to a given person in the population
when persons in the population are homogeneous and when the modeled
process is time invariant (i.e., stability of the process can be assumed). If
the population has known or unknown heterogeneity in it, or if the change
process is not time or age universal, then identified relationships will not generalize to a given person. The results of such analyses are not uninformative,
but they do possess limited generalizability for practitioners and providers
who must use knowledge to make person-level decisions.
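The ergodicity problem can be made concrete with a toy example in which every person's within-person correlation is negative while the pooled, interindividual correlation is positive; the data are invented to make the contrast stark.

```python
# Sketch of the ergodicity problem: the pooled (interindividual) correlation
# can even have the opposite sign of every person's intraindividual
# correlation. Invented data.

def corr(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

persons = {
    "A": ([1.0, 2.0, 3.0], [5.0, 4.0, 3.0]),
    "B": ([4.0, 5.0, 6.0], [8.0, 7.0, 6.0]),
}
within = {pid: corr(x, y) for pid, (x, y) in persons.items()}
pooled_x = [v for x, _ in persons.values() for v in x]
pooled_y = [v for _, y in persons.values() for v in y]
print(within)                    # each within-person correlation is -1
print(corr(pooled_x, pooled_y))  # yet the pooled correlation is positive
```

Here the positive pooled association is carried entirely by differences between the two persons' means; within each person the process runs the other way, which is why population-level estimates need not describe any individual.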
Intensive per person modeling has evolved as a way to build models that characterize a given individual. When a set of individuals is examined in an intensive per person approach, the idiographic results of the model can be compared across individuals to test (quite powerfully) which parameters of a model are invariant across persons and which are unique to one or some subset of the persons. The balance between a fully idiographic and fully nomothetic universe of generalizability can be achieved through a shift in focus: collecting more data on fewer persons and using the modeling procedures described here allows nomothetic generalization as well as an estimate of the frequency and nature of idiographic exceptions.
Large-sample issues of invariance, model fit and evaluation, power, and the like remain in play when intensive per person data are modeled. The application of some of these issues, however, can shift. For example, Nesselroade and colleagues have introduced the idea of the idiographic filter. Here, the “measurement model” for a given person is allowed to vary to capture the idiographic characteristics of that person. Invariance is then tested and examined as
expectations about the associations among the constructs that are estimated
for each person.
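A toy sketch of comparing idiographic results across persons (hypothetical values; a simple AR(1) process stands in for a full state-space model) shows the logic: fit each person separately, then inspect the spread of the estimates to see which parameters are shared and which mark subgroups.

```python
import numpy as np

rng = np.random.default_rng(2)
n_times = 300

def fit_ar1(series):
    """OLS estimate of a lag-1 autoregressive coefficient (no intercept)."""
    y, ylag = series[1:], series[:-1]
    return float(ylag @ y / (ylag @ ylag))

# Two hypothetical subgroups of persons with different true dynamics.
true_phis = [0.2] * 10 + [0.7] * 10
estimates = []
for phi in true_phis:
    x = np.zeros(n_times)
    for t in range(1, n_times):
        x[t] = phi * x[t - 1] + rng.normal()
    estimates.append(fit_ar1(x))
estimates = np.array(estimates)

# If the dynamic parameter were invariant across persons, the idiographic
# estimates would cluster around one value; here they split into two groups.
print(estimates[:10].mean(), estimates[10:].mean())
```

In practice the per-person models would be richer and the invariance tests formal, but the workflow is the same: estimate idiographically, then compare nomothetically.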
KEY ISSUES FOR FUTURE RESEARCH
Although big data and machine learning are emerging as trends in the
analysis of the massive amounts of data that are now easily generated by
technologically enhanced data collection protocols, the models and design
issues that we have highlighted here are still quite relevant for advancing
our understanding of longitudinal processes. The emerging trends in each category still need further refinement and guidance.
For example, most of the models that we highlighted currently rely on
maximum likelihood estimation of the model parameters. Advances in
Bayesian estimation procedures will likely add even greater precision and
power for these models. Researchers whose research domain has matured to a level of complex theorization will benefit from the advances in the measurement, design, and analysis procedures we have described. The prospective and original data collection efforts of tomorrow’s longitudinal studies will draw on a confluence of these ideas.
The findings from the big data explorations that are becoming popular will need further testing and refinement in future studies. Bringing
sophistication to the prospective original research of the future will yield
results that maximize generalizability at the level of persons, subgroups, and
time.
Each model we described can also be estimated in the context of hierarchically nested data structures or a mixture distribution framework to identify
unobserved heterogeneity. Future directions here include developing software and measurement tools to assess critical characteristics at all levels of a
hierarchically nested data structure and include more levels in the analysis
model. The statistical and mathematical theory is well developed; unfortunately, the software tools that fit such models have lagged behind, though they are evolving. For example, the mixture distribution tools that are available will continue to be refined to give better guidance on how many groups to extract
and how the parameters of such models can be interpreted.
ACKNOWLEDGMENT
This work was supported by grant NSF 1053160 (Wei Wu & Todd D. Little,
co-PIs).
REFERENCES
Bergstrom, A. R. (1990). Continuous time econometric modeling. New York, NY: Oxford
University Press.
Boker, S. M., Leibenluft, E., Deboeck, P. R., Virk, G., & Postolache, T. T. (2008).
Mood oscillations and coupling between mood and weather in patients with rapid
cycling bipolar disorder. International Journal of Child Health and Human Development, 1(2), 181–202.
Bollen, K. A. (1989). Structural equations with latent variables (Wiley Series in Probability and Mathematical Statistics). New York, NY: Wiley.
Cattell, R. B. (1952). The three basic factor-analytic research designs—their interrelations and derivatives. Psychological Bulletin, 49, 499–551.
Deboeck, P. R. (2011). Modeling nonlinear dynamics. In M. R. Mehl & T. S. Conner (Eds.), The handbook of research methods for studying daily life (pp. 440–458).
New York, NY: Guilford Press.
Deboeck, P. R., & Bergeman, C. S. (2013). The reservoir model: A differential equation
model of psychological capacity. Psychological Methods, 18(2), 237–256.
Deboeck, P. R., & Preacher, K. J. (in press). No need to be discrete: A method for continuous time mediation analysis. Structural Equation Modeling.
Garnier-Villarreal, M., Rhemtulla, M., & Little, T. D. (2014). Two-method planned
missing designs for longitudinal research. International Journal of Behavioral Development, 38, 411–422. doi:10.1177/0165025414542711
Gollob, H. F., & Reichardt, C. S. (1987). Taking account of time lags in causal models.
Child Development, 58, 80–92.
Gollob, H. F., & Reichardt, C. S. (1991). Interpreting and estimating indirect effects
assuming time lags really matter. In L. M. Collins & J. L. Horn (Eds.), Best methods for the analysis of change: Recent advances, unanswered questions, future directions
(pp. 243–259). Washington, DC: American Psychological Association.
Hamaker, E. L. (2005). Conditions for the equivalence of the autoregressive latent trajectory model and a latent growth curve model with autoregressive disturbances.
Sociological Methods Research, 33, 404–416.
Jorgensen, T. D., Rhemtulla, M., Schoemann, A. M., McPherson, B., Wu, W., &
Little, T. D. (2014). Optimal assignment methods in three-form planned missing
data designs for longitudinal panel studies. International Journal of Behavioral Development, 38(5), 397–410.
Little, T. D. (2013). Longitudinal structural equation modeling. New York, NY: Guilford.
Little, T. D., Slegers, D. W., & Card, N. A. (2006). A non-arbitrary method of identifying and scaling latent variables in SEM and MACS models. Structural Equation
Modeling, 13, 59–72. doi:10.1207/s15328007sem1301_3
McCrorie, J. R., & Chambers, M. J. (2006). Granger causality and the sampling of
economic processes. Journal of Econometrics, 132, 311–336.
Molenaar, P. C. M. (2004). A manifesto on psychology as idiographic science: Bringing the person back into scientific psychology, this time forever. Measurement, 2,
201–218.
Molenaar, P. C. M., & Campbell, C. G. (2009). The new person-specific paradigm in
psychology. Current Directions in Psychological Science, 18, 112–117.
Nesselroade, J. R. (1984). Concepts of intraindividual variability and change: Impressions of Cattell’s influence on lifespan development psychology. Multivariate
Behavioral Research, 19, 269–286.
Nesselroade, J. R. (1991). Interindividual differences in intraindividual change. In L.
M. Collins & J. L. Horn (Eds.), Best methods for the analysis of change. Washington,
DC: American Psychological Association.
Oud, J. H. L. (2007). Continuous time modeling of reciprocal relationships in the
cross-lagged panel design. In S. Boker & M. Wenger (Eds.), Data analytic techniques
for dynamical systems in the social and behavioral sciences (pp. 87–129). Mahwah, NJ:
Lawrence Erlbaum Associates.
Oud, J. H. L., & Jansen, R. A. R. G. (2000). Continuous time state space modeling of
panel data by means of SEM. Psychometrika, 65, 199–215.
Preacher, K. J., Wichman, A. L., MacCallum, R. C., & Biggs, N. E. (2008). Latent growth
curve modeling. Thousand Oaks, CA: Sage.
Schoemann, A. M., Gallagher, M. W., & Little, T. D. (2015). Difference scores. In R.
Cautin & S. Lilienfeld (Eds.), Encyclopedia of clinical psychology. New York, NY:
Wiley-Blackwell.
Selig, J. P., Preacher, K. J., & Little, T. D. (2012). Modeling time-dependent association
in longitudinal data: A lag as moderator approach. Multivariate Behavioral Research,
47, 697–716. doi:10.1080/00273171.2012.715557
Voelkle, M. C., Oud, J. H. L., Davidov, E., & Schmidt, P. (2012). An SEM approach to
continuous time modeling of panel data: Relating authoritarianism and anomia.
Psychological Methods, 17, 176–192.
Wu, W., West, S., & Taylor, A. (2009). Evaluating model fit for growth curve models:
Integration of fit indices from SEM and MLM frameworks. Psychological Methods,
14, 183–201.
Wu, W., & West, S. (2010). Sensitivity of fit indices to misspecification in growth curve
models. Multivariate Behavioral Research, 45, 420–452.
Wu, W., & West, S. G. (2013). Detecting misspecification in mean structures for growth curve models: Performance of pseudo-R2 and concordance correlation coefficients. Structural Equation Modeling, 20, 455–478.
TODD D. LITTLE SHORT BIOGRAPHY
Dr. Todd D. Little is a Professor of Quantitative Methods and Director of the
Institute for Measurement, Methodology, Analysis, and Policy (IMMAP)
at Texas Tech University. Little has worked at the Max Planck Institute
for Human Development’s Center for Lifespan Studies (1991–1998), Yale University’s Department of Psychology (1998–2002), and the University of Kansas (2002–2013). In 2001, Little was elected to the Society for
Multivariate Experimental Psychology. In 2009, he was elected President
of APA’s Division 5 (Evaluation, Measurement, and Statistics). Little is a
Fellow in AAAS as well as APA and APS. He organizes and teaches in
the internationally renowned “Stats Camps” each June (see statscamp.org
for details). In 2009, he received the W. T. Kemper Award for Excellence in Teaching at KU, and in 2013 he received the Cohen Award for distinguished contributions to teaching and mentorship from APA’s Division 5. He is the
author of Longitudinal Structural Equation Modeling (2013).
PASCAL DEBOECK SHORT BIOGRAPHY
Dr. Pascal Deboeck is an associate professor of Quantitative Psychology and
research affiliate in the Center of Research Method and Data Analysis at
the University of Kansas. He completed his PhD in Quantitative Psychology at the University of Notre Dame in 2007. His research focuses on the
development and application of methods for the analysis of intraindividual
time series. In particular, he has worked to develop and apply derivatives,
differential equation modeling, and dynamical systems concepts to intraindividual time series.
WEI WU SHORT BIOGRAPHY
Dr. Wei Wu is an assistant professor of Quantitative Psychology and research
affiliate in the Center of Research Method and Data Analysis at the University
of Kansas. She completed her PhD in Quantitative Psychology at Arizona
State University in 2008. Wu is a specialist in structural equation modeling,
longitudinal data analysis, and missing data estimation. She currently serves
as co-PI on an NSF-funded grant developing and evaluating planned missing data designs in longitudinal studies.
RELATED ESSAYS
Social Epigenetics: Incorporating Epigenetic Effects as Social Cause and
Consequence (Sociology), Douglas L. Anderton and Kathleen F. Arcaro
To Flop Is Human: Inventing Better Scientific Approaches to Anticipating
Failure (Methods), Robert Boruch and Alan Ruby
Repeated Cross-Sections in Survey Data (Methods), Henry E. Brady and
Richard Johnston
Ambulatory Assessment: Methods for Studying Everyday Life (Methods),
Tamlin S. Conner and Matthias R. Mehl
Models of Nonlinear Growth (Methods), Patrick Coulombe and James P.
Selig
Quantile Regression Methods (Methods), Bernd Fitzenberger and Ralf
Andreas Wilke
The Evidence-Based Practice Movement (Sociology), Edward W. Gondolf
Meta-Analysis (Methods), Larry V. Hedges and Martyna Citkowicz
The Use of Geophysical Survey in Archaeology (Methods), Timothy J.
Horsley
Network Research Experiments (Methods), Allen L. Linton and Betsy Sinclair
Structural Equation Modeling and Latent Variable Approaches (Methods),
Alex Liu
Data Mining (Methods), Gregg R. Murray and Anthony Scime
Remote Sensing with Satellite Technology (Archaeology), Sarah Parcak
Quasi-Experiments (Methods), Charles S. Reichardt
Digital Methods for Web Research (Methods), Richard Rogers
Virtual Worlds as Laboratories (Methods), Travis L. Ross et al.
Modeling Life Course Structure: The Triple Helix (Sociology), Tom Schuller
Content Analysis (Methods), Steven E. Stemler
Person-Centered Analysis (Methods), Alexander von Eye and Wolfgang
Wiedermann
Translational Sociology (Sociology), Elaine Wethington