Journal of Pediatric Psychology, Vol. 27, No. 1, 2002, pp. 37-45
© 2002 Society of Pediatric Psychology

Methodological Issues in Outcome Studies of At-Risk Infants

Glen P. Aylward, PhD

School of Medicine, Southern Illinois University

All correspondence should be sent to Glen P. Aylward, SIU School of Medicine, Dept. of Pediatrics, P.O. Box 19658, Springfield, Illinois 62794-9658. E-mail: gaylward@siumed.edu.


    Abstract
 Top
 Abstract
 Introduction
 Conceptualization and Design...
 Subject Populations
 Procedural Issues
 Measurement/Outcome
 Conclusions
 References
 
Objective: To identify methodologic problems found in follow-up studies of infants at biologic and environmental risk and provide solutions and recommendations.

Methods: This article is a literature review.

Results: Problems fall into four groupings: (1) conceptualization/design issues, (2) subject population concerns, (3) procedural issues, and (4) measurement/outcome concerns.

Conclusions: Main-effect models are not useful; confounding and mediating variables must be identified. In addition, the following are needed: alternative analytic techniques, more precise subject selection and characterization of risk factors, geographically defined samples, broadened scope of outcome measures, and use of epidemiologic techniques.

Key words: developmental outcome; follow-up; high risk; infants; methodology.


    Introduction
 
Pediatric and clinical child psychologists are increasingly involved in follow-up of infants at biological and environmental risk (Aylward, 1997b). This is due in part to declining mortality rates and the resultant expanding numbers of surviving infants at increased risk for developmental morbidity. In addition to a focus on major disabilities (severe mental retardation, cerebral palsy, sensory impairments, and epilepsy), interest has also focused on the evaluation of more subtle, high prevalence/low severity dysfunctions: learning, attention, and behavioral problems; borderline intelligence; poor visual motor integration; deficits in spatial relations; reading and mathematics problems; language disorders; and deficits in "executive" behaviors.

Unfortunately, these follow-up studies often contain methodologic problems that compromise findings. Approximately 10 years ago, we undertook a meta-analysis of 80 follow-up studies published over the preceding decade (Aylward, Pfeiffer, Wright, & Verhulst, 1989) and pooled results of 4,006 infants weighing 2500 g or less and 1,568 controls. Eleven major problems were identified in these low birthweight follow-up studies. Similar problems have been cited in more recent reviews (McCormick, 1997a; Tyson & Broyles, 1996) with the inclusion of several recurrent issues: (1) use of gestational age versus birthweight, (2) changes in test instruments, (3) use of neuropsychological "batteries," (4) changes in medical procedures (e.g., surfactant, steroids, ventilation), (5) application of sensitivity and specificity measures, and (6) the need to include quality of life measures. Comparability across studies is further reduced because of a lack of central focus or framework for actual data collection due to the diversity of purposes for follow-up (Johnson, 1997).

The following discussion highlights problems found in follow-up of at-risk infants and contains suggestions for improvement. These problems fall into four broad areas: (1) conceptualization/design issues, (2) subject populations, (3) procedural issues, and (4) measurement/outcome.


    Conceptualization and Design Issues
 
Cause-Effect Inferences
In developmental follow-up studies, cause-effect inferences must be tempered by alternative explanations of observed effects that could be produced by confounding influences. Random assignment is not possible in studies of "naturally occurring" conditions such as extremely low birthweight (ELBW). In most follow-up studies, the predominant conceptualization of causal inference is (1) a condition (e.g., ELBW, drug exposure) leads to (2) some type of neurodevelopmental outcome. Unfortunately, this simplistic attribution of causality often is flawed on two ends.

First, multiple factors are associated with conditions such as ELBW (McCormick, 1997a). These include severity of neonatal course (days in hospital, other conditions), sociodemographic factors (socioeconomic status [SES] and social support, race), subsequent illness (asthma, hospitalizations), maternal physical and mental health, and environmental exposures to positive and negative experiences (lead, smoking in household, intervention). In fact, some suggest birthweight is best conceptualized as a marker of concomitant factors that influence outcome. Second, at the outcome end there are neurodevelopmental, cognitive, behavioral, health, and social issues. Various studies have shown that front-end factors that influence outcome vary, depending on the time of assessment, type of early risk factor, and type of outcome measured. In sum, the situation is far more complex than a main-effect, "A->B" model.

Spurious correlations may add much uncertainty to the model; this is particularly concerning given that the data in most outcome studies are correlational or descriptive. Depending on interpretation, Type I or Type II errors could result. Inclusion of confounding variables will reduce the error term and thereby decrease Type I errors. Moreover, measures selected to represent potential confounds must be reliable and valid in and of themselves, because measurement error in the control variables may detract from the validity of any causal inferences that can be drawn (Jacobson & Jacobson, 1996). When a perinatal variable and a potential confound such as environmental risk are included in multivariate outcome analysis, a portion of the variance will be attributed to the more reliable predictor solely because it was measured more accurately. Conversely, even if a confounding variable such as environment is very influential, if it is measured inaccurately, its impact will be underestimated. In general, unreliable measurement at the front end may produce Type II errors, whereas improper measurement of a confounding outcome variable may increase variability and produce spurious correlations (Type I error). Type II error is a particular problem when investigators fail to detect subclinical behavioral or developmental deficits because of insensitive test instruments.
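
The point that an influential but poorly measured variable will look weaker than it is can be illustrated with the classical attenuation formula, in which the expected observed correlation equals the true correlation multiplied by the square root of the product of the two reliabilities. The sketch below is only an illustration: the function name, the true correlation of .50, and the reliability values are hypothetical and are not taken from any study cited here.

```python
import math


def attenuated_correlation(true_r: float, rel_x: float, rel_y: float) -> float:
    """Expected observed correlation under the classical attenuation formula.

    true_r is the correlation between error-free scores; rel_x and rel_y are
    the reliabilities (0 to 1) of the predictor and the outcome measure.
    """
    return true_r * math.sqrt(rel_x * rel_y)


# Hypothetical example: the same true environment-outcome correlation (.50),
# observed once with a reliable environmental measure and once with a poor one.
well_measured = attenuated_correlation(0.50, rel_x=0.90, rel_y=0.90)
poorly_measured = attenuated_correlation(0.50, rel_x=0.50, rel_y=0.90)

print(f"reliable environmental measure:   r = {well_measured:.3f}")
print(f"unreliable environmental measure: r = {poorly_measured:.3f}")
```

Under these assumed values the unreliable measure yields a visibly smaller observed correlation even though the underlying relationship is identical, which is how a less reliable confound can lose variance to a more reliable perinatal predictor.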

This situation argues for use of measures with demonstrated reliability and validity. Moreover, care must be taken to separate potential confounding from mediating variables. Although both may reduce the attributable influence of a particular perinatal variable on outcome, interpretation of this influence depends on a priori categorization of which variables theoretically are expected to function as mediators and which as potential confounders; treatment of a mediator as a confounding variable may lead to the incorrect inference of a spurious correlation or a Type II error (e.g., Jacobson & Jacobson, 1996). Both may be tested by adding a control variable to the multivariate analysis. However, in the case of very low birthweight (VLBW) and environmental influences, interpretation of a reduction in the effect of birthweight on some outcome measure after environment is added depends on the hypothesized function of this control variable (environment).

Selection of control variables should be determined both on a conceptual basis and by univariate correlations that indicate at least a weak relationship between the variable and the outcome measure of interest. Multiple regression (stepwise and hierarchical) is often useful, but is problematic when multicollinearity exists due to highly correlated predictor variables, or when many correlated outcome variables are measured. Structural equation modeling (interrelations among composite "latent" variables are derived from multiple measures), partial least squares methods (sometimes considered a variant of structural equation modeling; permits detection of basic underlying patterns of association between constructs [Carmichael-Olson, Streissguth, Bookstein, Barr, & Sampson, 1994]), the use of LISREL in path analyses (how well a hypothesized model fits the actual data), and growth curve modeling are recognized techniques used to delineate more specifically the role of confounding and mediating variables (Keith, 1993; Landry, Smith, Miller-Loncar, & Swank, 1997). Growth modeling is of particular interest: individual differences in development are examined in terms of rate of change (slope) and changes in the actual rate of change (curvature); this technique could allow determination of whether mediating variables differentially affect growth of cognitive or other abilities.
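
The distinction between slope and curvature can be made concrete with a deliberately simplified sketch. It is not a growth curve model (which would estimate these quantities across children with a multilevel or latent-variable method); it only computes, for one child with three equally spaced assessments, the average rate of change and the change in that rate. The function name, ages, and raw scores are hypothetical.

```python
def growth_summary(ages, scores):
    """Return (slope, curvature) for three equally spaced assessments.

    slope     = average rate of change per unit of age
    curvature = change in the rate of change (second difference / spacing^2)
    """
    (a1, a2, a3), (y1, y2, y3) = ages, scores
    spacing = a2 - a1
    if a3 - a2 != spacing:
        raise ValueError("assessments must be equally spaced")
    slope = (y3 - y1) / (2 * spacing)
    curvature = (y3 - 2 * y2 + y1) / spacing ** 2
    return slope, curvature


# Hypothetical raw scores at 12, 24, and 36 months for two children.
steady = growth_summary((12, 24, 36), (20, 32, 44))   # constant gain
slowing = growth_summary((12, 24, 36), (20, 32, 38))  # gain tapers off

print("steady child:  slope=%.2f curvature=%.3f" % steady)
print("slowing child: slope=%.2f curvature=%.3f" % slowing)
```

Two children can share the same starting score yet differ in slope, curvature, or both; a mediating variable that affects only the curvature would be invisible in a comparison of scores at a single age.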

Control Groups
When inferences are made regarding the outcome of infants from an identified group or those receiving a particular medical intervention, these infants should be compared to some other group to make such inferences meaningful (Kiely & Paneth, 1981). Traditionally, a full-term control group is used, drawn from similar geographic and social circumstances. However, choice of the type of comparison group depends on the purpose of the study and the hypotheses being tested. For example, when considering the incidence of disability in infants born at <800 g, use of a full-term comparison group would not be very informative. A comparison group of infants with birthweights between 800 and 1000 g, drawn from the same population, would be more appropriate. Determination of a control group when evaluating the efficacy of a new procedure is more straightforward.

In the case of ELBW infants, within-group comparisons could be employed, based on arrays of medical/biologic pre- and perinatal factors, or contrasts between those who have done well on a particular outcome measure versus those who have not. Sample stratification can be used when a high degree of confound exists between a particular perinatal variable and one or more background variables. This has been successfully accomplished in studies where medical risk and biological risk are dichotomized (high/low), thereby yielding four possible stratifications. "Oversampling" of infants manifesting a condition under consideration that occurs less frequently (e.g., Grade IV intraventricular hemorrhage [IVH]), also assists in comparisons and decreases the possibility of Type II errors. Conversely, oversampling may be misleading if one were to consider the impact of the risk factor on the overall population.


    Subject Populations
 
Birthweight and Gestational Age
Prior to the 1990s, infants were primarily grouped by birthweight rather than gestational age because of the uncertainty of the obstetric estimation of gestational age and the questionable utility of the postnatal assessment, particularly in very small infants (Hack & Fanaroff, 1988). However, fetal ultrasound has improved gestational age estimation, and gestational age is a stronger determinant of organ/system maturation and viability than is birthweight. Moreover, infants of very low birthweight may be (1) extremely premature babies with birthweights appropriate for gestational age (AGA), (2) less premature babies with birthweights small for gestational age (SGA; <3rd percentile), or (3) older preterm and term infants with extreme SGA birthweights (Touwen, 1986). This distinction is necessary because the ultimate survival and outcome of infants included in these groups can vary markedly. Therefore, both birthweight and gestational age should be considered in outcome studies, with particular care taken to ensure that only AGA infants are included in specific birthweight categories if this is the benchmark used for grouping subjects.

Medical/Biologic Risk
More precise characterization of the neonatal medical experience or biologic risk is necessary to compare outcomes across hospitals, for benchmarking, and to control for population differences. Various risk scores and neonatal admission severity scores for physiologic status and intensity of therapeutic intervention have been developed (e.g., Pollack et al., 2000). These scores may be used to stratify subgroups or can be controlled statistically in regression or ANCOVA analyses. Because the three major sources of morbidity in the neonatal period are intracranial events, pulmonary immaturity, and infections (McCormick, 1989), severe ultrasound abnormality, septicemia, necrotizing enterocolitis, chronic lung disease/bronchopulmonary dysplasia (BPD), hyperbilirubinemia, apnea of prematurity, retinopathy of prematurity, and indicators of asphyxia, such as seizures, should be included in the selected risk index. The number of days of hospitalization after birth is often used as a marker variable; however, it is disproportionately affected by smaller infants. If used, this variable should be adjusted for birthweight or gestational age.

Medical/biologic variables fall into three broad areas: admission status (how "sick" the infant is, typically measured by variables such as birthweight, gestational age, Apgar score), medical response or intervention (ventilation, tertiary/secondary level of care), and sequelae at discharge (need for oxygen, neurosensory deficit, chronic illness). Each of these areas should be considered in follow-up. Postdischarge medical status should also be noted, as subsequent hospitalizations are associated with lower verbal, visual-perceptual, and visual motor scores, and less positive teacher ratings.

Sources of Bias
Samples. Small, single-hospital samples may yield data with limited applicability because of the variations in routine medical care. For example, the incidence of cerebral palsy can vary fourfold between different neonatal intensive care units (NICUs), and outcomes may differ in terms of whether the NICU is located in a hospital with a training program (marker for teaching hospital) and the volume of babies admitted (proxy for experience) (McCormick, 1997b). Although use of control groups drawn from the same hospital population can minimize this effect to some degree, pooling data from a geographically defined sample is more appropriate. Geographically defined studies are sounder because the numbers are larger, inferences are more secure, and hospital selection bias is minimized. Regional data, or those derived from nationwide collaborative networks, are most useful (Vermont-Oxford Trials Network, 1993). The importance of proper selection of the patient population cannot be overestimated, as the incidence of any outcome strongly depends on the "denominator" (i.e., study population) used (Escobar, Littenberg, & Petitti, 1991).

Age Cohort. The age cohort is important due to rapidly evolving changes in medical interventions (Hack & Fanaroff, 1999). For example, 30- to 40-year-old data on asphyxia obtained from the National Perinatal Collaborative Study have questionable relevance today. In terms of contemporary long-term follow-up, by the time school-age data on a particular cohort are collected and analyzed, practice changes in treatment may have occurred (e.g., assisted ventilation in the delivery room, surfactant, and prenatal and postnatal steroids). This argues for clear delineation of medical practices at the time of enrollment in follow-up studies and timely data analyses.

Subject Loss. Subject loss can bias the estimation of rate of handicap in follow-up studies (Tyson & Broyles, 1996). Dropout rates as high as 40% to 50% have been reported over the first year in indigent populations. Risk for dropout increases in larger, less sick babies; those from lower SES households; babies born to single, young mothers; and those not born at a tertiary care hospital. Caretakers of infants with identified problems or disabilities are more compliant with regard to follow-up attendance (Aylward, Hatcher, Stripp, Gustafson, & Leavitt, 1985; Campbell et al., 1993), thereby potentially inflating rates of disability in samples with a high dropout rate. Subject loss of 10% per year should be anticipated, which argues for power analyses to secure ample subject samples. In addition, the convergent validity of other potentially useful data, such as those provided by home health visitors, primary care physicians, and parent report, should be explored as a means of reducing subject loss (Johnson, 1997).
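
The planning consequence of a 10% annual loss can be worked through with simple arithmetic. The sketch below assumes, for illustration only, that loss compounds at a constant rate each year and that the number of children needed at the final assessment has already been obtained from a separate power analysis; the function name and the target of 100 children are hypothetical.

```python
import math


def required_enrollment(n_needed: int, years: int, annual_loss: float = 0.10) -> int:
    """Number of infants to enroll so that n_needed remain after `years`.

    Assumes a constant proportion `annual_loss` of the remaining sample is
    lost each year (a simplification; real attrition is rarely uniform).
    """
    retained_fraction = (1 - annual_loss) ** years
    return math.ceil(n_needed / retained_fraction)


# Hypothetical target: 100 children assessed at the end of follow-up.
enroll_for_3_years = required_enrollment(100, years=3)
enroll_for_5_years = required_enrollment(100, years=5)

print("enroll for 3-year follow-up:", enroll_for_3_years)
print("enroll for 5-year follow-up:", enroll_for_5_years)
```

Because dropout is selective rather than random, inflating enrollment protects statistical power but does not by itself remove the bias described above.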


    Procedural Issues
 
Environmental Factors
The Hollingshead Index (1975) was used in approximately a third of the meta-analytic studies, whereas maternal education was the most frequently used single marker variable. However, many children are exposed to both biologic and environmental risk, and this combination is sometimes referred to as "double jeopardy" or "double hazard" (Escalona, 1982; Parker, Greer, & Zuckerman, 1988). Here, nonoptimal biologic and environmental risks work synergistically to affect later functioning (Aylward, 1990, 1992). However, there is a ceiling effect in which a severe biologic risk will minimize environmental influences. Stated differently, the sickest infants are least responsive to environmental influences (Aylward, 1996; Bendersky & Lewis, 1994).

SES (maternal education and occupational status) is an insufficient marker for environmental quality. Social support, which includes tangible components (e.g., housing) and intangible components (attitudes, encouragement), should also be considered. The environment involves both process (proximal aspects experienced most directly; mother-infant interaction) and status features (distal and broader, involving aspects experienced more indirectly; social class; location of residence). Process or proximal environmental variables are more predictive early on; status or distal factors are more predictive later (Aylward, 1992). Environmental effects become increasingly apparent between 18 and 36 months, with 24 months cited frequently. Environmental variables influence verbal and general cognitive outcome whereas medical/biologic factors are more strongly related to neurologic and perceptual-performance function (Aylward, 1996; Bendersky & Lewis, 1994). Medical/biologic factors tend to determine whether a developmental problem occurs, but environmental factors temper or exacerbate the degree of problem (Hunt, Cooper, & Tooley, 1988).

Negative components of the environment have a synergistic or additive effect on infants who are biologically vulnerable vis-à-vis the transactional (Sameroff & Chandler, 1975) or "risk-route" models (Aylward & Kenny, 1979). Procedurally, infants can be stratified on some environmental measure (e.g., by quartiles), or environmental effects can be partialled out in statistical procedures. If possible, process and status aspects need to be measured. Because of the changing complexity and composition of contemporary environments, valid, more recently developed measures comparable across studies and administered quickly should be employed (see Aylward, 1997).

Correction for Prematurity
The consensus is that correction for prematurity should occur, arguably up to 2 years of age (Hunt & Rhodes, 1977). However, some investigators suggest that correction not be utilized or that it be applied in an incremental fashion (e.g., half correction), depending on the infant's gestational age, age at time of measurement, and area of function being assessed (Blasko, 1989; Miller, Debowitz, & Palmer, 1984). Arguments for incremental correction currently are not convincing. Imprecise gestational age estimation, concomitant medical issues, and a lack of consensus whether to correct to 37 or 40 weeks are additional confounds. Until a "correction algorithm" is devised, correction through 2 years is recommended.
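
Full correction through 2 years can be expressed as a short calculation: subtract the number of weeks the infant was born early from chronological age, and stop correcting once chronological age passes the cutoff. The sketch below is one possible reading of that recommendation, not a validated algorithm; the function name, the default of 40 weeks as term, and the 104-week cutoff are assumptions, and the term reference is left as a parameter because the text notes disagreement between 37 and 40 weeks.

```python
def corrected_age_weeks(
    chronological_weeks: float,
    gestational_age_weeks: float,
    term_weeks: float = 40,            # assumed default; 37 is also used
    correct_through_weeks: float = 104,  # 2 years, per the recommendation
) -> float:
    """Age in weeks after full correction for prematurity."""
    if chronological_weeks > correct_through_weeks:
        return chronological_weeks  # no correction beyond the cutoff
    weeks_early = max(0, term_weeks - gestational_age_weeks)
    return chronological_weeks - weeks_early


# Hypothetical infant born at 28 weeks' gestation, seen at 52 weeks of age.
age_vs_40 = corrected_age_weeks(52, 28)                 # corrected to 40 weeks
age_vs_37 = corrected_age_weeks(52, 28, term_weeks=37)  # corrected to 37 weeks
age_late = corrected_age_weeks(110, 28)                 # past 2 years

print(age_vs_40, age_vs_37, age_late)
```

The 3-week gap between the two term conventions shows why the choice of reference point should be reported alongside any corrected scores.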


    Measurement/Outcome
 
Selection of Outcome Measures
Because there is no true "gold standard" in developmental assessment, terms such as "sensitivity" and "specificity" are misapplied. Instead, "co-positivity" and "co-negativity" are more appropriate in situations where scores on one test are compared to those obtained on a reference standard. The Bayley Scales of Infant Development (BSID; Bayley, 1969) and the more recent BSID-II (Bayley, 1993) are traditionally considered the best criterion measures. However, analogous to changes in medical procedures having an effect on outcome studies, changes in outcome measures have a similar effect by limiting comparisons. Mean IQ/DQ scores on a given test are estimated to increase 3 to 5 points per decade (Flynn, 1999). Therefore, the mean score of a test developed three decades ago conceivably could increase by as much as 15 points. Such adjustments must be considered when comparing scores in different age cohorts or longitudinally with different versions of the same test, because change could be due to real improvement or be an artifact of the different properties of the two instruments. Therefore, use of outdated tests such as the Stanford-Binet Form LM in current protocols is not appropriate, nor is it useful to compare this with the more contemporary version of the same test.
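
Two calculations in this paragraph lend themselves to a brief sketch. The first reproduces the drift arithmetic (3 to 5 points per decade over three decades). The second computes co-positivity and co-negativity from a 2 x 2 table of agreement between a test and a reference standard; the formulas parallel sensitivity and specificity but make no claim that the reference is error free. Function names and all counts are hypothetical.

```python
def norm_drift(points_per_decade: float, years_since_norming: float) -> float:
    """Expected upward drift in mean score on a test with aging norms."""
    return points_per_decade * years_since_norming / 10


def co_agreement(both_pos, ref_pos_only, test_pos_only, both_neg):
    """Return (co-positivity, co-negativity) relative to a reference standard.

    co-positivity = share of reference-positive children the test also flags
    co-negativity = share of reference-negative children the test also clears
    """
    co_positivity = both_pos / (both_pos + ref_pos_only)
    co_negativity = both_neg / (both_neg + test_pos_only)
    return co_positivity, co_negativity


# Drift for a test normed 30 years ago, at the low and high estimates.
low_drift, high_drift = norm_drift(3, 30), norm_drift(5, 30)

# Hypothetical agreement counts for 100 children.
co_pos, co_neg = co_agreement(both_pos=18, ref_pos_only=6,
                              test_pos_only=8, both_neg=68)

print(f"drift: {low_drift:.0f} to {high_drift:.0f} points")
print(f"co-positivity={co_pos:.2f}, co-negativity={co_neg:.2f}")
```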

Additional controversy surrounds the BSID-II itself (Gauthier, Bauer, Messinger, & Closius, 1999; Matula, Gyurke, & Aylward, 1997; Ross & Lawson, 1997; Washington, Scott, Johnson, Wendel, & Hay, 1998). If corrected age is used to determine the beginning item set, scores tend to be lower because the child is not automatically given credit for passing the earlier item set. The potential to generate several alternative developmental index scores may limit comparability across studies that use BSID-II scores in research protocols.

Length/Duration of Follow-Up
A minimum of 3 years' follow-up appears necessary to identify problems of moderate severity and to measure IQ. However, subtler, high prevalence, low severity learning difficulties may not become apparent until later, making follow-up into early school age the most desirable practice. School entry is also an attractive end point because health and other problems can be better defined at this age (McCormick, 1989; Vohr & Msall, 1997). If the end point of follow-up is in itself a time of significant change and variability (such as 12 months of age), outcome measurement might be further compromised (e.g., the child who walks at 15 versus 12 months). In such situations, it is difficult to distinguish among a delay, a disorder, and a deficit.

Selection of Outcomes
Traditionally, major handicaps were the primary focus in outcome studies. Interest then shifted toward more subtle learning, attention, and behavioral dysfunctions, and borderline IQ. Most recently, there has been a major emphasis on a broader, multidimensional conceptualization of outcome and health, including functional abilities, health status, and health-related quality of life (HRQL; McCormick, 1989, 1997a; Saigal et al., 1996). Children at early biologic risk subsequently have poorer health (e.g., bronchopulmonary dysplasia), related restrictions in ability to engage in usual childhood activities, slower physical growth, and poorer social-emotional development. None of these is a "traditional" morbidity measure, yet all translate into compromised school performance and other sequelae. As mentioned previously, documenting the child's health postdischarge is also critical (Tyson & Broyles, 1996).

To evaluate outcome more precisely, in addition to "traditional" measures, profiles of the following areas need to be documented: health status; physical issues/limitations due to health; functional status or quality of life including adaptive behavior and day-to-day living; behavioral problems; social competency; gross, fine, and visual motor skills; and academics. Emphasis on functional measures is relatively new, due in part to difficulty in defining and measuring functional limitations and then relating these limitations to performance status at school, home, and the community (Msall, DiGaudio, & Duffy, 1993). However, during infancy and early childhood, parents must act as a proxy for the child, thereby increasing the possibility of bias (Hack, 1999).

As a result, issue-specific outcome measures should be folded into a basic outcome framework that could be compared across studies. This basic framework should include a follow-up protocol with standardized age at assessment, areas covered, and techniques used. However, more study-specific, narrow-band foci could also be employed. This approach would allow for investigation of specific deficits pertinent to the purpose of the follow-up study in conjunction with more "standard" cognitive, behavioral/social, functional, and health-related outcomes that would be of interest across studies (e.g., Taylor, Klein, Schatschneider, & Hack, 1998). The challenge is to accomplish this in a reasonable amount of time and at an acceptable cost.

Outcome Analyses
The correlation coefficient often is used in descriptive investigations relating perinatal variables and outcomes, or between two developmental scores obtained at different times. Unfortunately, this statistic is subject to the problem of restriction of range, where a fairly homogeneous distribution of scores can produce a low correlation. Moreover, correlations do not provide information regarding individual developmental patterns. Siegel (1985) emphasizes the need to predict ranges of scores, rather than exact scores. Correlations are misleading in that regard, as they assume a level of measurement precision not achieved in psychologic tests, environmental measures, or biomedical variables. Additionally, if risk factors and outcomes have differing distributions, an artificial cap may be placed on correlations and variance.
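
Restriction of range can be demonstrated with a small synthetic data set. In the sketch below, an outcome tracks a predictor closely across its full range, but when only a narrow band of predictor values is retained (as when a sample is homogeneous on a risk factor) the same relationship yields a much lower coefficient. The data are artificial and deterministic, chosen only to make the effect visible; the function name is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation for two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Synthetic data: predictor 0..99; outcome = predictor plus or minus 10.
predictor = list(range(100))
outcome = [x + (10 if x % 2 == 0 else -10) for x in predictor]
r_full = pearson_r(predictor, outcome)

# Retain only a narrow, homogeneous band of predictor values (45 to 54).
band = [(x, y) for x, y in zip(predictor, outcome) if 45 <= x <= 54]
r_restricted = pearson_r([x for x, _ in band], [y for _, y in band])

print(f"full range:       r = {r_full:.2f}")
print(f"restricted range: r = {r_restricted:.2f}")
```

The underlying rule linking predictor and outcome is identical in both samples; only the spread of the predictor differs.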

Because group means may mask individual patterns of cognitive development, and biologic risk groups are heterogeneous in terms of biomedical and sociodemographic variables, cluster analysis is attractive (Koller, Lawson, Rose, Wallace, & McCarton, 1997; Liau & Brooks-Gunn, 1993). This technique allows identification of homogeneous subsets of children with similar developmental patterns. These clusters of infants could be compared on variables "internal" to the cluster (e.g., risk or cognitive scores); variables "external" to the clusters (biomedical and sociodemographic) could be compared across clusters (Koller et al., 1997). This type of analysis has been used for cognitive development and holds promise with neuromotor, functional, and health outcomes as well.

Other useful outcome analyses relating to measures of effect are derived from developmental epidemiologic studies (Scott, Mason, & Chapman, 1999). Here, the interest is in differences in proportions of cases rather than differences in means or variance accounted for. This approach yields qualitatively different information about relationships among risk factors and developmental outcomes than is obtained through more traditional analyses. The risk ratio, typically used in prospective, longitudinal cohorts, reflects the relative increase in the probability of a negative outcome when the infant experienced a risk condition (e.g., ELBW with IVH versus ELBW without IVH, compared in terms of spastic diplegia). Effect of a risk factor (ELBW and IVH) is compared to some other referent group (ELBW).

The odds ratio is typically used in case-control retrospective studies in which infants are chosen based on whether they exhibit the outcome of interest (spastic diplegia), and data are gathered regarding previous exposure to a risk factor (IVH). The ratio is the increased odds of a negative outcome in infants who experienced a risk factor, relative to those who did not experience the risk factor. This is particularly useful when a condition is relatively rare (e.g., Grade IV IVH). Logistic regression could be employed in this analysis. Both of these techniques are sometimes considered "relative risk," although this is not universally endorsed; if the incidence of an outcome is rare (<2%), the risk and odds ratio values become similar (Scott et al., 1999). These measures of effect are inherently different from regression/ANOVA models, as small differences in means between two groups can nonetheless lead to a larger difference in the proportion of extreme cases in these groups (i.e., a factor associated with a small mean decrease in IQ nonetheless may account for a larger number of children with mental retardation).
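
The two measures of effect, and their convergence when the outcome is rare, can be sketched from a 2 x 2 table. The counts below are invented for illustration and do not come from any cited study; the function names are likewise hypothetical. The "exposed" group stands for infants with the risk factor (e.g., IVH) and the "unexposed" group for the referent group.

```python
def risk_ratio(exp_cases, exp_total, unexp_cases, unexp_total):
    """Probability of the outcome in exposed relative to unexposed infants."""
    return (exp_cases / exp_total) / (unexp_cases / unexp_total)


def odds_ratio(exp_cases, exp_total, unexp_cases, unexp_total):
    """Odds of the outcome in exposed relative to unexposed infants."""
    odds_exposed = exp_cases / (exp_total - exp_cases)
    odds_unexposed = unexp_cases / (unexp_total - unexp_cases)
    return odds_exposed / odds_unexposed


# Hypothetical common outcome: 40 of 100 exposed vs. 20 of 100 unexposed.
rr_common = risk_ratio(40, 100, 20, 100)
or_common = odds_ratio(40, 100, 20, 100)

# Hypothetical rare outcome: 2 of 100 exposed vs. 1 of 100 unexposed.
rr_rare = risk_ratio(2, 100, 1, 100)
or_rare = odds_ratio(2, 100, 1, 100)

print(f"common outcome: RR={rr_common:.2f}  OR={or_common:.2f}")
print(f"rare outcome:   RR={rr_rare:.2f}  OR={or_rare:.2f}")
```

With the common outcome the odds ratio noticeably exceeds the risk ratio, whereas with the rare outcome the two nearly coincide. In a case-control design the group sizes are fixed by the investigator, so the risk ratio cannot be computed directly from such a table, which is one reason the odds ratio is used there.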

Receiver operating characteristic (ROC) curves can provide a qualitative measure of a test's diagnostic performance (Centor & Schwartz, 1985) or the accuracy of a variable or grouping of variables in predicting outcome. Here the true-positive rate is plotted against the false-positive rate for different threshold values. Points along the diagonal line indicate an equal chance of positive/negative outcome; the farther the ROC curve rises above this line, the better the prediction. The area under the curve (AUC) is a quantitative measure of this discrimination, ranging from .5 (chance) to 1.0 (perfect discrimination). This technique has only recently been used in developmental outcome studies (Pollack et al., 2000).
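
A minimal sketch of these quantities follows. It builds the ROC points by sweeping a threshold over the observed scores and computes the AUC through its equivalent interpretation as the probability that a randomly chosen affected child has a higher risk score than a randomly chosen unaffected child (ties counted as one half). The risk scores and function names are hypothetical, and the pairwise method is chosen for transparency rather than efficiency.

```python
def roc_points(pos_scores, neg_scores):
    """(false-positive rate, true-positive rate) pairs for each threshold."""
    thresholds = sorted(set(pos_scores) | set(neg_scores), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(s >= t for s in pos_scores) / len(pos_scores)
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)
        points.append((fpr, tpr))
    return points


def auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical neonatal risk scores for children with and without a
# later adverse outcome.
affected = [8, 7, 6, 4]
unaffected = [5, 3, 2, 1]

curve = roc_points(affected, unaffected)
area = auc(affected, unaffected)
print("ROC points:", curve)
print("AUC:", area)
```

An AUC near .5 would mean the score ranks affected and unaffected children no better than chance; identical score distributions in the two groups give exactly .5 under this definition.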

Effect sizes need to be reported in outcome studies, as a small p value does not necessarily imply an important finding; it simply suggests that the null hypothesis is unlikely to hold (McCartney & Rosenthal, 2000). Traditional p values should be accompanied by estimates of both the size and direction of an effect. Two types of effect size estimates exist: r Family (assessed via correlation) and d Family (comparison of group means). Both are applicable in outcome studies and are more practically useful than binary decisions based on significance/nonsignificance. Effect sizes can be biased by measurement error, methodological choices (e.g., within- versus between-subjects designs) or by minimizing error terms (see McCartney & Rosenthal, 2000).
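
The two effect size families can be sketched with a pooled-standard-deviation version of d and a common conversion from d to r that assumes equal group sizes. The group means, standard deviations, and sample sizes below are hypothetical, and the function names are illustrative; other definitions of d (for example, using only the control group's standard deviation) would give somewhat different values.

```python
import math


def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)


def d_to_r(d):
    """Convert d to r; this form assumes the two groups are equal in size."""
    return d / math.sqrt(d ** 2 + 4)


# Hypothetical groups: controls (mean 100) and low birthweight (mean 94),
# both with SD 15 and 50 children per group.
d = cohens_d(100, 15, 50, 94, 15, 50)
r = d_to_r(d)

print(f"d = {d:.2f}, r = {r:.3f}")
```

Reporting d or r alongside the p value conveys how large the group difference is in standard deviation units, independent of sample size.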

Criteria
There is a lack of consistency in terms of diagnostic criteria. Arguments are made both for and against viewing data as categorical or continuous. For example, in the meta-analysis of LBW studies, had a binary "normal/not normal" categorization been employed, no group differences would have been detected. However, viewing the data in a continuous fashion yielded a 6-point difference between LBW and control infants. It would appear that analyses of continuous data require decisions to include or delete severely involved infants; either option would alter results. Use of categorical methods allows inclusion of these babies but masks more subtle findings. Floor effects and missing data are particularly problematic. "Outliers" whose raw scores cannot be converted into scaled scores (as in the case of a BSID-II score <50) often are "censored" or excluded, and a high frequency of censoring may occur in populations with severely affected infants. As a solution, imputed values may be used (e.g., a score of 49 is recorded to indicate an unscalable score) and data analyzed using standard methods (see Lindsey, O'Donnell, & Brouwers, 2000). Means, corrected for censoring, can be compared to means based on inclusion of imputed values to verify that imputation is appropriate. With the BSID-II, extrapolated raw scores may be considered as an alternative (see Black & Matula, 2000). It is recommended that the mean IQ (and SD) and effect sizes and confidence intervals for each group, the proportion of mental retardation and borderline intelligence, and the proportion of major disorders (CP, blind, deaf) be reported. Comparisons excluding children with major handicaps provide insight as to how children who survive without major handicap fare.
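
One reason for reporting both means and proportions is that a modest mean difference can correspond to a much larger relative difference in the share of children below a cutoff. The sketch below works this through for a 6-point mean difference under strong simplifying assumptions that are not claims about any actual sample: scores are normally distributed, both groups have a standard deviation of 15, the control mean is 100, and the cutoff is 70. The function name is hypothetical.

```python
from statistics import NormalDist


def proportion_below(cutoff: float, mean: float, sd: float = 15.0) -> float:
    """Expected proportion scoring below `cutoff`, assuming normality."""
    return NormalDist(mu=mean, sigma=sd).cdf(cutoff)


# Assumed control mean of 100 and a group scoring 6 points lower on average.
control_share = proportion_below(70, mean=100)
lbw_share = proportion_below(70, mean=94)
relative_increase = lbw_share / control_share

print(f"controls below 70: {control_share:.1%}")
print(f"LBW below 70:      {lbw_share:.1%}")
print(f"ratio of proportions: {relative_increase:.1f}")
```

Under these assumptions, a shift of less than half a standard deviation more than doubles the expected share of children below the cutoff, so the mean difference and the proportions convey complementary information.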


    Conclusions
A promising area of research involves relating brain imaging findings to outcomes. This includes routine techniques, such as cranial ultrasound, and less frequently employed techniques that measure cerebral blood flow (PET or SPECT), oxygen or glucose metabolism (PET), and functional activity of the brain (echoplanar or fMRI). Use of biochemical markers such as pro-inflammatory cytokines and protective oligotrophins and neurotrophins (Dammann & Leviton, 1999) may enhance predictive accuracy in outcome studies. These investigations will require expanded collaboration between pediatric psychologists and other disciplines. A national consensus appears to be needed on the elements and procedures necessary in basic follow-up protocols, to allow comparability across studies and pooling of data. In summary, those involved in developmental follow-up must realize that their data will influence actual decisions made for real children; it is imperative that these data be accurate and based on sound methodology.


    Acknowledgments
 
Special thanks to Steven J. Verhulst for his helpful manuscript review.

Received December 1, 1999; revision received July 1, 2000; accepted October 1, 2000


    References
Aylward, G. P. (1990). Environmental influences and the developmental outcome of children at risk. Infants and Young Children, 2, 1-9.

Aylward, G. P. (1992). The relationship between environmental risk and developmental outcome. Journal of Developmental and Behavioral Pediatrics, 13, 222-229.

Aylward, G. P. (1996). Environmental risk, intervention and developmental outcome. Ambulatory Child Health, 2, 161-170.

Aylward, G. P. (1997a). Environmental influences: Considerations for early assessment and intervention. In S. M. Clancy Dollinger & L. F. DiLalla (Eds.), Assessment and intervention issues across the life span (pp. 9-33). Mahwah, NJ: Lawrence Erlbaum.

Aylward, G. P. (1997b). Infant and early childhood neuropsychology. New York: Plenum.

Aylward, G. P., Hatcher, R. P., Stripp, B., Gustafson, N. F., & Leavitt, L. A. (1985). Who goes and who stays: Subject loss in a multicenter, longitudinal follow-up study. Journal of Developmental and Behavioral Pediatrics, 6, 3-8.

Aylward, G. P., & Kenny, T. J. (1979). Developmental follow-up: Inherent problems and a conceptual model. Journal of Pediatric Psychology, 4, 331-343.

Aylward, G. P., Pfeiffer, S. I., Wright, A., & Verhulst, S. J. (1989). Outcome studies of low birth weight infants published in the last decade: A metaanalysis. Journal of Pediatrics, 115, 515-521.

Bayley, N. (1969). Bayley Scales of Infant Development. San Antonio, TX: The Psychological Corporation.

Bayley, N. (1993). Bayley Scales of Infant Development. 2nd ed. San Antonio, TX: The Psychological Corporation.

Bendersky, M., & Lewis, M. (1994). Environmental risk, biological risk, and developmental outcome. Developmental Psychology, 30, 484-494.

Black, M. M., & Matula, K. (2000). Essentials of Bayley Scales of Infant Development-II assessment. New York: John Wiley.

Blasko, P. A. (1989). Preterm birth: To correct or not to correct. Developmental Medicine and Child Neurology, 31, 816-826.

Campbell, M. K., Halinda, E., Curlyle, M. J., Fox, A. M., Turner, L. A., & Chance, G. W. (1993). Factors predictive of follow-up clinic attendance and developmental outcome in a regional cohort of very low birth weight infants. American Journal of Epidemiology, 138, 704-713.

Carmichael-Olson, H., Streissguth, A. P., Bookstein, F. L., Barr, H. M., & Sampson, P. D. (1994). Developmental research in behavioral teratology: Effects of prenatal alcohol exposure on child development. In S. L. Friedman & H. C. Haywood (Eds.), Developmental follow-up (pp. 67-112). New York: Academic Press.

Centor, R. M., & Schwartz, J. S. (1985). An evaluation of methods for estimating the area under the receiver operating characteristic (ROC) curve. Medical Decision Making, 5, 149-156.

Dammann, O., & Leviton, A. (1999). Brain damage in preterm newborns: Might enhancement of developmentally regulated endogenous protection open a door for prevention? Pediatrics, 104, 541-550.

Escalona, S. K. (1982). Babies at double hazard: Early development of infants at biologic and social risk. Pediatrics, 70, 670-676.

Escobar, G. J., Littenberg, B., & Petitti, D. B. (1991). Outcome among surviving very low birthweight infants: A meta-analysis. Archives of Disease in Childhood, 66, 204-211.

Flynn, J. R. (1999). Searching for justice. The discovery of IQ gains over time. American Psychologist, 54, 5-20.

Gauthier, S. M., Bauer, C. R., Messinger, D. S., & Closius, J. M. (1999). The Bayley Scales of Infant Development II: Where to start? Journal of Developmental and Behavioral Pediatrics, 20, 75-79.

Hack, M. (1999). Consideration of the use of health status, functional outcome, and quality-of-life to monitor neonatal intensive care. Pediatrics, 103, 319-328.

Hack, M., & Fanaroff, A. A. (1988). How small is too small? Considerations in evaluating the outcome of the tiny infant. Clinics in Perinatology, 15, 773-788.

Hack, M., & Fanaroff, A. A. (1999). Outcome of children of extremely low birthweight and gestational age in the 1990's. Early Human Development, 53, 193-218.

Hollingshead, A. B. (1975). Four-factor index of social status. Working paper. New Haven, CT.

Hunt, J. V., Cooper, B. A., & Tooley, W. H. (1988). Very low birth weight infants at 8 and 11 years of age: Role of neonatal illness and family status. Pediatrics, 82, 596-603.

Hunt, J. V., & Rhodes, L. (1977). Mental development of preterm infants during the first year. Child Development, 48, 204-210.

Investigators of the Vermont-Oxford Trials Network Database Project. (1993). The Vermont-Oxford Trials Network: Very low birth weight outcomes for 1990. Pediatrics, 91, 540-545.

Jacobson, J. L., & Jacobson, S. W. (1996). Methodological considerations in behavioral toxicology in infants and children. Developmental Psychology, 32, 390-403.

Johnson, A. (1997). Follow-up studies: A case for a standard minimum data set. Archives of Disease in Childhood, 76, F61-F63.

Keith, T. Z. (1993). Latent variable structural equation models: LISREL in special education research. Remedial and Special Education, 14, 36-46.

Kiely, J. L., & Paneth, N. (1981). Follow-up studies of low-birthweight infants: Suggestions for design, analysis, and reporting. Developmental Medicine and Child Neurology, 23, 96-99.

Koller, H., Lawson, K., Rose, S. A., Wallace, I., & McCarton, C. (1997). Patterns of cognitive development in very low birth weight children during the first six years of life. Pediatrics, 99, 383-389.

Landry, S. H., Smith, K. E., Miller-Loncar, C. L., & Swank, P. R. (1997). Predicting cognitive-language and social growth curves from early maternal behaviors in children at varying degrees of biological risk. Developmental Psychology, 33, 1040-1053.

Liaw, F. R., & Brooks-Gunn, J. (1993). Patterns of low-birth weight children's cognitive development. Developmental Psychology, 29, 1024-1035.

Lindsey, J. C., O'Donnell, K., & Brouwers, P. (2000). Methodological issues in analyzing psychological test scores in pediatric clinical trials. Journal of Developmental and Behavioral Pediatrics, 21, 141-151.

Matula, K., Gyurke, J. S., & Aylward, G. P. (1997). Response to commentary: Bayley Scales-II. Journal of Developmental and Behavioral Pediatrics, 18, 112-113.

McCartney, K., & Rosenthal, R. (2000). Effect size, practical importance, and social policy for children. Child Development, 71, 173-180.

McCormick, M. C. (1989). Long-term follow-up of infants discharged from neonatal intensive care units. Journal of the American Medical Association, 261, 1767-1772.

McCormick, M. C. (1997a). The outcome of very low birth weight infants: Are we asking the right questions? Pediatrics, 99, 869-876.

McCormick, M. C. (1997b). Quality of care: An overdue agenda. Pediatrics, 99, 249-250.

Miller, G., Dubowitz, L. M. S., & Palmer, P. (1984). Follow-up of pre-term infants: Is correction of the developmental quotient for prematurity helpful? Early Human Development, 9, 137-144.

Msall, M. E., DiGaudio, K. M., & Duffy, L. C. (1993). Use of functional assessment in children with developmental disabilities. Physical Medicine and Rehabilitation Clinics of North America, 4, 517-527.

Parker, S., Greer, S., & Zuckerman, B. (1988). Double jeopardy: The impact of poverty on early child development. Pediatric Clinics of North America, 35, 1227-1240.

Pollack, M. M., Koch, M. A., Bartel, D. A., Rapoport, I., Dhanireddy, R., El-Mohandes, A. A., Harkavy, K., & Subramanian, K. N. (2000). A comparison of neonatal mortality risk prediction models in very low birth weight infants. Pediatrics, 105, 1051-1057.

Ross, G., & Lawson, K. (1997). Using the Bayley-II: Unresolved issues in assessing the development of prematurely born children. Journal of Developmental and Behavioral Pediatrics, 18, 109-111.

Saigal, S., Feeny, D., Rosenbaum, P., Furlong, W., Burrows, E., & Stoskopf, B. (1996). Self-perceived health status and health-related quality of life of extremely low-birth-weight infants at adolescence. Journal of the American Medical Association, 276, 453-459.

Sameroff, A. J., & Chandler, M. J. (1975). Reproductive risk and the continuum of caretaking casualty. In F. D. Horowitz (Ed.), Review of child development research (vol. 4, pp. 187-244). Chicago: University of Chicago Press.

Scott, K. G., Mason, C. A., & Chapman, D. A. (1999). The use of epidemiologic methodology as a means of influencing public policy. Child Development, 70, 1263-1272.

Siegel, L. S. (1985). Biological and environmental variables as predictors of intellectual functioning at 6 years. In S. Harel & N. Anastasiow (Eds.), The at-risk infant: Psycho/socio/medical aspects (pp. 65-73). Baltimore: Brookes.

Taylor, H. G., Klein, N., Schatschneider, C., & Hack, M. (1998). Predictors of early school age outcomes in very low birth weight children. Journal of Developmental and Behavioral Pediatrics, 19, 235-243.

Touwen, B. C. L. (1986). Very low birth weight infants. European Journal of Pediatrics, 145, 460.

Tyson, J. E., & Broyles, R. S. (1996). Progress in assessing the long-term outcome of extremely low-birth-weight infants. Journal of the American Medical Association, 276, 492-493.

Vohr, B. R., & Msall, M. E. (1997). Neuropsychological and functional outcomes of very low birth weight infants. Seminars in Perinatology, 21, 202-220.

Washington, K., Scott, D. T., Johnson, K. A., Wendel, S., & Hay, A. E. (1998). The Bayley Scales of Infant Development-II and children with developmental delays: A clinical perspective. Journal of Developmental and Behavioral Pediatrics, 19, 346-349.




