Journal of Pediatric Psychology Advance Access originally published online on June 3, 2008
Journal of Pediatric Psychology 2008 33(10):1076-1084; doi:10.1093/jpepsy/jsn055

© The Author 2008. Published by Oxford University Press on behalf of the Society of Pediatric Psychology. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org

Regression Models for Count Data: Illustrations using Longitudinal Predictors of Childhood Injury*

Bryan T. Karazsia, MA and Manfred H. M. van Dulmen, PHD

Kent State University

All correspondence concerning this article should be addressed to Bryan T. Karazsia, Department of Psychology, Kent State University, Kent, OH 44242, USA. E-mail: bkarazsi@kent.edu


    Abstract
 
Objective: To offer a practical demonstration of regression models recommended for count outcomes using longitudinal predictors of children's medically attended injuries.
Method: Participants included 708 children from the NICHD child care study. Measures of temperament, attention, parent–child relationship, and safety of physical environment were used to predict medically attended injuries.
Results: Statistical comparisons among five estimation methods revealed that a zero-inflated Poisson (ZIP) model provided the best fit with observed data. ZIP models simultaneously model dichotomous and continuous outcomes of count variables, and different constellations of predictors emerged for each aspect of the estimated model.
Conclusions: This study offers a practical demonstration of techniques designed to handle dependent count variables. The conceptual and statistical advantages of these methods are emphasized, and Stata script is provided to facilitate adoption of these techniques.

Key words: count data; injury; regression.


Count data with a preponderance of zeros are frequently analyzed by pediatric psychologists. Common examples of such count data include number of patient hospitalizations (Logan, Radcliffe, & Smith-Whitley, 2002), frequency of adolescent alcohol use (Audrain-McGovern, Rodriguez, Tercyak, Neuner, & Moss, 2006), and number of childhood injuries (Morrongiello, Ondejko, & Littlejohn, 2004; Schwebel, Brezausek, Ramey, & Ramey, 2004). Distributions of such data violate fundamental assumptions of many commonly used multivariate statistical techniques [e.g., ordinary least squares (OLS) regression], leading to results that do not accurately reflect the observed data (Hammer & Landau, 1981). Fairly recently, statistical techniques that overcome these problems have been developed (Hall, 2000; Lambert, 1992). Even though these techniques are better suited to count outcomes than, for example, OLS regression, few pediatric psychologists are familiar with them. The goal of the present article is therefore to illustrate the use of these techniques by offering a practical demonstration using prospective data from the National Institute of Child Health and Human Development (NICHD) Study of Early Child Care.

Understanding Count Data
A count refers to the number of specified events that occur in a given interval of time. By definition, count data consist of only nonnegative integers. The specified event can include any behavior of interest, and counts are utilized frequently in the field of pediatric psychology. For example, in a recent analysis of service use among adolescents with sickle cell disease, Logan and colleagues (2002) reported frequencies of hospitalizations over a one-year period. Data collected from medical chart reviews were summed to create a single variable depicting the number of hospitalizations. As is common with count variables, the authors reported that >50% of participants had not been hospitalized (Logan et al., 2002). In other words, because such a large number of individuals had not experienced this event, we would refer to this count variable as being zero-inflated. Other recent examples within pediatric psychology of such zero-inflated count data include adolescent substance use (Audrain-McGovern et al., 2006), number of sexual partners (Prinstein, Meade, & Cohen, 2003), and children's history of injuries (Hagan & Kuebli, 2007).

Although common, analysis of count outcomes presents unique challenges (Atkins & Gallop, 2007). When target behaviors are relatively rare, the resulting distributions are highly skewed with a preponderance of zeros. Such distributions violate fundamental assumptions of OLS regression, most notably normality of residuals.¹ As a consequence, resulting sample statistics differ from true population parameters (Hammer & Landau, 1981), and the skew can lead to inaccurate standard errors and an increase in Type I or Type II error rates (Gardner, Mulvey, & Shaw, 1995).

Potential "Solutions"
Traditionally, researchers have used two solutions to deal with zero-inflated count data. First, researchers have opted to transform such data. A square root transformation has been recommended for count data (Johnson & Wichern, 1998), though several problems with transformations of count variables are documented (see Sturman, 1999, for review). Most notably, transformations do not address the high preponderance of zeros, so meaningless values are predicted (e.g., negative values even though counts cannot be negative; Hammer & Landau, 1981; Harrison & Hulin, 1989). In addition, transformed data are more difficult to interpret than nontransformed data (Tabachnick & Fidell, 2007).

Another commonly used approach is to dichotomize data into groups: those who performed the behavior (nonzero counts) and those who did not (zero counts). For example, one may be interested in the factors that predict whether or not adolescents are hospitalized. This approach is problematic because dichotomization ignores meaningful variation, and as such, situations in which dichotomization is appropriate are rare (MacCallum, Zhang, Preacher, & Rucker, 2002).

Alternative Models
Fortunately, numerous models have been developed specifically for count data (Long & Freese, 2006; Sano, Jeong, Acock, & Zvonkovic, 2005). These models can handle nonnormality on the dependent variable and do not require the researcher to either dichotomize or transform the dependent variable. We focus on four of these models (Atkins & Gallop, 2007; Long & Freese, 2006; Sano et al., 2005): Poisson, negative binomial, zero-inflated Poisson (ZIP), and zero-inflated negative binomial (ZINB).

Poisson
The Poisson distribution was developed to model discrete counts, and because it is similar to linear regression in many respects, it is relatively easy to interpret.² This distribution becomes increasingly positively skewed as the mean of the dependent variable decreases (Long & Freese, 2006), reflecting a common property of count data.

The apparent simplicity of Poisson comes with two restrictive assumptions (Sturman, 1999). First, the variance and mean of the count variable are assumed to be equal. In reality, however, the variance is usually much greater than the mean (i.e., overdispersion; Cameron & Trivedi, 1986), and therefore Poisson models, though widely used to handle count data, may not be well suited to some types of count outcomes. Second, occurrences of the specified behavior are assumed to be independent of each other. This assumption is also frequently violated. For example, in the case of children's injuries, past injurious experiences are known to be related to future injury risk (Jacques & Finney, 1994).
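The two distributional properties described above (mean equals variance, and more skew toward zero as the mean falls) can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the three illustrative means are assumptions made for this example and are not taken from the study.

```python
import math

def poisson_pmf(y, mu):
    """P(Y = y) for a Poisson count with mean mu > 0 (computed on the log scale)."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

def mean_and_variance(pmf, max_y=200):
    """Mean and variance of a count distribution, summing pmf(y) over y = 0..max_y."""
    mean = sum(y * pmf(y) for y in range(max_y + 1))
    variance = sum((y - mean) ** 2 * pmf(y) for y in range(max_y + 1))
    return mean, variance

# Illustrative means only; none of these values come from the study.
for mu in (0.5, 2.0, 5.0):
    mean, variance = mean_and_variance(lambda y: poisson_pmf(y, mu))
    print(f"mu={mu}: P(Y=0)={poisson_pmf(0, mu):.3f}, "
          f"mean={mean:.3f}, variance={variance:.3f}")
```

For each mean the printed variance matches the mean, and the probability of a zero count grows as the mean shrinks, which is why rare behaviors produce distributions piled up at zero.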

Negative binomial
The negative binomial distribution is similar to the Poisson distribution, but the assumption of independence of observations is lifted, reflecting the notion that the extent to which a participant engages in repeated occurrences may be influenced by individual differences (Sturman, 1999). Further, the variance and mean are not assumed to be equal, so overdispersion is no longer problematic. These assumptions aside, the similarity between negative binomial and Poisson distributions is demonstrated by the fact that the negative binomial distribution converges to the Poisson distribution when the variance and mean are equal (i.e., equidispersion; Sturman, 1999). Statistical comparisons between Poisson and negative binomial regression models confirm that in most cases the negative binomial better represents observed counts than Poisson (Hausman, Hall, & Griliches, 1984).
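The convergence of the negative binomial to the Poisson can be illustrated with a short sketch. It assumes the common mean-dispersion parameterization, in which the variance is mu + alpha * mu^2 and alpha is the dispersion parameter; the article does not state which parameterization was used, so this is an assumption. The mean of 0.54 is the mean injury count reported in the Method section, and the alpha values are hypothetical.

```python
import math

def poisson_pmf(y, mu):
    """P(Y = y) for a Poisson count with mean mu > 0."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

def negbin_pmf(y, mu, alpha):
    """P(Y = y) for a negative binomial with mean mu and dispersion alpha > 0.

    Assumed parameterization: variance = mu + alpha * mu**2.
    """
    r = 1.0 / alpha
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    return math.exp(log_p)

def negbin_variance(mu, alpha):
    """Variance implied by the assumed parameterization."""
    return mu + alpha * mu ** 2

mu = 0.54  # mean injury count reported in the text
for alpha in (1.0, 0.5, 1e-6):  # hypothetical dispersion values
    print(f"alpha={alpha}: variance={negbin_variance(mu, alpha):.4f}, "
          f"NB P(Y=0)={negbin_pmf(0, mu, alpha):.4f}, "
          f"Poisson P(Y=0)={poisson_pmf(0, mu):.4f}")
```

As alpha approaches zero the implied variance approaches the mean and the negative binomial probabilities approach the Poisson ones; larger alpha values put more probability on zero for the same mean.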

Zero-inflated Models
There are, however, some situations where a major source of overdispersion is a preponderance of zero counts, and the resulting overdispersion cannot be modeled accurately with negative binomial estimation. In such scenarios, one can use zero-inflated (Poisson or negative binomial) estimation methods. Zero-inflated techniques permit the researcher to answer two questions that pertain to dependent variables with low base rates: (a) what predicts whether or not the behavior occurs, and (b) if the behavior occurs, what predicts frequency of occurrence? In other words, two regression equations are created: one predicting whether the count occurs and a second one predicting how often it occurs (Long & Freese, 2006). Additionally, zero-inflated models have a statistical advantage over standard Poisson and negative binomial models in that they model the preponderance of zeros as well as the distribution of positive counts simultaneously. Unfortunately, there is not a specific frequency of zero counts or ratio of zero to nonzero counts that can be used to determine if a particular distribution is zero-inflated. However, researchers can utilize post hoc analyses to determine which model most accurately reflects the observed distribution (UCLA Statistical Consulting Group, 2008). These post hoc comparisons are explained in more detail below.

With regard to predicting whether the count occurs, zero-inflated models first explore the prediction of two latent (unobserved) groups: an "always zero group" (e.g., individuals who are never hospitalized) and a "not always zero group" (e.g., individuals who may be hospitalized). The "always zero group" contains individuals who cannot be hospitalized (perhaps their access is restricted). Remaining individuals will be in the "not always zero group" because they have potential to be hospitalized. Individuals in this group may or may not have a count of zero. That is, their probability of being hospitalized is greater than zero, but they may never become ill.

It is important to note that it is possible for individuals to have a zero count for different reasons (Sano et al., 2005). Some individuals will have a zero count because their access was restricted, while others will have a zero count because they were never ill. Still other adolescents may have access and become ill. While these differences are not modeled with standard Poisson and negative binomial, zero-inflated models first account for the excessive zeros by predicting group membership [an unobserved (latent) dichotomous outcome] based on the constellation of predictors included in the model and then predicting frequency of counts for only those in the "not always zero group" (a continuous outcome). The latter process is akin to a standard Poisson or negative binomial model, but in this case it occurs after consideration of the excessive zeros. A ZIP will reflect data accurately when overdispersion is caused by a preponderance of zeros. If overdispersion is attributed to factors beyond the inflation of zeros, a ZINB model is more appropriate (Long & Freese, 2006).
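The two sources of zeros described above can be written as a mixture. The sketch below is one way to express a ZIP probability function without predictors; `pi` (probability of belonging to the "always zero" group) and `lam` (Poisson mean in the "not always zero" group) are hypothetical values chosen for illustration, not estimates from the study.

```python
import math

def zip_pmf(y, pi, lam):
    """P(Y = y) under a zero-inflated Poisson.

    pi  : probability of belonging to the "always zero" group
    lam : Poisson mean for the "not always zero" group (lam > 0)
    """
    poisson = math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))
    if y == 0:
        # Zeros come from both groups: structural zeros plus Poisson zeros.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson

pi, lam = 0.3, 0.8  # hypothetical values for illustration

structural_zeros = pi
sampling_zeros = (1.0 - pi) * math.exp(-lam)
mean = (1.0 - pi) * lam
variance = (1.0 - pi) * lam * (1.0 + pi * lam)

print(f"P(Y=0) = {zip_pmf(0, pi, lam):.4f} "
      f"(always-zero group: {structural_zeros:.4f}, Poisson zeros: {sampling_zeros:.4f})")
print(f"mean = {mean:.4f}, variance = {variance:.4f}")
print(f"P(Y=0) for a plain Poisson with the same mean = {math.exp(-mean):.4f}")
```

Even with no other source of overdispersion, the mixture has a variance larger than its mean and more zeros than a plain Poisson with the same mean, which is the situation ZIP is designed for.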

The Present Study
The purpose of the present study is to present a practical application of the aforementioned models using longitudinal data about children's unintentional injuries. Results based on OLS, Poisson, negative binomial, ZIP, and ZINB models are presented in consideration of which technique provides the "best" fit with observed data.

Background Information
Unintentional injuries are the leading cause of death of children in most industrialized countries (National Safety Council, 2004), and research on unintentional injuries is receiving increased attention from public health professionals and policymakers (National Center for Injury Prevention and Control, 2006; Schwebel & Gaines, 2007). Injury researchers frequently count the number of injuries that children sustain in a given time period (Hagan & Kuebli, 2007; Morrongiello et al., 2004; Schwebel et al., 2004). Among factors known to impact child risk for injury, child sex and child temperament are two well-established predictors. Boys experience up to four times as many nonfatal injuries as girls (Morrongiello & Hogg, 2004). In terms of temperament, constellations of behaviors described as impulsive and hyperactive are related to child risk for injury (Schwebel & Gaines, 2007). Previous research also suggests that child risk for injury is affected by parental and environmental variables (Morrongiello, 2005; Schwebel et al., 2004). Positive parenting has been identified as a protective factor against children's injuries, even among children with more difficult temperaments (Schwebel et al., 2004). Additionally, unsafe environments that contain many physical hazards contribute to increased risk of injury (Rivara & Barber, 1985). This ecological framework guided selection of variables included as predictors of medically attended injuries.


    Method
 
Participants
Data for the present study came from the NICHD Study of Early Child Care. Children and families were recruited from 10 hospitals across the US shortly after birth of the target child. Complete descriptions of the study protocol and procedures are available elsewhere (NICHD Early Child Care Research Network, 2000). The NICHD study followed children and families through the first 15 years of the target child's life. In the present study, variables assessed when children were 54 months of age were examined as predictors of unintentional injuries that occurred from the second through sixth grades. These time points were chosen because we wanted to identify precursors of injury events, as opposed to variables that might be associated with injury events concurrently or retrospectively. Given the time lag between independent variables and the dependent count outcome, variables that emerge as significant predictors of injuries highlight potential targets for prevention efforts.

When the NICHD study commenced, 1,364 families were enrolled, of which 844 completed all assessments at age 54 months, and 708 completed all injury reports through the sixth grade. Only participants who had complete data on all variables utilized in present analyses were included in the present study. Independent samples t-tests revealed no statistically significant differences between families with and without complete injury data for the predictor variables used in this study. The ethnic background of the 708 mothers included in present analyses was: 86.9% Caucasian, 9.2% African American, and 3.9% other races. The mean age of mothers at the time of the target child's birth was 28.93 years (SD = 5.46). Ethnic composition reported for children was: 86.0% Caucasian, 9.0% African American, and 5.0% other ethnicities. This subsample included 356 boys (50.3%) and 352 girls (49.7%).

Measures
Child Temperament
Maternal reports of a positive anticipation aspect of child temperament were assessed with a modified version of the Child Behavior Questionnaire (CBQ; Rothbart, Ahadi, & Hershey, 1994).³ The NICHD Study utilized a shortened version of this questionnaire that contained 80 items. Average scores from the abbreviated Approach/Anticipation Scale (10 of the original 13 items) were used in the present analyses (Cronbach's α = .70 with the full NICHD sample; Research Triangle Institute, 1999a). Higher scores indicate a greater propensity to approach situations with excitement and anticipation of a pleasurable outcome (Sample item: My child ..."gets so worked up before an exciting event that s[he] has trouble sitting still").

Child Attention
The Continuous Performance Task (CPT; Rosvold, Mirsky, Sarason, Bransome, & Beck, 1956) provided a standardized measure of reaction time in the target child. It is a widely used computer-generated task in which children are asked to press a button each time a target stimulus is presented. The CPT is a reliable and valid measure of children's attention (Halperin, Sharma, Greenblatt, & Schwartz, 1991). In the present study, children completed this task in a study laboratory, and the mean time that elapsed from the moment the stimulus was presented until the child pressed the button was used as an index of attention. Shorter average time spans (i.e., reaction times) were used as indicators of increased attention.

Quality of Mother–Child Relationship
Maternal report of the quality of the mother–child relationship was obtained with 27 items from the Parent–Child Relationship Scale (PCRS; Pianta, 1994). The 27 items were summed to provide an index of maternal views of their relationship with children, with higher scores indicating a more positive relationship between the mother and the target child (Cronbach's α = .84 with the full NICHD sample; Research Triangle Institute, 1999b).

Safety of Physical Environment
Safety of the home environment was assessed with the seven-item Physical Environment scale of the H.O.M.E. Inventory (Caldwell & Bradley, 1984). Scores on this scale range from 0 to 7, with higher scores reflecting a physical environment that is free of hazards and safe for children (e.g., "Outside play environment appears safe"). The complete H.O.M.E. Inventory contains 55 items that are scored in a binary (Yes/No) format during a 60–90 min semi-structured home interview. Alpha coefficients for all scales are > .90, and inter-observer agreement is > 90% (Caldwell & Bradley, 1984).

Injury History
Mothers completed interviews during the fall and spring semesters when the target child was in the second grade and once each year when the child was in the third through sixth grades. Mothers were asked if the target child received an injury that required medical attention since the previous interview (Yes/No). Empirical evidence suggests that such reports of injury history are reliable (Pless & Pless, 1995). For the present study, the outcome variable was a sum of all reported injuries that required medical attention during this time span. The distribution of injuries had substantial skew (1.74) and kurtosis (3.15), with a mean of 0.54 and SD of 0.84. The modal number of injuries was 0 (455 cases), followed by 1 (178 cases), 2 (64 cases), 3 (20 cases), 4 (5 cases), and 5 (1 case).
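The reported frequencies are enough to reproduce the descriptive statistics and to get a first impression of overdispersion and excess zeros. The sketch below takes the frequencies exactly as printed above; note that, as printed, they sum to 723 rather than 708, and the sketch does not attempt to resolve that difference. The comparison with a Poisson distribution having the same mean is an informal check added for illustration, not an analysis reported by the authors.

```python
import math

# Frequencies as printed in the text: number of injuries -> number of children.
freq = {0: 455, 1: 178, 2: 64, 3: 20, 4: 5, 5: 1}

n = sum(freq.values())
mean = sum(y * count for y, count in freq.items()) / n
# Sample variance (n - 1 in the denominator).
variance = sum(count * (y - mean) ** 2 for y, count in freq.items()) / (n - 1)
sd = math.sqrt(variance)

observed_zero = freq[0] / n
poisson_zero = math.exp(-mean)  # share of zeros a Poisson with this mean would imply

print(f"n = {n}, mean = {mean:.2f}, SD = {sd:.2f}")
print(f"variance / mean = {variance / mean:.2f}")
print(f"observed share of zeros = {observed_zero:.3f}, "
      f"Poisson-implied share = {poisson_zero:.3f}")
```

The mean and SD round to the reported 0.54 and 0.84. The variance exceeds the mean, and the observed share of zeros is larger than a Poisson with the same mean would imply, which is consistent with the motivation for comparing the count models formally.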


    Results
 
Results are presented and discussed in the following order: (1) determination of which model provided the best fit with observed data, and (2) identification and interpretation of significant predictors. All regression analyses were conducted in Stata version 8.0 (StataCorp., 2003). These methods are available in other programs (Atkins & Gallop, 2007), though capabilities vary among software packages (Appendix A). Means, SDs, and intercorrelations among predictor variables are presented in Table I.


Table I. Means, Standard Deviations, and Correlations of Independent Variables (N = 708)

 
Determining Appropriate Model
Results from the OLS, Poisson, and negative binomial regression models are presented in Table II. With OLS regression, inspection of a plot of predicted values versus residuals revealed that the residuals were not normally distributed, and as such, results based on the OLS approach cannot be trusted (Tabachnick & Fidell, 2007). With respect to deciding between the Poisson and negative binomial models, the negative binomial is appropriate when the outcome variable is overdispersed (i.e., the variance exceeds the mean). When conducting a negative binomial regression model, Stata automatically computes a likelihood-ratio (LR) test that examines the null hypothesis that the dispersion parameter is equal to zero. In the present analysis, this test was statistically significant, χ²(1) = 21.87, p < .001, indicating that the dependent count variable is overdispersed. Thus, the observed data are better explained by the negative binomial than the Poisson model.
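The reported p-value can be checked by hand, because a chi-square variate with 1 degree of freedom is a squared standard normal variate. The sketch below is a minimal illustration; the function name is an assumption made for this example. The Stata script in Appendix B halves the chi-square p-value when comparing ZIP with ZINB, and the same halving is shown here as an option; whether the reported test was halved is not stated in the text.

```python
import math

def chi2_upper_tail_df1(x):
    """P(chi-square with 1 df > x), using chi2(1) = Z**2 for standard normal Z."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))

lr = 21.87  # LR statistic reported in the text

p_plain = chi2_upper_tail_df1(lr)
p_halved = p_plain / 2.0  # halved version, as in the Appendix B comparison script

print(f"p = {p_plain:.2e} (halved: {p_halved:.2e})")
print("p < .001:", p_plain < 0.001)
```

Either way the p-value is far below .001, in line with the reported result.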


Table II. Summary of OLS, Poisson, and Negative Binomial Regression Analyses Predicting Children's Injury History (N = 708)

 
Additionally, two zero-inflated models were analyzed: ZIP and ZINB (Table III). If the number of zeros in the count distribution is excessive, then the ZIP or ZINB will more accurately reflect the data than the negative binomial model. The Vuong test compares the ZIP model to the standard Poisson model by testing the null hypothesis that both models are equally close to the observed distribution. The resulting z-value was positive and statistically significant (z = 2.98, p < .01), demonstrating that the ZIP model more accurately reflects observed data than standard Poisson. If overdispersion is not accounted for by the ZIP model, then there may be other aspects of the distribution that contribute to overdispersion, in which case the ZINB model will be most appropriate (Long & Freese, 2006). Since ZIP and ZINB models are nested, they can be compared using the same LR test described previously for comparing standard Poisson and negative binomial (Long & Freese, 2006). Though Stata does not compute this statistic automatically for zero-inflated models, it can be generated with Stata script (Long & Freese, 2006; Appendix B). This statistic was not statistically significant (z = –2.89, p = .50), indicating that the source of overdispersion is likely an excess of zeros, which is modeled appropriately with ZIP.
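Both reported p-values can be reproduced with standard-library functions. The sketch below assumes that the Vuong p-value is a one-sided upper-tail normal probability and that the ZIP versus ZINB comparison follows the Appendix B script, which halves a chi-square(1) tail probability. Under that second assumption, a statistic at or below zero yields a tail probability of 1 and therefore a halved p-value of .50; this reading of the reported value is an interpretation, not something the text states. The function names are made up for this example.

```python
import math

def normal_upper_tail(z):
    """P(Z > z) for a standard normal variate."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def halved_chi2_pvalue(lr):
    """Half of P(chi-square with 1 df > lr); a nonpositive lr gives 0.5."""
    if lr <= 0:
        return 0.5
    return 0.5 * math.erfc(math.sqrt(lr / 2.0))

vuong_z = 2.98        # reported Vuong statistic (ZIP vs. Poisson)
zip_vs_zinb = -2.89   # reported statistic for the ZIP vs. ZINB comparison

print(f"Vuong test: one-sided p = {normal_upper_tail(vuong_z):.4f}")
print(f"ZIP vs. ZINB: halved p = {halved_chi2_pvalue(zip_vs_zinb):.2f}")
```

The first value is below .01 and the second equals .50, matching the figures in the text under the stated assumptions.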


Table III. Summary of ZIP and Negative Binomial Regression Analyses Predicting Children's Injury History (N = 708)

 
As can be seen in Tables II and III, results can vary across models. For example, child sex emerged as the only significant predictor of children's injuries in the Poisson and negative binomial models; it was not a statistically significant predictor in either of the zero-inflated models. Additionally, two different dependent variables are considered in the zero-inflated models. The present results indicate that the ZIP model most accurately reflected the data, so only results from the ZIP regression can be considered for interpretation. Figure 1 presents a visual depiction of the difference between observed and predicted probabilities for each model. On this graph, values above zero on the y-axis denote more observed counts than predicted, while those below zero indicate fewer observed counts than predicted.


Figure 1. Comparisons among observed versus predicted probabilities among count models (N = 708).

 
Identifying and Interpreting Significant Predictors
Stata also automatically generates an LR test of the null hypothesis that all parameters equal zero. The resulting statistic indicated that the probability of obtaining the observed distribution if the null hypothesis is true was < .05. Therefore, we can conclude that at least one coefficient differs significantly from zero. As can be seen in Table III, two coefficients are generated for each variable. Coefficients on the bottom half of the table predict the dichotomous outcome of group membership (i.e., "not always zero" vs. "always zero" groups). Only the approach/anticipation factor emerged as a significant predictor of this dichotomous outcome. There was a trend (p = .054) for physical environment. The top group of coefficients predicts the continuous frequency of injuries for individuals in the "not always zero group". The positive anticipation factor and child attention emerged as statistically significant predictors of this continuous outcome. Child sex did not emerge as a statistically significant predictor of either the dichotomous or continuous outcome.

It can be useful to interpret statistically significant results in terms of exponentials of the coefficient (e^b). For the dichotomous outcome, the exponential indicates the factor change in odds for every unit increase in the respective independent variable (Long & Freese, 2006). For the present results, an individual's chance for membership in the "always zero" group (no injuries) decreases by a factor of .46 for every one unit increase in positive anticipation, holding all other variables constant. Among the children in the "not always zero" group, an individual's chance for experiencing fewer injuries decreases by factors of .78 and .99 with every one unit increase in positive anticipation and attention, respectively. Directionality and magnitude of other coefficients cannot be interpreted because they do not differ significantly from zero.
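The relation between a coefficient b, its exponential e^b, and a percentage change can be made concrete with the three factors reported above. The sketch below only converts between these scales; the dictionary labels are shorthand made up for this example, and the raw coefficients it prints are back-calculated from the rounded factors, not taken from Table III.

```python
import math

# Factors (e^b) as reported in the text; labels are shorthand for this sketch.
reported_factors = {
    "group membership, positive anticipation": 0.46,
    "injury frequency, positive anticipation": 0.78,
    "injury frequency, attention": 0.99,
}

def coefficient(factor):
    """Coefficient b on the log scale implied by a factor e^b."""
    return math.log(factor)

def percent_change(factor):
    """Percentage change per one-unit increase implied by a factor e^b."""
    return (factor - 1.0) * 100.0

for label, factor in reported_factors.items():
    print(f"{label}: e^b = {factor:.2f}, b = {coefficient(factor):.3f}, "
          f"change per unit = {percent_change(factor):.0f}%")
```

A factor below 1 corresponds to a negative coefficient and a percentage decrease per unit; a factor close to 1, such as .99, corresponds to a small change per unit, whose practical size depends on the measurement scale of the predictor.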


    Discussion
 
This article demonstrated the usefulness of fairly recent statistical advances to overcome unique problems associated with the prediction of count data with a preponderance of zeros. A practical demonstration of techniques that can be used to identify predictors of count outcomes was offered. Conceptual and statistical advantages of Poisson, negative binomial, ZIP, and ZINB were discussed. As illustrated, accuracy and nature of results tend to vary depending on the specific model utilized. In other words, some models reflected the observed data more accurately than others, and the longitudinal predictors of unintentional injury differed depending on whether Poisson, negative binomial, ZIP, or ZINB was used. The differences in predictors across these models illustrate the importance of carefully considering which model best represents the observed count data. Of course, the process of choosing statistical approaches to data analysis is dependent on research questions as well as nuances of observed data. As such, adoption of these techniques in future research should be guided, at least in part, by specific questions asked by scholars in pediatric psychology. We encourage researchers to consider how characteristics of their data might influence their selection of statistical methodologies (MacCallum et al., 2002). In doing so, it may be useful to compare results across the four models discussed in this article when using count outcomes (UCLA Statistical Consulting Group, 2008). It seems plausible that as these methods are adopted within specific subfields, a particular technique may become the new tradition. For example, it may be that distributions of injury counts will always be estimated more accurately with ZIP than with ZINB models. As such, it may become redundant to test repeatedly among the models once a particular tradition is established. Stata script is provided in Appendix B to facilitate consideration of each of these approaches.

We illustrated how these methods can be applied to count outcomes in the pediatric literature using children's medically attended injuries as an example. Findings from this study support the idea that antecedents and correlates of unintentional injury are multifaceted in nature (Morrongiello et al., 2004; Schwebel et al., 2004). While previous research suggests that child sex is a robust predictor of children's unintentional injuries (Morrongiello & Hogg, 2004; Schwebel et al., 2004), the present results indicate that the importance of child sex as a predictor decreases substantially in the context of additional attributes of children, such as aspects of child temperament and attention. This pattern of results emerged only when the model that best reflected observed data was utilized (i.e., ZIP). This finding was very surprising, given the substantial body of literature that documents sex differences in childhood injury rates across a variety of demographics (National Center for Injury Prevention and Control, 2006). Of course, future research is needed to replicate this finding and identify additional mechanisms that may explain sex differences in rates of children's injuries (Hagan & Kuebli, 2007; Morrongiello & Hogg, 2004). The present findings have important implications for injury prevention efforts because they highlight potential targets that may explain why boys are at greater risk of injury than girls.

While the present analyses were conducted with longitudinal data from a large sample, they are limited in several respects. First, behavioral risk factors that place children at risk for injury are complex (see Schwebel & Gaines, 2007, for review), and only one aspect of children's temperament was included in this study. Second, analyses of physical environment were limited to the child's home environment, even though reports of children's injuries occurred during their elementary school years. It is possible that environment did not emerge as a significant predictor because children of this age spend increasing amounts of time away from home (Schwebel & Brezausek, 2007). Third, generalizability of these findings is limited by the definition of injuries used in this study. It is possible that different determinants of injury exist for injuries of varying degrees of severity. Future research is needed to examine this issue in more detail. In light of these limitations, it is our hope that the demonstration of statistical methods presented in this study will propel future work on children's unintentional injuries and other topics in pediatric psychology that rely on count outcomes.


    Appendix
 

Appendix A. Comparisons of Statistical Software Programs

Model               LISREL   Mplus   R     SAS   SPSS   Stata

Poisson             Yes      Yes     Yes   Yes   Yes    Yes
Negative binomial   Yes      Yes     Yes   Yes   Yes    Yes
ZIP                 No       Yes     Yes   Yes   No     Yes
ZINB                No       Yes     Yes   Yes   No     Yes

Comparisons of these programs are based on the most recent version of each software package.


Appendix B. Stata Script

    Poisson.
        . poisson InjSum chsex CBQapprch CPTtime PCRSpos HOMEphys
    Negative binomial (a).
        . nbreg InjSum chsex CBQapprch CPTtime PCRSpos HOMEphys
    Zero-inflated Poisson (ZIP) (b).
        . zip InjSum chsex CBQapprch CPTtime PCRSpos HOMEphys, inf(chsex CBQapprch CPTtime PCRSpos HOMEphys) vuong
    Zero-inflated negative binomial (ZINB) (c).
        . zinb InjSum chsex CBQapprch CPTtime PCRSpos HOMEphys, inf(chsex CBQapprch CPTtime PCRSpos HOMEphys)
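    Comparison of ZIP and ZINB (adapted from Long & Freese, 2006).
        . * Fit ZIP first and save its log likelihood; llzip is needed for the LR statistic
        . quietly zip InjSum chsex CBQapprch CPTtime PCRSpos HOMEphys, inf(chsex CBQapprch CPTtime PCRSpos HOMEphys) nolog
        . scalar llzip = e(ll)
        . * Fit ZINB and save its log likelihood
        . quietly zinb InjSum chsex CBQapprch CPTtime PCRSpos HOMEphys, inf(chsex CBQapprch CPTtime PCRSpos HOMEphys) nolog
        . scalar llzinb = e(ll)
        . scalar lr = -2*(llzip-llzinb)
        . scalar pvalue = chiprob(1,lr)/2
        . scalar lnalpha = -.7968841
        . if (lnalpha < -20) scalar pvalue = 1
        . di as text "Likelihood-ratio test comparing ZIP to ZINB: " as res %8.3f lr as text " Prob>=" as res %5.3f pvalue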

(a) The likelihood-ratio (LR) test is generated automatically.

(b) Script for Vuong tests is included.

(c) The LR test is not generated automatically for zero-inflated models (see script for comparing ZIP and ZINB).

InjSum, summation of injuries from second through sixth grades; chsex, child sex; CBQapprch, Child Behavior Questionnaire, approach/anticipation subscale; CPTtime, Continuous Performance Test, time elapsed; PCRSpos, Parent–Child Relationship Scale, total positive relationship subscale; HOMEphys, H.O.M.E. physical environment subscale.
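
For readers without access to Stata, the arithmetic behind the ZIP versus ZINB comparison can be checked by hand. The following is a minimal Python sketch, not part of the original script: the function name and the log likelihoods are hypothetical and are not values from this study. It assumes only what the script computes, namely lr = -2*(llzip - llzinb) and half of the upper-tail chi-square probability with 1 degree of freedom.

```python
import math


def lr_test_zip_vs_zinb(ll_zip, ll_zinb, lnalpha=None):
    """Boundary-adjusted LR test of ZIP (alpha = 0) against ZINB.

    Mirrors the Stata steps: lr = -2*(llzip - llzinb), then half of the
    upper-tail chi-square(1) probability, because alpha = 0 lies on the
    boundary of the parameter space.
    """
    lr = -2.0 * (ll_zip - ll_zinb)
    # For 1 df, P(chi2 > x) = erfc(sqrt(x / 2)). Negative values of lr
    # (possible only through rounding) are clamped so sqrt() stays defined.
    tail = math.erfc(math.sqrt(max(lr, 0.0) / 2.0))
    p_value = tail / 2.0
    # Same guard as the Stata script: a very negative ln(alpha) means
    # alpha is effectively 0, so the p-value is set to 1.
    if lnalpha is not None and lnalpha < -20:
        p_value = 1.0
    return lr, p_value


# Hypothetical log likelihoods for illustration only, NOT study results.
lr, p = lr_test_zip_vs_zinb(ll_zip=-1000.0, ll_zinb=-998.0)
print(f"LR = {lr:.3f}, p = {p:.4f}")
```

With these made-up inputs the statistic is 4.0 and the halved p-value is roughly .023, which would favor ZINB over ZIP; real log likelihoods from the fitted models would replace the placeholders.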


    Acknowledgments
This study was conducted by the NICHD Early Child Care Research Network supported by NICHD through a cooperative agreement that calls for scientific collaboration between the grantees and the NICHD staff. Secondary analysis of data was supported by funding from the Kent State University Research Council. The authors would also like to thank Dan J. Neal, PhD for assistance with earlier drafts of this article, and J. Scott Long, PhD for aiding our interpretation of results.

Conflicts of interest: None declared.


    Footnotes
 
*Portions of this article were presented at the 2008 National Conference in Child Health Psychology, Miami, FL.

1 While there is no explicit assumption about distributions of dependent variables in OLS regression (Tabachnick & Fidell, 2007), these distributions have a strong influence on the distribution of residuals (Atkins & Gallop, 2007).

2 The mathematical equations of all regression models discussed herein are detailed in Long & Freese (2006).

3 Consistent with previous research on pediatric injury, we considered using the Extraversion or Inhibition subscales of the CBQ (Schwebel & Plumert, 1999). However, in order to demonstrate these techniques, we opted to use the Anticipation scale because it was the only scale that emerged as a significant predictor of injury history. The fact that the Extraversion and Inhibition subscales did not predict injury history prospectively is consistent with analyses by Schwebel and Plumert (1999).

Received December 13, 2007; revision received May 5, 2008; accepted May 13, 2008


    References
Audrain-McGovern J, Rodriguez D, Tercyak KP, Neuner G, Moss HB. The impact of self-control indices on peer smoking and adolescent smoking progression. Journal of Pediatric Psychology (2006) 31:139–151.

Atkins DC, Gallop RJ. Rethinking how family researchers model infrequent outcomes: A tutorial on count regression and zero-inflated models. Journal of Family Psychology (2007) 21:726–735.

Caldwell BM, Bradley RH. Home observation for measurement of the environment (1984) Little Rock, AR: University of Arkansas at Little Rock.

Cameron AC, Trivedi PK. Econometric models based on count data: Comparisons and applications of some estimators and tests. Journal of Applied Econometrics (1986) 1:29–53.

Gardner W, Mulvey EP, Shaw EC. Regression analyses of counts and rates: Poisson, overdispersed Poisson, and negative binomial models. Psychological Bulletin (1995) 118:392–404.

Hagan LK, Kuebli J. Mothers’ and fathers’ socialization of preschoolers’ physical risk taking. Journal of Applied Developmental Psychology (2007) 28:2–14.

Hall DB. Zero-inflated Poisson and binomial regression with random effects: A case study. Biometrics (2000) 56:1030–1039.

Halperin JM, Sharma V, Greenblatt E, Schwartz ST. Assessment of the continuous performance test: Reliability and validity in a nonreferred sample. Psychological Assessment (1991) 3:603–608.

Hammer TH, Landau JC. Methodological issues in the use of absence data. Journal of Applied Psychology (1981) 66:574–581.

Harrison DA, Hulin CL. Investigations of absenteeism: Using event history models to study the absence-taking process. Journal of Applied Psychology (1989) 74:300–316.

Hausman J, Hall BH, Griliches Z. Econometric models for count data with an application to the patents-R&D relationship. Econometrica (1984) 52:909–938.

Jacques D, Finney J. Previous injuries and behavior problems predict children's injuries. Journal of Pediatric Psychology (1994) 19:79–89.

Johnson RA, Wichern DW. Applied multivariate statistical analysis (1998) Englewood Cliffs, NJ: Prentice Hall.

Lambert D. Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics (1992) 34:1–14.

Logan DE, Radcliffe J, Smith-Whitley K. Parent factors and adolescent sickle cell disease: Associations with patterns of health service use. Journal of Pediatric Psychology (2002) 27:475–484.

Long JS, Freese J. Regression models for categorical dependent variables using Stata, 2nd ed (2006) College Station, TX: Stata Press.

MacCallum RC, Zhang S, Preacher KJ, Rucker DD. On the practice of dichotomization of quantitative variables. Psychological Methods (2002) 7:19–40.

Morrongiello BA. Caregiver supervision and child-injury risk: I. Issues in defining and measuring supervision; II. Findings and directions for future research. Journal of Pediatric Psychology (2005) 30:536–552.

Morrongiello BA, Hogg K. Mothers’ reactions to children misbehaving in ways that can lead to injury: Implications for gender differences in children's risk taking and injuries. Sex Roles (2004) 50:103–118.

Morrongiello BA, Ondejko L, Littlejohn A. Understanding toddlers’ in-home injuries: I. Context, correlates, and determinants. Journal of Pediatric Psychology (2004) 29:415–431.

National Center for Injury Prevention and Control. CDC injury fact book (2006) Atlanta, GA: Centers for Disease Control and Prevention.

National Safety Council. Injury facts: 2004 edition (2004) Itasca, IL: Author.

NICHD Early Child Care Research Network. The interaction of child care and family risk in relation to child development at 24 and 36 months. Applied Developmental Science (2000) 6:144–156.

Pianta. Parent-Child Relationship Scale (P-CRS). (1994) Unpublished Scale.

Pless CE, Pless IB. How well they remember: The accuracy of parent reports. Archives of Pediatric and Adolescent Medicine (1995) 149:553–558.

Prinstein MJ, Meade CS, Cohen GL. Adolescent oral sex, peer popularity, and perceptions of best friends’ sexual behavior. Journal of Pediatric Psychology (2003) 28:243–249.

Research Triangle Institute. Children's behavior questionnaire – fifty-four month mother/alternate caregiver questionnaire: Child care data report 214 (1999a) Research Triangle Park, NC: NICHD Early Child Care Study.

Research Triangle Institute. Caregiver's relationship with child – fifty-four month caregiver questionnaire: Child care data report 243 (1999b) Research Triangle Park, NC: NICHD Early Child Care Study.

Rivara FP, Barber M. Demographic analysis of childhood pedestrian injuries. Injuries (1985) 76:375–381.

Rosvold HE, Mirsky AE, Sarason I, Bransome ED Jr, Beck LH. A continuous performance test of brain damage. Journal of Consulting Psychology (1956) 20:343–350.

Rothbart MK, Ahadi SA, Hershey KL. Temperament and social behavior in childhood. Merrill-Palmer Quarterly (1994) 40:21–39.

Sano Y, Jeong Y, Acock AC, Zvonkovic AM. Working with count data: Practical demonstration of Poisson, negative binomial, and zero-inflated regression models. In: Proceedings of the 35th Annual Theory Construction and Research Methodology Workshop of the National Council of Family Relations (2005) Phoenix, AZ.

Schwebel DC, Brezausek CM. The role of context in risk for pediatric injury: Influences from the home and child care environments. Merrill-Palmer Quarterly (2007) 53:105–130.

Schwebel DC, Brezausek CM, Ramey SL, Ramey CT. Interactions between child behavior patterns and parenting: Implications for children's unintentional injury risk. Journal of Pediatric Psychology (2004) 29:93–104.

Schwebel DC, Gaines J. Pediatric unintentional injury: Behavioral risk factors and implications for prevention. Journal of Developmental and Behavioral Pediatrics (2007) 28:245–254.

Schwebel DC, Plumert JM. Longitudinal and concurrent relations among temperament, ability estimation, and injury proneness. Child Development (1999) 70:700–712.

StataCorp. Stata Statistical Software: Release 8 (2003) College Station, TX: StataCorp LP.

Sturman MC. Multiple approaches to analyzing count data in studies of individual differences: The propensity for type I errors, illustrated with the case of absenteeism prediction. Educational and Psychological Measurement (1999) 59:414–430.

Tabachnick BG, Fidell LS. Using multivariate statistics, 5th ed (2007) Boston: Allyn and Bacon.

UCLA: Academic Technology Services, Statistical Consulting Group. Stata annotated output: Zero-inflated poisson regression. Retrieved April 30, 2008, from http://www.ats.ucla.edu/stat/stata/output/Stata_zip.htm.

