
Developing a predictive tool for psychological well-being among Chinese adolescents in the presence of missing data



Multi-dimensional behavioral rating scales like the CBCL and YSR are available for diagnosing psychosocial maladjustment in adolescents, but these are unsuitable for large-scale usage since they are time-consuming and their many sensitive questions often lead to missing data. This research applies multiple imputation to tackle the effects of missing data in order to develop a simple questionnaire-based predictive instrument for psychosocial maladjustment.


Questionnaires from 2919 Chinese sixth graders in 21 schools were collected, but 86% of the students were missing one or more of the variables for analysis. Fifteen (10 training, 5 validation) samples were imputed using multivariate imputation by chained equations. A ten-variable instrument was constructed by applying stepwise variable selection algorithms to the training samples, and its predictive performance was evaluated on the validation samples.


The instrument had an AUC of 0.75 (95% CI: 0.73 to 0.78) and a calibration slope of 0.98 (95% CI: 0.86 to 1.09). The prevalence of psychosocial maladjustment was 18%. If a score of > 1 was used to define a negative test, then 80% of the students would be classified as negative. The resulting test had a diagnostic odds ratio of 5.64 (95% CI: 4.39 to 7.24), with negative and positive predictive values of 88% and 43%, and negative and positive likelihood ratios of 0.61 and 3.41, respectively.


Multiple imputation together with internal validation provided a simple method for deriving a predictive instrument in the presence of missing data. The instrument's high negative predictive value implies that in populations with similar prevalences of psychosocial maladjustment test-negative students can be confidently excluded as being normal, thus saving 80% of the resources for confirmatory psychological testing.



China has undergone rapid urbanization and economic development in the past three decades, with its urban population increasing from 18% to 46% between 1978 and 2008 [1]. However, these changes have been accompanied by a disintegration of traditional family and social-supportive networks (e.g. divorce rates have risen from 0.07% in 1990 to 0.17% in 2008 [1]), contributing to greater stress among children and adolescents [2]. In a meta-analysis of 40 studies, self-reported anxiety levels were observed to have increased 0.7 standard deviations from 1992 to 2005, and anxiety levels were positively correlated with the Gini coefficient, divorce rate, unemployment rate, and crime rate [3]. Another analysis reported urban living to be a risk factor for drug use and casual sex [4]. Depression, social problems, and substance abuse were more prevalent among adolescents lacking family and community social capital [5-7], and children of rural-to-urban migrant workers were more prone to separation anxiety and depression because of heightened parental-child conflicts and discrimination at school [8]. Ignoring or under-detecting the effects of such turmoil and stresses on adolescents can translate into serious societal problems, while early intervention can be effective and beneficial [9, 10].

Different multi-dimensional rating scales have been developed for detecting adolescents with behavioral abnormalities [11, 12], and the ones that have been most frequently used include the Achenbach System of Empirically Based Assessment [13], Conners' Comprehensive Behavior Rating Scales [14], Behavioral Assessment System for Children [15], and the Revised Behavior Problem Checklist [16]. These tools are however unsuitable for large-scale screening since they are time-consuming and susceptible to non-response due to their many sensitive questions. The latter drawback can lead to large quantities of missing data in the analysis, resulting not only in a major loss of statistical power but possibly biased results. Instead, an effective instrument should be simple to administer, easy to answer, specific to its target population, and minimize the number of sensitive questions with labeling effects. This research addresses these concerns by considering the effects of missing data when developing a simple indigenous predictive tool for a large cohort of Chinese adolescents to assess their psychological adjustment as measured by several multi-dimensional behavioral rating scales.



Grade 6 students from twenty-one middle schools in Shanghai, China, were recruited, and altogether 2919 students (out of 2956) participated in the study. The schools were chosen to cover the span of academic levels, with 6 schools belonging to level I (high), 10 schools belonging to level II (middle), and 5 schools belonging to level III (low). The parents of these students were also asked to fill out a psychological assessment of their children. However, two schools were unable to follow this part of the protocol, so only 2229 parents participated in the survey. Ethics approval was granted by the Institutional Review Board of the Institute of Psychology, Chinese Academy of Sciences.


Psychosocial adjustment was assessed using the Chinese validated versions of the 113-item syndrome scale of the Child Behavior Checklist (CBCL), the 112-item syndrome scale of the Youth Self-Report (YSR) [17, 18], and the Mental Health Inventory (MHI) [19]. A student was classified as having psychosocial maladjustment if he/she was above the gender-specific 95th percentile on any one of the three multi-component scales. The 95th percentiles for the YSR and CBCL were based on normal values obtained from a Chinese reference population in Hong Kong [20], while the 95th percentile for the MHI was based on the current sample of Shanghai students. The decision to define the outcome measure primarily using Achenbach's CBCL and YSR scales was due to their comprehensive assessment of behavioral problems, solid psychometric properties, ability to collect information from different sources, extensive use in clinical research and practice, and availability of Chinese translations and Chinese reference norms [20, 21].

A Student Information Form (SIF) consisting of 80 questions was developed to obtain information on the demographic, familial, health, academic, and social supportive characteristics of the students. The questions were based on a literature review of risk factors for adolescents' problems, and were solicited from a panel of psychiatrists, psychologists, social workers, school principals and teachers, parents, and epidemiologists. An earlier version, the Hong Kong Student Information Form, had been developed and used in the Hong Kong Understanding Adolescents Project [22-24].

Data Analysis

The purpose of the analysis was to develop and validate a psychosocial maladjustment predictive tool using a small number of questions from the SIF, while considering the effects of missing data. A typical strategy for dealing with missing data is to simply eliminate any observation that has incomplete data, but this type of "listwise deletion" or "complete case analysis" can result in the loss of many observations and statistical power when the analysis involves multiple variables [25]. Moreover, biased estimates are produced unless the data are missing completely at random [26, 27]; i.e. the observations that have missing values are a random sample of the entire cohort. In practice, the reason data are missing may be related to different variables collected in a study; e.g. depressed and anxious students may be less likely to complete a psychological survey. An appropriate analysis should then use these variables to model the likelihood of having missing data, and incorporate the uncertainty associated with the modeling process. In this study, multiple imputation (MI) [28] was used to address these two concerns, and the multivariate imputation by chained equations (MICE) [29-31] algorithm implemented by the IVEware software [32] was used to perform the imputations. Fifteen imputed datasets were generated; ten datasets were used to "train" or develop the prognostic model for psychosocial maladjustment, and the remaining five datasets were used to validate the model. The stepwise variable selection method in logistic regression (with entry and exit cutoffs both set at p = 0.05) was used to select the SIF variables associated with psychosocial maladjustment for each of the ten training datasets. SIF variables selected with at least 70% frequency in the ten regression models were then combined into an additive SIF score using the mean of the regression coefficients of the SIF variables as weights.
The area under the receiver-operating characteristic curve (AUC) and the calibration slope (i.e. the regression coefficient in the logistic regression of psychosocial maladjustment on SIF score) of this score were then calculated by combining the respective estimates across the five imputed validation datasets using PROC MIANALYZE [33]. The AUC serves as a discriminative measure of the score's ability in distinguishing between high versus low risk students. The calibration slope measures how well predicted probabilities agree with the observed probabilities, and equals one in the ideal case. Slopes of less than one imply over-optimistic predictions where low predictions are too low and high predictions are too high [34]. For illustrative purposes, a convenient cutoff was also chosen to dichotomize the score, and sensitivity, specificity, positive and negative predictive values, likelihood ratio of a positive test and negative test, and diagnostic odds ratio were calculated.
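The pooling step described above can be illustrated with Rubin's combining rules, which are the rules PROC MIANALYZE applies. The following is a minimal sketch, not the authors' code: the function name `pool_rubin` and the five AUC estimates and standard errors are hypothetical values chosen only for illustration, and the degrees-of-freedom calculation needed for a t-based confidence interval is omitted.

```python
import math


def pool_rubin(estimates, variances):
    """Pool m point estimates and their within-imputation variances."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates with matching variances")
    q_bar = sum(estimates) / m  # pooled point estimate
    u_bar = sum(variances) / m  # average within-imputation variance
    # Between-imputation variance of the point estimates.
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    total_variance = u_bar + (1 + 1 / m) * b
    return q_bar, math.sqrt(total_variance)


# Hypothetical AUC estimates and standard errors from five validation datasets.
aucs = [0.74, 0.76, 0.75, 0.77, 0.73]
standard_errors = [0.012, 0.012, 0.012, 0.012, 0.012]

pooled_auc, pooled_se = pool_rubin(aucs, [se ** 2 for se in standard_errors])
print(f"pooled AUC = {pooled_auc:.3f}, pooled SE = {pooled_se:.4f}")
```

With these made-up inputs the pooled standard error (roughly 0.021) is larger than any single-dataset standard error, because the between-imputation spread is added to the within-imputation variance.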


The characteristics of the cohort of Shanghai 6th graders are presented in Table 1. The mean age was 11.9 years and 53% of them were male. Around 4.8% of the students felt unfavorably about their family, 6.7% had unfavorable opinions about their school, 3.8% had unfavorable ratings about their health, and 5.6% rated their social skills and social support environment as being unfavorable. The amount of missing data varied substantially across different variables. The median amount of missing data was 1.8% (range: 1.3% to 11.9%) among the SIF items, with 58.5% of the students missing at least one SIF item. The three case-defining multi-component scales, CBCL, YSR, and MHI, had 59.3%, 27.1%, and 27.7% of their data missing, respectively, leaving 70% of the students without a psychosocial maladjustment case definition. The students with missing case definitions were more likely to be non-local students with lower academic rankings, were less likely to argue with parents, and rated their social skills and social support environment more favorably. After applying MI, the prevalence of psychosocial maladjustment was estimated to be 18.4% (95% CI: 16.5% to 20.2%), which was significantly lower than the 24.5% observed in the original sample.

Table 1 Characteristics of Cohort of 2919 Shanghai Students

Ten SIF variables were chosen based on applying the stepwise selection algorithm to the training datasets, and a SIF-Predictive Tool (SIF-PT) was constructed from these 10 variables (Table 2). Being male, having more positive feelings towards the family, spending less time on homework during weekends, having a good appetite, having mostly friends of the same sex, and having none or only one karaoke bar around the neighborhood all contributed positively to the SIF-PT. In contrast, being often ridiculed and ignored by classmates, regularly having difficulties in mathematics, and sleeping less than 8 hours a day were negatively associated with the SIF-PT. Higher SIF-PT scores were associated with a lower likelihood of psychosocial maladjustment. In the validation samples, the proportion of psychosocial maladjustment was 2.4% for students with SIF-PT scores greater than 3, but increased to 60.1% for those with scores of zero or less. The SIF-PT had an AUC of 0.75 (95% CI: 0.73 to 0.78) and its calibration slope was 0.98 (95% CI: 0.86 to 1.09) (Table 3). In order to contrast the MI approach with the listwise deletion approach of dealing with missing data, a stepwise logistic regression of psychosocial maladjustment on the SIF variables was also performed. This complete-case analysis had a sample size of only 415 and selected nine variables, three of which were included in the SIF-PT. The composite score constructed from this analysis had an AUC of 0.72 (95% CI: 0.69 to 0.76) and a calibration slope of 0.32 (95% CI: 0.26 to 0.38) in the validation samples.
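The construction of the score from the training models can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function names, the variable names (`male`, `short_sleep`, `karaoke`), and all coefficients are hypothetical, and the mean coefficient is taken over the models in which a variable was selected, a detail the text does not specify.

```python
def build_score_weights(models, min_frequency=0.7):
    """Keep variables selected in at least min_frequency of the models.

    models: list of dicts mapping a selected variable name to its
    regression coefficient in that training dataset.
    """
    n_models = len(models)
    counts = {}
    for model in models:
        for name in model:
            counts[name] = counts.get(name, 0) + 1

    weights = {}
    for name, count in counts.items():
        if count / n_models >= min_frequency:
            # Assumption: average over the models that selected the variable.
            coefficients = [m[name] for m in models if name in m]
            weights[name] = sum(coefficients) / len(coefficients)
    return weights


def additive_score(weights, answers):
    """Weighted sum of a student's coded answers."""
    return sum(weight * answers[name] for name, weight in weights.items())


# Ten hypothetical stepwise models, one per training dataset.
models = [{"male": 0.5, "short_sleep": -0.6} for _ in range(7)]
models += [{"male": 0.5, "karaoke": 0.2} for _ in range(3)]

weights = build_score_weights(models)
print(weights)  # "karaoke" is dropped: selected in only 3 of 10 models
print(additive_score(weights, {"male": 1, "short_sleep": 1}))
```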

Table 2 Definition of Student Information Form (SIF) Predictive Tool
Table 3 Characteristics of SIF Predictive Tool

When a SIF-PT score of > 1 was used to define a negative test, 79.8% of the students were classified as test negative, which is roughly the same percentage of students without psychosocial maladjustment. The resulting diagnostic test had a specificity of 86.0% (95% CI: 84.5% to 87.4%), sensitivity of 48.0% (95% CI: 42.8% to 53.1%), positive predictive value of 43.1% (95% CI: 38.9% to 47.3%), and negative predictive value of 88.2% (95% CI: 86.5% to 89.8%) in the validation samples (Table 3). Higher cutoffs can increase the sensitivity and reduce the number of false negatives, but the tradeoff is lower specificity and more false positives. For example, a cutoff of 2 yielded a test with a sensitivity of 83% but a specificity of 49%. In general, fixing a specific cutoff value is a difficult decision since the choice depends on the case prevalence in the target population and the costs of false positive and false negative classifications [35]. If equal costs and a 50% case prevalence were assumed, then the optimal cutoff can be obtained by maximizing the Youden index (or equivalently, minimizing the false positive plus false negative rates). The resulting test with this optimal cutoff of 1.6 had a sensitivity of 68.4% (95% CI: 63.7% to 73.1%), specificity of 67.9% (95% CI: 65.8% to 69.9%), positive predictive value of 32.0% (95% CI: 29.2% to 34.8%), and negative predictive value of 90.6% (95% CI: 89.0% to 92.3%) in the validation samples.
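The relationships among the quantities reported above can be checked with a short calculation. The sketch below derives predictive values, likelihood ratios, the diagnostic odds ratio, and the Youden index from sensitivity, specificity, and prevalence; the function name is hypothetical, and the inputs are the rounded estimates quoted in the text, so the outputs only approximate the pooled values in Table 3.

```python
def diagnostic_summary(sensitivity, specificity, prevalence):
    """Derive standard test characteristics from three inputs."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return {
        "ppv": true_pos / (true_pos + false_pos),
        "npv": true_neg / (true_neg + false_neg),
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
        "dor": lr_pos / lr_neg,
        "youden": sensitivity + specificity - 1,
        "test_negative_fraction": true_neg + false_neg,
    }


# Rounded estimates for the cutoff of 1, with the MI prevalence of 18.4%.
summary = diagnostic_summary(sensitivity=0.480, specificity=0.860, prevalence=0.184)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")

# Youden index for the cutoff of 1.6, for comparison.
youden_optimal = diagnostic_summary(0.684, 0.679, 0.184)["youden"]
```

These inputs give a negative predictive value of about 0.88 and a test-negative fraction of about 0.80, in line with the figures above; the small differences in the other quantities come from rounding and from pooling across imputed datasets. The Youden index rises from 0.34 at a cutoff of 1 to about 0.36 at 1.6.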


Missing data is a critical issue in this research. Although the ideal solution to dealing with missing data is not to have any, practical constraints make this difficult to achieve. For example, two schools chose not to administer the CBCL since they thought it would over-burden the parents, and thus 24% of the sample started without CBCL information. Also, there was a substantial number of incomplete forms since many parents felt certain questions in the CBCL were intrusive. The response rate might have been improved if interviewers individually administered the CBCL to each parent, but due to limited resources the form could only be administered in a group setting. Ultimately, 70% of the outcome variable ended up missing. In such situations, the unguarded use of stepwise variable selection methods can select incorrect subsets of items and lead to inflated model performance. MI, however, can be used to account for the effects of missing data, and the fifteen imputed datasets were separated into training and validation sets to minimize the inclusion of irrelevant items and properly assess the performance of the SIF-PT. MI assumes that missing values may be dependent on observed variables but not on unobserved variables; i.e. missing at random [28], and the occurrence of missing data was verified to be associated with various student characteristics and behavior. Obviously, missing data may be related to unobserved variables; i.e. missing not at random [26]. However, appropriate analyses of data not missing at random highly depend on the choice of the postulated missing data model [36]. On the other hand, MI may still yield good estimates and standard errors even when the missing at random assumption is at fault [37]. Similar to previous studies [38, 39], the MICE algorithm was employed for imputing the missing data as it provided greater stability and flexibility when handling many categorical variables.
Ambler, Omar, and Royston [29] also found that the MICE procedure yielded predictions with low bias and good coverage. Ten imputed datasets were used to develop the SIF-PT, and five additional imputed datasets were used to validate it in terms of its predictive ability. Clark and Altman [38] also developed their ovarian cancer prognostic model based on ten imputed datasets, and Heymans et al. [39] found that similar models were selected using ten versus one hundred imputations. A 70% threshold and a 0.05 significance level for selecting the SIF variables were adopted since the 70% cutoff has been shown to provide reasonable discriminative and calibrative properties [39], and the 5% significance level was found to be suitable for data where about half of the predictors were non-influential [40].

Although MI was successfully applied for analysis, its validity cannot be guaranteed especially with such large amounts of missing data as in the current study. Even further simulations can only provide anecdotal evidence since one can never ascertain the true values of the missing data for a specific study. However, the benefits of MI in this study lie in its theoretical and practical advantages over other common methods of handling missing data [28]. For example, a complete-case analysis discarded almost 86% of the cases, and the results are likely biased since the occurrence of missing data was dependent on variables like residence status, academic standing, relationship with parents, and quality of social support. The composite score obtained from this complete-case analysis had only three variables in common with the ten-variable SIF-PT. Its calibration slope estimate was also severely biased downward from one, indicating that it will be overly optimistic for prediction purposes.
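A calibration slope well below one can be illustrated with simulated data. The sketch below is not the authors' analysis: it generates hypothetical predicted log-odds, draws outcomes from a model whose true log-odds are only half as extreme, and then recovers the slope by fitting a two-parameter logistic regression with Newton-Raphson. All names, the sample size, the seed, and the distribution parameters are assumptions made for illustration.

```python
import math
import random


def calibration_slope(scores, outcomes, iterations=25):
    """Fit logit P(y = 1) = a + b * score and return (a, b)."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(scores, outcomes):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g0 += y - p          # gradient with respect to a
            g1 += (y - p) * x    # gradient with respect to b
            h00 += w             # entries of the information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 system explicitly.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


random.seed(2011)
# Hypothetical over-optimistic predictions: the true log-odds are 0.5 times
# the predicted log-odds, so the calibration slope should be close to 0.5.
predicted_log_odds = [random.gauss(-1.0, 1.5) for _ in range(5000)]
outcomes = [
    1 if random.random() < 1.0 / (1.0 + math.exp(-0.5 * lp)) else 0
    for lp in predicted_log_odds
]

intercept, slope = calibration_slope(predicted_log_odds, outcomes)
print(f"calibration slope = {slope:.2f}")
```

A slope near 0.5 here means the predictions are too extreme in both directions, which is the pattern the text describes for the complete-case score.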

In this study, the SIF-PT was internally validated using five imputed datasets. A good tool should be customized to the socio-cultural background of its target population in order to maximize predictability. For example, the small percentage (< 5%) of unfavorable attitudes towards one's family and the large percentage (21%) of students spending at least 5 hours doing homework during weekends are more characteristic of Asian children than of those in North America or Europe. Moreover, the question concerning the number of karaoke bars around the child's neighborhood is distinctive to the type of social environment urban Chinese encounter. On the other hand, the tool should also have sufficient flexibility to encompass target groups beyond the original sampling frame; e.g. 7th or 8th graders, and other Chinese cities besides Shanghai. Admittedly, it is not easy to balance these two competing objectives. Accordingly, the external validity/generalizability of the SIF-PT to other Chinese cities awaits further research.


Psychosocial maladjustment among adolescents can have serious consequences, and efforts at early detection and prevention are essential. Standardized rating scales like the CBCL and YSR are time-consuming, and their sensitive nature makes them susceptible to non-response. Such checklists are therefore inappropriate for large-scale evaluation, and the SIF Predictive Tool was developed to address these deficiencies. Comprising ten questions relating to the student's family, school, health, and social environment, it can be easily and quickly administered and is significantly associated with the risk of psychosocial maladjustment. In the validation samples, students with SIF-PT scores greater than three had a 2.4% risk of psychosocial maladjustment, while those with scores of zero or less showed a 25-fold increase in risk.

The SIF-PT's high negative predictive value implies that for populations with around 18% prevalence of psychosocial maladjustment one can forgo administering the CBCL, YSR, and MHI to test-negative students since they can be accurately predicted to be without psychosocial maladjustment. For example, psychological testing costs can be saved for the 80% of the population who have SIF-PT scores greater than one, since 88 out of 100 of these test-negative students can be correctly diagnosed as without psychosocial maladjustment. In general, for each individual student, a likelihood ratio can be derived from his/her SIF-PT score, and Bayes' rule can be applied to compute the student's predictive probability of psychosocial maladjustment. For example, the likelihood ratios for SIF-PT scores ≤ 0, 0.1 to 1, 1.1 to 1.5, 1.6 to 2, 2.1 to 2.5, 2.5 to 3, and > 3 were 6.85, 2.46, 1.16, 0.83, 0.48, 0.22, and 0.11, respectively, in the validation samples. For a student with an ambivalent prior diagnosis (i.e., a 0.5 pre-test probability of psychosocial maladjustment), his/her post-test probability will be 0.87 and 0.71 for SIF-PT scores ≤ 0 and from 0.1 to 1, respectively, or 0.32, 0.18, and 0.10 for SIF-PT scores from 2.1 to 2.5, 2.5 to 3, and > 3, respectively. The former post-test probabilities support a psychosocial maladjustment diagnosis, while the latter post-test probabilities serve to exclude the possibility of maladjustment.
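The post-test probabilities quoted above follow from the odds form of Bayes' rule: post-test odds equal pre-test odds multiplied by the likelihood ratio. A minimal sketch, using the likelihood ratios reported in the text (the function name and the dictionary layout are illustrative choices, not part of the original analysis):

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Apply Bayes' rule on the odds scale."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Likelihood ratios by SIF-PT score band, as reported in the validation samples.
likelihood_ratios = {
    "<= 0": 6.85,
    "0.1 to 1": 2.46,
    "1.1 to 1.5": 1.16,
    "1.6 to 2": 0.83,
    "2.1 to 2.5": 0.48,
    "2.5 to 3": 0.22,
    "> 3": 0.11,
}

for band, lr in likelihood_ratios.items():
    probability = post_test_probability(0.5, lr)
    print(f"SIF-PT {band}: post-test probability = {probability:.2f}")
```

With a 0.5 pre-test probability this reproduces the values 0.87, 0.71, 0.32, 0.18, and 0.10 given in the text; a different pre-test probability, such as a local prevalence estimate, can be substituted as the first argument.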


  1. National Bureau of Statistics of China: China Statistics Yearbook 2009. 2009, Beijing: China Statistics Publisher

  2. Levine KA, Zhu K: The changing context of China: Emerging issues for school social work. Int Soc Work. 2010, 53: 339-352. 10.1177/0020872809359751.

  3. Xin Z, Zhang L, Liu D: Birth cohort changes of Chinese adolescents' anxiety: A cross-temporal meta-analysis, 1992 - 2005. Pers Indiv Differ. 2010, 48: 208-212. 10.1016/j.paid.2009.10.010.

  4. Yang X, Luo H: Migration, urbanization, and drug use and casual sex in China: A multi-level analysis. Environ Plann A. 2009, 41: 581-597. 10.1068/a40297.

  5. Wu Q, Xie B, Chou CP, Palmer PH, Gallaher PE, Johnson CA: Understanding the effect of social capital on the depression of urban Chinese adolescents: An integrative framework. Am J Comm Psycho. 2010, 45: 1-16. 10.1007/s10464-009-9284-2.

  6. Liu X, Guo C, Okawa M, Zhai J, Li Y, Uchiyama M, Neiderhiser J, Kurita H: Behavioral and emotional problems in Chinese children of divorced parents. J Am Acad Child Psy. 2000, 39: 896-903. 10.1097/00004583-200007000-00019.

  7. Unger J, Li Y, Johnson A, Gong J, Chen X, Li C, Trinidad DR, Tran NT, Lo AT: Stressful life events among adolescents in Wuhan, China: Associations with smoking, alcohol use, and depressive symptoms. Int J Behav Med. 2001, 8: 1-18. 10.1207/S15327558IJBM0801_01.

  8. Wong D, Li CY: Correlates of psychological well-being of children of migrant workers in Shanghai, China. Soc Psych Psych Epid. 2009, 44: 815-824. 10.1007/s00127-009-0003-y.

  9. Kraag G, Zeegers MP, Hosman C, Abu-Saad HH: School programs targeting stress management in children and adolescents: A meta-analysis. J Sch Psychol. 2006, 44: 449-72. 10.1016/j.jsp.2006.07.001.

  10. Yu DL, Seligman ME: Preventing depressive symptoms in Chinese children. Prevention and Treatment. 2002, 5: 9-

  11. Reitman D, Hummel R, Franz D, Gross AM: A review of methods and instruments for assessing externalizing disorders: Theoretical and practical considerations in rendering a diagnosis. Clin Psychol Rev. 1998, 18: 555-584. 10.1016/S0272-7358(98)00003-8.

  12. McMahon RJ, Frick PJ: Conduct and oppositional disorders. Assessment of childhood disorders. 2007, New York: Guilford Press, 4

  13. Achenbach TM, Rescorla LA: Manual for the ASEBA school-age forms and profiles. 2001, Burlington: University of Vermont Research Center for Children, Youth, & Families

  14. Conners CK: Manual for Conners' rating scales-revised. 1997, North Tonawanda, NY: Multi-health Systems, Inc

  15. Reynolds CR, Kamphaus RW: BASC-2: Behavior assessment system for children, second edition manual. 2004, Circle Pines, MN: American Guidance Service

  16. Quay HC, Peterson DR: Revised behavior problem checklist, PAR edition: Professional manual. 1996, Odessa, FL: Psychological Assessment Resources

  17. Achenbach TM: Integrative guide to the 1991 CBCL/4-18, YSR, and TRF profiles. 1991, Burlington: University of Vermont Department of Psychiatry

  18. Leung PWL, Kwong SL, Tang CP, Ho TP, Hung SF, Lee CC, Hong SL, Chiu CM, Liu WS: Test-retest reliability and criterion validity of the Chinese version of CBCL, TRF and YSR. J Child Psychol Psychiatry. 2006, 47: 970-73. 10.1111/j.1469-7610.2005.01570.x.

  19. Wu SC, Krause NM, Chiang TL, Wu HY: The structure of the Mental Health Inventory among Chinese in Taiwan. Med Care. 1992, 30: 659-76. 10.1097/00005650-199208000-00001.

  20. Leung PWL, Ho TP, Hung SF, Lee CC, Tang CP: Child Behaviour Checklist - CBCL - Hong Kong norm. 1998, The Chinese University of Hong Kong (Unpublished manuscript)

  21. Achenbach TM, McConaughy SH: Empirically based assessment of child and adolescent psychopathology: Practical applications. 1997, Thousand Oaks: Sage University Press, 2

  22. Education and Manpower Bureau: User manuals for Understanding Adolescent Project. 2004, Hong Kong SAR Government

  23. Lee TY, Shek DTL, Kwong WM: Chinese approaches to understanding and building resilience in at-risk populations. Child Adolesc Psychiatr Clin N Am. 2007, 16: 377-92. 10.1016/j.chc.2006.12.001.

  24. Wong KY, Lee TY: Professional discourse among social workers working with at-risk adolescents in Hong Kong: risk or resilience?. Pathways to resilience: A handbook of theory, methods, and intervention. 2005, Thousand Oaks: SAGE University Press

  25. Allison PD: Missing data. 2002, Thousand Oaks: SAGE University Papers

  26. Schafer JL, Graham JW: Missing data: Our view of the state of the art. Psychol Methods. 2002, 7: 147-77.

  27. Peugh JL, Enders CK: Missing data in educational research: a review of reporting practices and suggestions for improvement. Rev Educ Res. 2004, 74: 525-56. 10.3102/00346543074004525.

  28. Rubin DB: Multiple imputation for nonresponse in surveys. 1987, New York: John Wiley & Sons

  29. Ambler G, Omar RZ, Royston P: A comparison of imputation techniques for handling missing predictor values in a risk model with a binary outcome. Stat Methods Med Res. 2007, 16: 277-98. 10.1177/0962280206074466.

  30. Raghunathan TE, Lepkowski JM, Van Hoewyk J, Solenberger P: A multivariate technique for multiply imputing missing values using a sequence of regression models. Surv Methodol. 2001, 27: 85-95.

  31. Van Buuren S, Brand JPL, Groothius-Oudshoorn CGM, Rubin DB: Fully conditional specification in multivariate imputation. J Stat Comput Simul. 2006, 76: 1049-64. 10.1080/10629360600810434.

  32. Raghunathan TE, Solenberger PW, Van Hoewyk J: IVEware: Imputation and Variance Estimation Software User Guide. 2002, Michigan: Survey Research Center, Institute for Social Research, University of Michigan

  33. SAS Institute Inc: SAS/STAT® 9.1 user's guide. 2004, Cary: SAS Institute Inc

  34. Miller ME, Hui SL, Tierney WM: Validation techniques for logistic regression models. Stat Med. 1991, 10: 1213-26. 10.1002/sim.4780100805.

  35. Zweig MH, Campbell G: Receiver-operating characteristic (ROC) plots: A fundamental evaluation tool in clinical medicine. Clin Chem. 1993, 39: 561-577.

  36. Graham JW: Missing data analysis: Making it work in the real world. Annu Rev Psychol. 2009, 60: 549-76. 10.1146/annurev.psych.58.110405.085530.

  37. Collins LM, Schafer JL, Kam CM: A comparison of inclusive and restrictive strategies in modern missing-data procedures. Psychol Methods. 2001, 6: 330-51.

  38. Clark TG, Altman DG: Developing a prognostic model in the presence of missing data, an ovarian cancer case study. J Clin Epidemiol. 2003, 56: 28-37. 10.1016/S0895-4356(02)00539-5.

  39. Heymans MW, Van Buuren S, Knol DL, Mechelen WV, de Vet HCW: Variable selection under multiple imputation using the bootstrap in a prognostic study. BMC Med Res Methodol. 2007, 7: 33-42. 10.1186/1471-2288-7-33.

  40. Ambler G, Brady AR, Royston P: Simplifying a prognostic model: a simulation study based on clinical data. Stat Med. 2002, 21: 3803-22. 10.1002/sim.1422.



This work was partially supported by The Youth Foundation, Hong Kong, and the Shanghai Leading Academic Discipline Project #B118. The authors wish to thank Xu Zhening, the researchers at the Counseling Service of East China Normal University, Shanghai, PRC, and all the teachers for their dedicated contribution to the study.

Author information



Corresponding author

Correspondence to Henry S Lynn.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

All authors made substantial contribution to the manuscript. BYT participated in the design of this research and in the writing of the manuscript. HSL planned the design of this study, performed the analyses, and drafted the manuscript. All authors have read and approved the final manuscript.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


About this article

Cite this article

Lynn, H.S., Tsang, B.Y. Developing a predictive tool for psychological well-being among Chinese adolescents in the presence of missing data. BMC Med Res Methodol 11, 119 (2011).



  • Missing data
  • Multiple imputation
  • Psychological maladjustment
  • Predictive tool