Table 2 Average area under the receiver operating characteristic curve (AUC) of models fit separately with groups of risk factors for all participants, asthmatics, and non-asthmatics, for 50 different random training sets and their corresponding holdout test datasets

From: Understanding the importance of key risk factors in predicting chronic bronchitic symptoms using a machine learning approach

| Participant group | Risk factor groupings^a | AUC: CV | AUC: Across-participants test set | AUC: Within-participants test set | Across-subject test set accuracy at the optimal threshold^c |
|---|---|---|---|---|---|
| All participants | All predictors | 0.77 | 0.78 | 0.75 | 0.74 |
| | Sociodemographic | 0.56 | 0.56 | 0.58 | 0.55 |
| | Indoor/home exposures | 0.54 | 0.55 | 0.56 | 0.60 |
| | Traffic/Air pollution exposures | 0.52 | 0.53 | 0.52 | 0.55 |
| | Symptoms/medication use | 0.75 | 0.76 | 0.73 | 0.75 |
| | Asthma/eczema | 0.68 | 0.69 | 0.67 | 0.71 |
| | BCP (lag 1) only^b | | 0.71 | 0.68 | 0.79 |
| | BCP (lag 1) and traffic/air pollution exposures | 0.71 | 0.70 | 0.68 | 0.79 |
| | Top 10 risk factors | 0.77 | 0.78 | 0.75 | 0.75 |
| Asthmatics | All predictors | 0.70 | 0.71 | 0.69 | 0.67 |
| | Sociodemographic | 0.52 | 0.55 | 0.54 | 0.52 |
| | Indoor/home exposures | 0.50 | 0.54 | 0.54 | 0.52 |
| | Traffic/Air pollution exposures | 0.49 | 0.51 | 0.52 | 0.51 |
| | Symptoms/medication use | 0.70 | 0.71 | 0.69 | 0.67 |
| | Asthma/eczema | 0.54 | 0.56 | 0.56 | 0.50 |
| | BCP (lag 1) only^b | | 0.68 | 0.67 | 0.68 |
| | BCP (lag 1) and traffic/air pollution exposures | 0.67 | 0.68 | 0.67 | 0.68 |
| | Top 10 risk factors | 0.70 | 0.71 | 0.68 | 0.67 |
| Non-Asthmatics | All predictors | 0.71 | 0.71 | 0.70 | 0.76 |
| | Sociodemographic | 0.54 | 0.55 | 0.56 | 0.49 |
| | Indoor/home exposures | 0.52 | 0.54 | 0.56 | 0.51 |
| | Traffic/Air pollution exposures | 0.51 | 0.52 | 0.51 | 0.57 |
| | Symptoms/medication use | 0.69 | 0.70 | 0.68 | 0.77 |
| | Asthma/eczema | 0.55 | 0.57 | 0.57 | 0.71 |
| | BCP (lag 1) only^b | | 0.67 | 0.64 | 0.81 |
| | BCP (lag 1) and traffic/air pollution exposures | 0.67 | 0.66 | 0.64 | 0.84 |
| | Top 10 risk factors | 0.71 | 0.72 | 0.69 | 0.75 |

  1. ^a Variables in each risk factor grouping are listed in the text.
  2. ^b Cross-validation could not be applied to the GBM models with one predictor variable, so the CV AUC and the cross-validation-based optimal number of trees were not produced. These single-predictor GBM models used a total of 2000 trees.
  3. ^c The optimal threshold was determined using the predicted probabilities from the cross-validation set.
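
For readers who want to see how the two kinds of numbers in this table relate, the following is a minimal sketch in Python, not the authors' code. It computes an AUC from predicted probabilities, picks a cut-off on cross-validation predictions, and then reports test-set accuracy at that cut-off. The table does not state which criterion defined the "optimal" threshold; maximizing Youden's J (sensitivity + specificity - 1) is an assumption made here for illustration. The function names (`auc`, `youden_threshold`, `accuracy_at`) and the toy labels and scores are likewise hypothetical.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_threshold(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Cut-off maximizing sensitivity + specificity - 1 (assumed criterion).

    A case is predicted positive when its score is >= the cut-off.
    If several cut-offs tie, the lowest one is returned.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < t)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t


def accuracy_at(labels: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """Share of cases classified correctly at the given cut-off."""
    correct = sum(int(s >= threshold) == y for y, s in zip(labels, scores))
    return correct / len(labels)


# Toy data, invented for illustration only (not study data).
cv_labels = [0, 0, 1, 0, 1, 1]
cv_scores = [0.10, 0.30, 0.35, 0.40, 0.70, 0.90]
test_labels = [0, 1, 0, 1]
test_scores = [0.20, 0.60, 0.75, 0.80]

# Choose the cut-off on cross-validation predictions, then apply it to the test set.
threshold = youden_threshold(cv_labels, cv_scores)
print("CV AUC:", round(auc(cv_labels, cv_scores), 2))
print("Test AUC:", round(auc(test_labels, test_scores), 2))
print("Threshold from CV predictions:", threshold)
print("Test accuracy at that threshold:", accuracy_at(test_labels, test_scores, threshold))
```

Averaging such values over repeated random training/test splits, as the caption describes for 50 splits, would amount to running this per split and taking the mean of each metric.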