
Table 2 Average area under the receiver operating characteristic curve (AUC) of models fit separately with groups of risk factors for all participants, asthmatics, and non-asthmatics, for 50 different random training sets and their corresponding holdout test datasets

From: Understanding the importance of key risk factors in predicting chronic bronchitic symptoms using a machine learning approach

 

| Risk factor groupingsᵃ | AUC: CV | AUC: Across-participants test set | AUC: Within-participants test set | Across-subject test set accuracy at the optimal thresholdᶜ |
|---|---|---|---|---|
| **All participants** | | | | |
| All predictors | 0.77 | 0.78 | 0.75 | 0.74 |
| Sociodemographic | 0.56 | 0.56 | 0.58 | 0.55 |
| Indoor/home exposures | 0.54 | 0.55 | 0.56 | 0.60 |
| Traffic/Air pollution exposures | 0.52 | 0.53 | 0.52 | 0.55 |
| Symptoms/medication use | 0.75 | 0.76 | 0.73 | 0.75 |
| Asthma/eczema | 0.68 | 0.69 | 0.67 | 0.71 |
| BCP (lag 1) onlyᵇ | | 0.71 | 0.68 | 0.79 |
| BCP (lag 1) and traffic/air pollution exposures | 0.71 | 0.70 | 0.68 | 0.79 |
| Top 10 risk factors | 0.77 | 0.78 | 0.75 | 0.75 |
| **Asthmatics** | | | | |
| All predictors | 0.70 | 0.71 | 0.69 | 0.67 |
| Sociodemographic | 0.52 | 0.55 | 0.54 | 0.52 |
| Indoor/home exposures | 0.50 | 0.54 | 0.54 | 0.52 |
| Traffic/Air pollution exposures | 0.49 | 0.51 | 0.52 | 0.51 |
| Symptoms/medication use | 0.70 | 0.71 | 0.69 | 0.67 |
| Asthma/eczema | 0.54 | 0.56 | 0.56 | 0.50 |
| BCP (lag 1) onlyᵇ | | 0.68 | 0.67 | 0.68 |
| BCP (lag 1) and traffic/air pollution exposures | 0.67 | 0.68 | 0.67 | 0.68 |
| Top 10 risk factors | 0.70 | 0.71 | 0.68 | 0.67 |
| **Non-Asthmatics** | | | | |
| All predictors | 0.71 | 0.71 | 0.70 | 0.76 |
| Sociodemographic | 0.54 | 0.55 | 0.56 | 0.49 |
| Indoor/home exposures | 0.52 | 0.54 | 0.56 | 0.51 |
| Traffic/Air pollution exposures | 0.51 | 0.52 | 0.51 | 0.57 |
| Symptoms/medication use | 0.69 | 0.70 | 0.68 | 0.77 |
| Asthma/eczema | 0.55 | 0.57 | 0.57 | 0.71 |
| BCP (lag 1) onlyᵇ | | 0.67 | 0.64 | 0.81 |
| BCP (lag 1) and traffic/air pollution exposures | 0.67 | 0.66 | 0.64 | 0.84 |
| Top 10 risk factors | 0.71 | 0.72 | 0.69 | 0.75 |

ᵃ Variables in each risk factor grouping are listed in the text.

ᵇ Cross-validation could not be applied to the GBM models with a single predictor variable, so the CV AUC and the cross-validated optimal number of trees were not produced. A total of 2000 trees was used in these single-predictor GBM models.

ᶜ The optimal threshold was determined using the predicted probabilities from the cross-validation set.
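
To make the two kinds of table entries concrete, the following is a minimal Python sketch of how an AUC and an accuracy at a cross-validation-derived threshold can be computed. It is not the authors' code. The function names (`auc`, `pick_threshold`, `accuracy_at`) and the toy data are hypothetical, and the threshold criterion (maximizing Youden's J, sensitivity + specificity - 1) is an assumption, since the footnote does not state which criterion defined "optimal".

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def pick_threshold(cv_labels: Sequence[int], cv_scores: Sequence[float]) -> float:
    """Pick the cutoff on cross-validation predictions that maximizes
    Youden's J (an assumed criterion; the source does not name one).
    Ties keep the smallest such cutoff."""
    n_pos = sum(1 for y in cv_labels if y == 1)
    n_neg = len(cv_labels) - n_pos
    best_t, best_j = 0.5, -1.0
    for t in sorted(set(cv_scores)):
        tp = sum(1 for y, s in zip(cv_labels, cv_scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(cv_labels, cv_scores) if y == 0 and s < t)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t


def accuracy_at(labels: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """Fraction of cases classified correctly when score >= threshold
    is called positive."""
    correct = sum((s >= threshold) == (y == 1) for y, s in zip(labels, scores))
    return correct / len(labels)


# Toy data (made up for illustration, not taken from the study).
cv_labels = [0, 0, 0, 1, 0, 1, 1, 1]
cv_scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.7, 0.9]
test_labels = [0, 1, 0, 1]
test_scores = [0.2, 0.5, 0.45, 0.3]

threshold = pick_threshold(cv_labels, cv_scores)       # chosen on CV predictions only
test_auc = auc(test_labels, test_scores)               # threshold-free ranking quality
test_acc = accuracy_at(test_labels, test_scores, threshold)
```

The threshold is fixed using cross-validation predictions before the test set is scored, which mirrors footnote c. Because AUC does not depend on any threshold while accuracy does, the two columns can rank the risk factor groupings differently.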