Table 2 Summary of methods performance

From: Comparison of subset selection methods in linear regression in the context of health-related quality of life and substance abuse in Russia

Each method is summarized along three dimensions: stability of model selection, incorporation of model uncertainty, and computational efficiency (running time)^a.

I. STEPWISE REGRESSION METHODS
None of the stepwise methods incorporates model uncertainty into the estimation of regression coefficients and standard errors.

Backward elimination (AIC)
  Stability of model selection: Moderate
  Running time: model selection, 5.4 s; estimation of SE with bootstrap^b, 30.9 s
Backward elimination (BIC)
  Stability of model selection: Very poor
  Running time: model selection, 5.6 s; estimation of SE with bootstrap^b, 15.0 s
Backward elimination (LRT)
  Stability of model selection: Moderate
  Running time: model selection, 5.1 s; estimation of SE with bootstrap^b, 19.2 s
Forward selection (AIC)
  Stability of model selection: Moderate
  Running time: model selection, 2.8 s; estimation of SE with bootstrap^b, 28.5 s
Forward selection (BIC)
  Stability of model selection: Very poor
  Running time: model selection, 1.9 s; estimation of SE with bootstrap^b, 13.8 s
Forward selection (LRT)
  Stability of model selection: Moderate
  Running time: model selection, 3.1 s; estimation of SE with bootstrap^b, 19.8 s
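As an illustration of how the block-I procedures operate, here is a minimal numpy-only sketch of backward elimination driven by AIC (AIC = n·log(RSS/n) + 2k for Gaussian least squares, up to an additive constant). The synthetic data, variable names, and the greedy one-drop-per-pass loop are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def aic_ols(Xd, y):
    """AIC (up to an additive constant) of a Gaussian least-squares fit."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    rss = np.sum((y - Xd @ beta) ** 2)
    k = Xd.shape[1] + 1  # regression coefficients + error variance
    return n * np.log(rss / n) + 2 * k

def backward_elimination_aic(X, y, names):
    """Repeatedly drop the single predictor whose removal lowers AIC most."""
    n = len(y)
    ones = np.ones((n, 1))  # intercept is always kept
    keep = list(range(X.shape[1]))
    best = aic_ols(np.hstack([ones, X[:, keep]]), y)
    while keep:
        trials = [(aic_ols(np.hstack([ones, X[:, [c for c in keep if c != j]]]), y),
                   [c for c in keep if c != j]) for j in keep]
        aic_drop, reduced = min(trials, key=lambda t: t[0])
        if aic_drop < best:
            best, keep = aic_drop, reduced
        else:
            break  # no single removal improves AIC any further
    return [names[c] for c in keep], best

# Synthetic illustration: y depends on x0 and x1 only; x2, x3 are noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=200)
selected, aic = backward_elimination_aic(X, y, ["x0", "x1", "x2", "x3"])
print(selected)
```

Forward selection is the mirror image: start from the intercept-only model and add the predictor that lowers the criterion most; swapping AIC for BIC or an LRT threshold changes only the scoring function.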
II. PENALIZED REGRESSION METHODS
Model uncertainty is partially incorporated into the estimation and inference procedure via the λ tuning step and the estimation of standard errors using bootstrap.

Lasso
  Stability of model selection: Poor (λmin); Good (λ1se)
  Running time: lasso algorithm, 0.02 s; 10-fold CV, 0.5 s; estimation of SE with bootstrap^b, 394.0 s
Adaptive lasso
  Stability of model selection: Good (λmin); Good (λ1se)
  Running time: estimation of weights (ridge regression), 1.6 s; adaptive lasso algorithm, 0.02 s; 10-fold CV, 0.5 s; estimation of SE with bootstrap^b, 411.2 s
Adaptive elastic net
  Stability of model selection: Good (λmin); Good (λ1se)
  Running time: estimation of weights (ridge regression), 1.6 s; estimation of λ for the L2 penalty (elastic net), 1.2 s; adaptive elastic net algorithm, 0.2 s; 10-fold CV, 1.4 s; estimation of SE with bootstrap^b, 3,265.3 s
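The λmin / λ1se distinction in this block can be reproduced with off-the-shelf tooling. The sketch below uses scikit-learn's LassoCV (an assumption — the paper does not state which software was used) and computes λ1se by hand from the cross-validation error path, since scikit-learn reports only λmin; the data are synthetic stand-ins for the survey covariates.

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Synthetic data standing in for the study covariates (illustrative only).
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 10))
y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=1.0, size=300)

cv = LassoCV(cv=10).fit(X, y)
lam_min = cv.alpha_  # λmin: minimizes the mean 10-fold CV error

# λ1se: the largest λ whose mean CV error is within one standard error
# of the minimum (a sparser, more stable model, as the table suggests).
mean_mse = cv.mse_path_.mean(axis=1)                       # (n_alphas,)
se_mse = cv.mse_path_.std(axis=1) / np.sqrt(cv.mse_path_.shape[1])
i_min = np.argmin(mean_mse)
lam_1se = cv.alphas_[mean_mse <= mean_mse[i_min] + se_mse[i_min]].max()

print(lam_min, lam_1se)
```

Since λ1se ≥ λmin by construction, the λ1se fit shrinks harder and selects fewer covariates, which is consistent with the better stability the table reports for λ1se.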
III. BAYESIAN MODEL AVERAGING
Bayesian model averaging (using MCMC to search the model space)
  Stability of model selection: PIPs of the regression covariates inform model selection; bootstrap gave selection frequencies that were almost identical to the PIPs (data not shown).
  Model uncertainty: Properly incorporated into the estimation of regression coefficients and their standard deviations, provided that the MCMC chain converged and the algorithm managed to search the entire model space.
  Running time: 250.8 s (1,000,000 iterations, chain converged)
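MCMC is needed above because the model space is too large to enumerate; with only a handful of candidate covariates the same quantities (posterior model weights and PIPs) can be computed exactly. The sketch below is an assumption-laden illustration: it approximates each model's posterior weight by exp(−BIC/2) under a uniform model prior, which is not necessarily the prior/likelihood setup used in the paper.

```python
import numpy as np
from itertools import combinations

def bic_ols(Xd, y):
    """BIC (up to an additive constant) of a Gaussian least-squares fit."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    rss = np.sum((y - Xd @ beta) ** 2)
    return n * np.log(rss / n) + (Xd.shape[1] + 1) * np.log(n)

rng = np.random.default_rng(2)
n, p = 200, 6
X = rng.normal(size=(n, p))
y = 1.5 * X[:, 0] - X[:, 2] + rng.normal(scale=0.7, size=n)
ones = np.ones((n, 1))

# Enumerate all 2^p models; weight each by exp(-BIC/2) (uniform model prior).
models, bics = [], []
for r in range(p + 1):
    for subset in combinations(range(p), r):
        models.append(subset)
        bics.append(bic_ols(np.hstack([ones, X[:, list(subset)]]), y))
bics = np.array(bics)
w = np.exp(-(bics - bics.min()) / 2)
w /= w.sum()

# Posterior inclusion probability of covariate j = total weight of models
# containing j; MCMC (as in the paper) estimates the same quantity by sampling.
pip = np.array([sum(wm for wm, m in zip(w, models) if j in m) for j in range(p)])
print(np.round(pip, 3))
```

Model-averaged coefficients and their standard deviations follow the same pattern: weight each model's estimate by `w` instead of conditioning on a single selected model.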
Abbreviations: AIC, Akaike Information Criterion; BIC, Bayesian Information Criterion; CV, cross-validation; LRT, Likelihood Ratio Test; MCMC, Markov Chain Monte Carlo; PIP, posterior inclusion probability; SE, standard error
^a The analysis was run on a 1.7 GHz Intel(R) Core(TM) i5 processor with 4.00 GB of DDR3 memory.
^b In all cases, standard errors were estimated with a bootstrap of 2,000 iterations.
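Footnote b's bootstrap SE procedure can be written down in miniature. In the sketch below, a plain OLS fit and 500 resamples stand in for the selected models and the 2,000 iterations used in the paper; everything else about the data is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 150
X = rng.normal(size=(n, 3))
y = 0.8 * X[:, 0] + rng.normal(scale=1.0, size=n)
Xd = np.hstack([np.ones((n, 1)), X])  # design matrix with intercept

def fit(Xd, y):
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return beta

# Nonparametric bootstrap: resample rows with replacement, refit,
# and take the standard deviation of each coefficient across refits.
B = 500  # the paper used 2,000 iterations; fewer here keeps the sketch fast
boot = np.empty((B, Xd.shape[1]))
for b in range(B):
    idx = rng.integers(0, n, size=n)
    boot[b] = fit(Xd[idx], y[idx])
se = boot.std(axis=0, ddof=1)
print(np.round(se, 3))
```

For the stepwise and penalized methods in the table, the model-selection step is rerun inside each bootstrap replicate, which is why the bootstrap SE timings dominate the running times above.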