Table 2 Summary of methods performance

From: Comparison of subset selection methods in linear regression in the context of health-related quality of life and substance abuse in Russia

| Method | Stability of model selection | Incorporating model uncertainty | Computational efficiency (running time)ᵃ |
| --- | --- | --- | --- |
| I. STEPWISE REGRESSION METHODS | | | |
| Backward elimination (AIC) | Moderate | Do not incorporate model uncertainty in the estimation of regression coefficients and standard errors. | Model selection: 5.4 s; estimation of SE with bootstrapᵇ: 30.9 s |
| Backward elimination (BIC) | Very poor | (as above) | Model selection: 5.6 s; estimation of SE with bootstrapᵇ: 15.0 s |
| Backward elimination (LRT) | Moderate | (as above) | Model selection: 5.1 s; estimation of SE with bootstrapᵇ: 19.2 s |
| Forward selection (AIC) | Moderate | (as above) | Model selection: 2.8 s; estimation of SE with bootstrapᵇ: 28.5 s |
| Forward selection (BIC) | Very poor | (as above) | Model selection: 1.9 s; estimation of SE with bootstrapᵇ: 13.8 s |
| Forward selection (LRT) | Moderate | (as above) | Model selection: 3.1 s; estimation of SE with bootstrapᵇ: 19.8 s |
| II. PENALIZED REGRESSION METHODS | | | |
| Lasso | Poor (λmin); Good (λ1se) | Model uncertainty is partially incorporated into the estimation and inference procedure via λ tuning step, and estimation of standard errors using bootstrap. | Lasso algorithm: 0.02 s; 10-fold CV: 0.5 s; estimation of SE with bootstrapᵇ: 394.0 s |
| Adaptive lasso | Good (λmin); Good (λ1se) | (as above) | Estimation of weights (ridge regression): 1.6 s; adaptive lasso algorithm: 0.02 s; 10-fold CV: 0.5 s; estimation of SE with bootstrapᵇ: 411.2 s |
| Adaptive elastic net | Good (λmin); Good (λ1se) | (as above) | Estimation of weights (ridge regression): 1.6 s; estimation of λ for L2 penalty (elastic net): 1.2 s; adaptive elastic net algorithm: 0.2 s; 10-fold CV: 1.4 s; estimation of SE with bootstrapᵇ: 3,265.3 s |
| III. BAYESIAN MODEL AVERAGING | | | |
| Bayesian model averaging (using MCMC to search model space) | PIPs of regression covariates inform model selection. Bootstrap gave selection frequencies that were almost identical to PIPs (data not shown). | Model uncertainty is properly incorporated into the estimation of regression coefficients and their standard deviations (provided that MCMC chain converged and the algorithms managed to search the entire model space). | 250.8 s (1,000,000 iterations, chain converged) |

  1. AIC: Akaike Information Criterion; BIC: Bayesian Information Criterion; CV: cross-validation; LRT: Likelihood Ratio Test; MCMC: Markov Chain Monte Carlo; PIP: posterior inclusion probability; SE: standard error
  2. ᵃ The analysis was run on a 1.7 GHz Intel(R) Core(TM) i5 processor with 4.00 GB of DDR3 memory.
  3. ᵇ In all cases where standard errors were estimated by bootstrap, the number of iterations was 2,000.
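
For readers unfamiliar with the bootstrap step timed in the table, the following is a minimal Python sketch of case-resampling bootstrap standard errors with 2,000 iterations. It is not the authors' code: it assumes a plain ordinary least squares fit on synthetic data and does not re-run any model selection or λ tuning inside each resample, which is part of why the penalized methods in the table take much longer. The function name `bootstrap_se`, the seeds, and the simulated data are all hypothetical.

```python
import numpy as np


def bootstrap_se(X, y, n_boot=2000, seed=0):
    """Case-resampling bootstrap SEs for OLS coefficients (intercept first)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])  # prepend an intercept column
    coefs = np.empty((n_boot, design.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # draw n rows with replacement
        beta, *_ = np.linalg.lstsq(design[idx], y[idx], rcond=None)
        coefs[b] = beta
    # The SE estimate is the standard deviation across bootstrap replicates.
    return coefs.std(axis=0, ddof=1)


# Synthetic example data (an assumption, unrelated to the study's data set).
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(200, 3))
y = 1.0 + X @ np.array([0.5, 0.0, -0.3]) + data_rng.normal(size=200)

se = bootstrap_se(X, y)
print(se)  # one standard error per coefficient, intercept first
```

To account for selection uncertainty as well, the selection procedure itself would be repeated inside the loop for every resample, at a correspondingly higher running time.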