Comparison of subset selection methods in linear regression in the context of health-related quality of life and substance abuse in Russia

Morozova, Olga; Levina, Olga; Uusküla, Anneli; Heimer, Robert

doi:10.1186/s12874-015-0066-2

Table 2 Summary of methods performance

Method	Stability of model selection	Incorporating model uncertainty	Computational efficiency (running time)^a
I. STEPWISE REGRESSION METHODS
Backward elimination (AIC)	Moderate	Do not incorporate model uncertainty in the estimation of regression coefficients and standard errors.	Model selection: 5.4 s
Backward elimination (AIC)	Moderate		Estimation of SE with bootstrap^b: 30.9 s
Backward elimination (BIC)	Very poor		Model selection: 5.6 s
Backward elimination (BIC)	Very poor		Estimation of SE with bootstrap^b: 15.0 s
Backward elimination (LRT)	Moderate		Model selection: 5.1 s
Backward elimination (LRT)	Moderate		Estimation of SE with bootstrap^b: 19.2 s
Forward selection (AIC)	Moderate		Model selection: 2.8 s
Forward selection (AIC)	Moderate		Estimation of SE with bootstrap^b: 28.5 s
Forward selection (BIC)	Very poor		Model selection: 1.9 s
Forward selection (BIC)	Very poor		Estimation of SE with bootstrap^b: 13.8 s
Forward selection (LRT)	Moderate		Model selection: 3.1 s
Forward selection (LRT)	Moderate		Estimation of SE with bootstrap^b: 19.8 s
II. PENALIZED REGRESSION METHODS
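Lasso	Poor (λ_min); Good (λ_1se)	Model uncertainty is partially incorporated into the estimation and inference procedure via the λ tuning step and the bootstrap estimation of standard errors.	Lasso algorithm: 0.02 s
Lasso	Poor (λ_min); Good (λ_1se)		10-fold CV: 0.5 s
Lasso	Poor (λ_min); Good (λ_1se)		Estimation of SE with bootstrap^b: 394.0 s
Adaptive lasso	Good (λ_min); Good (λ_1se)		Estimation of weights (ridge regression): 1.6 s
Adaptive lasso	Good (λ_min); Good (λ_1se)		Adaptive lasso algorithm: 0.02 s
Adaptive lasso	Good (λ_min); Good (λ_1se)		10-fold CV: 0.5 s
Adaptive lasso	Good (λ_min); Good (λ_1se)		Estimation of SE with bootstrap^b: 411.2 s
Adaptive elastic net	Good (λ_min); Good (λ_1se)		Estimation of weights (ridge regression): 1.6 s
Adaptive elastic net	Good (λ_min); Good (λ_1se)		Estimation of λ for L2 penalty (elastic net): 1.2 s
Adaptive elastic net	Good (λ_min); Good (λ_1se)		Adaptive elastic net algorithm: 0.2 s
Adaptive elastic net	Good (λ_min); Good (λ_1se)		10-fold CV: 1.4 s
Adaptive elastic net	Good (λ_min); Good (λ_1se)		Estimation of SE with bootstrap^b: 3,265.3 s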
III. BAYESIAN MODEL AVERAGING
Bayesian model averaging (using MCMC to search model space)	PIPs of regression covariates inform model selection. Bootstrap gave selection frequencies that were almost identical to PIPs (data not shown).	Model uncertainty is properly incorporated into the estimation of regression coefficients and their standard deviations (provided that the MCMC chain converged and the algorithm searched the entire model space).	250.8 s (1,000,000 iterations, chain converged)

AIC Akaike Information Criterion, BIC Bayesian Information Criterion, CV cross-validation, LRT Likelihood Ratio Test, MCMC Markov Chain Monte Carlo, PIP posterior inclusion probability, SE standard error
^aThe analysis was run on a 1.7 GHz Intel(R) Core(TM) i5 processor with 4.00 GB of DDR3 memory
^bAll bootstrap estimates of standard errors used 2,000 iterations
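
The "Stability of model selection" column rests on how often a method picks the same covariates when the data are resampled. The sketch below illustrates that idea with bootstrap selection frequencies for a lasso fit. It is not the authors' code: the data are simulated, the penalty `lam` is fixed at an arbitrary value rather than tuned by 10-fold CV (λ_min or λ_1se), the lasso is a small pure-Python coordinate descent written for this example, and all function names are hypothetical. It uses 100 resamples instead of 2,000 to keep the run short.

```python
import random

def soft_threshold(z, t):
    """Soft-thresholding operator used by the lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def standardize(col):
    """Center a column and scale it to unit mean square."""
    n = len(col)
    mean = sum(col) / n
    scale = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
    return [(v - mean) / scale for v in col]

def lasso(columns, y, lam, max_sweeps=50, tol=1e-6):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1.

    Assumes every column is standardized and y is centered.
    """
    n = len(y)
    beta = [0.0] * len(columns)
    resid = list(y)
    for _ in range(max_sweeps):
        biggest_change = 0.0
        for j, xj in enumerate(columns):
            # Correlation of x_j with the partial residual (x_j'x_j / n = 1).
            rho = sum(a * r for a, r in zip(xj, resid)) / n + beta[j]
            new = soft_threshold(rho, lam)
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - a * delta for a, r in zip(xj, resid)]
                beta[j] = new
                biggest_change = max(biggest_change, abs(delta))
        if biggest_change < tol:
            break
    return beta

def selection_frequencies(columns, y, lam, n_boot, rng):
    """Share of bootstrap resamples in which each covariate is selected."""
    n = len(y)
    counts = [0] * len(columns)
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        cols_b = [standardize([col[i] for i in idx]) for col in columns]
        y_b = [y[i] for i in idx]
        y_mean = sum(y_b) / n
        y_b = [v - y_mean for v in y_b]
        beta = lasso(cols_b, y_b, lam)
        for j, b in enumerate(beta):
            if b != 0.0:
                counts[j] += 1
    return [c / n_boot for c in counts]

# Simulated data (an assumption for illustration): only covariate 0 affects y.
rng = random.Random(0)
n, p = 100, 5
columns = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
y = [2.0 * columns[0][i] + rng.gauss(0, 1) for i in range(n)]

freq = selection_frequencies(columns, y, lam=0.3, n_boot=100, rng=rng)
print([round(f, 2) for f in freq])
```

Frequencies near 0 or 1 for every covariate suggest a stable selection; values near 0.5 suggest that the chosen model changes from one resample to the next.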

ISSN: 1471-2288

BMC Medical Research Methodology