Table 1 Candidate methods

From: Variable selection in social-environmental data: sparse regression and tree ensemble machine learning approaches

| Abbreviation | Description | Selection rule | R packages |
| --- | --- | --- | --- |
| UNIV-BFN | Univariable models with Bonferroni-adjusted p-values | P < 5 × 10⁻⁵ | base R [23] |
| LASSO-MIN | Lasso with λ chosen at the minimum cross-validated prediction error | β ≠ 0 | glmnet [24] |
| LASSO-1SE | Lasso with λ chosen at 1 SE above the minimum error | β ≠ 0 | glmnet |
| ELNET-MIN | Elastic net, grid search for α (0.05–0.95 by 0.05), λ at the minimum error | β ≠ 0 | glmnet |
| ELNET-1SE | Elastic net, grid search for α (0.05–0.95 by 0.05), λ at 1 SE above the minimum | β ≠ 0 | glmnet |
| HCLST-CORR-SGL | Hierarchical clustering, groups of variables with correlation > 0.8, sparse group lasso | β ≠ 0 | SGL [25] |
| HCLST-BOOT-SGL | Hierarchical clustering, groups from bootstrap, sparse group lasso | β ≠ 0 | SGL, pvclust [16] |
| RF | Random Forests algorithm with bootstrap-based confidence intervals for the variable importance scores | 99.995% CI > 0 | randomForestSRC [26] |
| BAGGING | Similar to Random Forests, but with all variables considered candidates for splitting at each node | 99.995% CI > 0 | randomForestSRC |
| BART-LOCAL | Bayesian Additive Regression Trees, local criterion for the inclusion proportion (IP) | IP > 0.95 quantile of local distribution | bartMachine [27] |
| BART-GLOBALSE | Bayesian Additive Regression Trees, global SE criterion for the IP | IP > threshold from local distribution with global multiplier | bartMachine |
| BART-GLOBALMAX | Bayesian Additive Regression Trees, global max criterion for the IP | IP > 0.95 quantile of global max distribution | bartMachine |
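
The selection rules in Table 1 can be illustrated with short R sketches. For UNIV-BFN, a minimal sketch on simulated data (the sample size, variable names, and 0.05 family-wise level are illustrative assumptions; the table's P < 5 × 10⁻⁵ equals 0.05/1000, i.e. a Bonferroni correction for roughly 1,000 candidate predictors):

```r
## Minimal UNIV-BFN sketch on simulated toy data (sizes are illustrative).
set.seed(1)
n <- 200; p <- 50
X <- data.frame(matrix(rnorm(n * p), n, p))   # columns X1..X50
y <- 2 * X$X1 - 1.5 * X$X2 + rnorm(n)

## One univariable model per predictor; keep the slope's p-value.
pvals <- sapply(X, function(x) summary(lm(y ~ x))$coefficients[2, 4])

## Bonferroni: family-wise 0.05 divided by the number of tests
## (with 1,000 candidates this gives the table's 5e-05 threshold).
selected <- names(pvals)[pvals < 0.05 / length(pvals)]
selected
```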
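
LASSO-MIN and LASSO-1SE differ only in which λ is read off the cross-validation curve; glmnet exposes both choices directly as `lambda.min` and `lambda.1se`. A sketch on the same kind of illustrative simulation:

```r
library(glmnet)

set.seed(1)
n <- 200; p <- 50
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", 1:p)))
y <- 2 * X[, 1] - 1.5 * X[, 2] + rnorm(n)

cvfit <- cv.glmnet(X, y, alpha = 1)           # alpha = 1: lasso penalty

## Selection rule: beta != 0 at the chosen lambda (intercept excluded).
keep <- function(fit, s) {
  b <- as.matrix(coef(fit, s = s))
  setdiff(rownames(b)[b != 0], "(Intercept)")
}
keep(cvfit, "lambda.min")   # LASSO-MIN
keep(cvfit, "lambda.1se")   # LASSO-1SE
```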
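
ELNET-MIN and ELNET-1SE add an outer grid search over the mixing parameter α. glmnet does not tune α itself, so one common approach, assumed here, is a loop over `cv.glmnet` fits with shared folds so the α values are comparable:

```r
library(glmnet)

set.seed(1)
n <- 200; p <- 50
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", 1:p)))
y <- 2 * X[, 1] - 1.5 * X[, 2] + rnorm(n)

foldid <- sample(rep(1:10, length.out = n))   # same folds for every alpha
alphas <- seq(0.05, 0.95, by = 0.05)          # grid from the table

fits <- lapply(alphas, function(a) cv.glmnet(X, y, alpha = a, foldid = foldid))
best <- fits[[which.min(sapply(fits, function(f) min(f$cvm)))]]

## As for the lasso: nonzero coefficients at lambda.min (ELNET-MIN)
## or lambda.1se (ELNET-1SE).
b <- as.matrix(coef(best, s = "lambda.1se"))
setdiff(rownames(b)[b != 0], "(Intercept)")
```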
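
For HCLST-CORR-SGL, a sketch of the two stages: hierarchical clustering on a correlation distance (cutting the tree at height 0.2 so variables correlated above 0.8 share a group is an assumption consistent with the table), then a cross-validated sparse group lasso via the SGL package. HCLST-BOOT-SGL would instead derive the groups from pvclust's bootstrapped clustering, as noted in the final comment:

```r
library(SGL)

set.seed(1)
n <- 200; p <- 30
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", 1:p)))
y <- 2 * X[, 1] - 1.5 * X[, 2] + rnorm(n)

## Stage 1: group variables so that |corr| > 0.8 implies a shared group.
groups <- cutree(hclust(as.dist(1 - abs(cor(X)))), h = 0.2)

## Stage 2: cross-validated sparse group lasso; select beta != 0 at the
## lambda with the smallest cross-validated error.
cvfit <- cvSGL(list(x = X, y = y), index = groups, type = "linear")
b <- cvfit$fit$beta[, which.min(cvfit$lldiff)]
colnames(X)[b != 0]

## HCLST-BOOT-SGL: groups from bootstrapped clustering instead, e.g.
## pvclust::pvclust(X, method.dist = "correlation") followed by pvpick().
```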
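
RF and BAGGING differ only in `mtry`: bagging makes every variable a split candidate at every node (`mtry = p`). The bootstrap confidence intervals for permutation importance are sketched below with a plain manual percentile bootstrap; the replicate count and refit-per-resample scheme are assumptions, not necessarily the paper's exact procedure (randomForestSRC also ships a built-in `subsample()` facility for importance intervals):

```r
library(randomForestSRC)

set.seed(1)
n <- 200; p <- 20
dat <- data.frame(matrix(rnorm(n * p), n, p))  # columns X1..X20
dat$y <- 2 * dat$X1 - 1.5 * dat$X2 + rnorm(n)

## RF uses the default mtry (~ p/3 for regression); BAGGING sets mtry = p.
B <- 200                                       # illustrative; a 99.995% CI
vimp <- replicate(B, {                         # needs far more replicates
  idx <- sample(n, replace = TRUE)
  rfsrc(y ~ ., data = dat[idx, ], mtry = p,    # mtry = p: bagging
        importance = "permute")$importance
})

## Selection rule: lower bound of the 99.995% percentile CI above zero.
lower <- apply(vimp, 1, quantile, probs = (1 - 0.99995) / 2)
names(lower)[lower > 0]
```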
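
The three BART rows map onto a single bartMachine call: `var_selection_by_permute()` builds a permutation null distribution for the inclusion proportions and reports the variables passing the local, global SE, and global max criteria at once. The data and the number of permutations here are illustrative; the default `alpha = 0.05` corresponds to the table's 0.95-quantile cutoffs:

```r
library(bartMachine)

set.seed(1)
n <- 200; p <- 20
X <- data.frame(matrix(rnorm(n * p), n, p))    # columns X1..X20
y <- 2 * X$X1 - 1.5 * X$X2 + rnorm(n)

bm <- bartMachine(X, y)

## Permutation null distribution for the inclusion proportions.
vs <- var_selection_by_permute(bm, num_permute_samples = 100, plot = FALSE)
vs$important_vars_local_names       # BART-LOCAL
vs$important_vars_global_se_names   # BART-GLOBALSE
vs$important_vars_global_max_names  # BART-GLOBALMAX
```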