Binary classification of dyslipidemia from the waist-to-hip ratio and body mass index: a comparison of linear, logistic, and CART models

Background We sought to improve upon previously published statistical modeling strategies for binary classification of dyslipidemia for general population screening purposes, based on the waist-to-hip circumference ratio and body mass index anthropometric measurements. Methods Study subjects were participants in WHO-MONICA population-based surveys conducted in two Swiss regions. Outcome variables were based on the total serum cholesterol to high density lipoprotein cholesterol ratio. The other potential predictor variables were gender, age, current cigarette smoking, and hypertension. The models investigated were: (i) linear regression; (ii) logistic classification; (iii) regression trees; (iv) classification trees (iii and iv are collectively known as "CART"). Binary classification performance of the region-specific models was externally validated by classifying the subjects from the other region. Results Waist-to-hip circumference ratio and body mass index remained modest predictors of dyslipidemia. Correct classification rates for all models were 60–80%, with marked gender differences. Gender-specific models provided only small gains in classification. The external validations provided assurance about the stability of the models. Conclusions There were no striking differences either between the algebraic (i, ii) and non-algebraic (iii, iv) or between the regression (i, iii) and classification (ii, iv) modeling approaches. The advantages of the CART models over the simple additive linear and logistic models were smaller than anticipated in this particular application with a relatively small set of predictor variables. CART models may be more useful when considering main effects and interactions between larger sets of predictor variables.


Background
Central adiposity is a predictor of cardiovascular disease (CVD) independently of other major risk factors, including body mass index (BMI) [1,2]. Part of the relationship between central adiposity and CVD is mediated by a modification of the metabolism of insulin and lipids [3]. Dyslipidemic individuals are more frequently "centrally obese" (e.g., they have a high waist-to-hip circumference ratio (WHR)) [4][5][6]. These observations have been made in a variety of populations from developed [7][8][9] and less developed countries [9]. Apart from its interest for establishing a physiopathological causal link, this predictive association suggests the possibility of employing one or more anthropometric measurements of central adiposity as a first step in population screening for dyslipidemia [8,9]. Using inexpensive and readily obtainable anthropometric measurements instead of more costly and time-consuming wet- or even dry-chemistry laboratory cholesterol measurements is relevant even in developed countries, where an emerging epidemic of CVD is occurring amidst rising health care costs.
One objective of the present study was to attempt to improve upon previous statistical strategies for detecting dyslipidemia in the general population, with specific focus on the predictive power of the anthropometric measurements WHR and BMI. A second objective was to compare the performance of four statistical modeling approaches that can be employed for binary classification: linear regression [10], logistic classification [11], and regression trees and classification trees (collectively, classification and regression trees or CART) [12,13]. By "can be employed" we mean: (a) with a modest amount of effort using commercially available software (we used SAS [14] and S-Plus [15]); and (b) that classification-type methods for a binary outcome can be applied to the results of regression-type methods for a continuous outcome. We also wondered how well competing methods perform in practice, as opposed to how well they are supposed to perform in theory.

Study populations and samples
Subjects participated in the World Health Organization (WHO) MONICA (MONItoring trends and determinants in CArdiovascular disease) project described in detail elsewhere [16]. Participating regions included Vaud-Fribourg and Ticino in Switzerland. Vaud and Fribourg are adjacent French-speaking cantons in the west/southwest, while Ticino is an Italian-speaking canton in the southeast. These regions had similar distributions of and correlations between the predictor and outcome variables employed in the statistical models (see Results). Accordingly, the classification performance of region-specific models was estimated by external validation on data from the other region, as well as by (biased) resubstitution.
Data from the third independent MONICA surveys, conducted in 1992-93, were used. In Vaud-Fribourg, 3,299 individuals aged 25-74 years were invited to participate, and 1,742 (53%) did so. In Ticino, 2,000 individuals aged 35-64 years were invited and 1,510 (76%) participated. Analyses in the present study were restricted to the age range 35-64 years common to both regions (Vaud-Fribourg n = 1,182, Ticino n = 1,510). In addition to WHR and BMI, the potential predictor variables examined were Gender, Age, current cigarette Smoking, and high blood pressure (HBP: diastolic BP ≥ 90 mm Hg or receiving treatment for hypertension). Linear and logistic regression (but not CART) models require complete data on the study subjects unless missing data imputation techniques are employed. For convenience, we excluded subjects with missing data on any of the predictor variables. This reduced the final sample sizes by 5% in Vaud-Fribourg (n = 1,120) and by 6% in Ticino (n = 1,429).

Statistical models
Although the total serum cholesterol to high density lipoprotein cholesterol (TC/HDL-C) ratio is a continuous variable, we assumed that assessing the dyslipidemia classification performance of a predictive model would ultimately require comparing predicted binary values of dyslipidemia status. We applied five modeling approaches (Strategies 0-4), which reflected: no model (0); algebraically specified (1, 2) vs. unspecified (3, 4) models; and regression- (1, 3) vs. classification-based (2, 4) models. Strategies 1-4 were expected to outperform the minimal benchmark Strategy 0.

Strategy 0: modal regional prevalence of dyslipidemia (no model)
Individuals in a given region were classified as dyslipidemic or not dyslipidemic, depending on the observed modal (most frequent) dyslipidemia category either in the whole region or stratified by gender. Strategy 0 represented a "no model" approach in the sense that the additional predictor variables were ignored.
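As a minimal sketch of the Strategy 0 benchmark, assuming dyslipidemia status is coded 1/0 and gender as "F"/"M" labels (both hypothetical encodings), the modal category can be computed for the whole region or within gender strata. Ties are broken here by first occurrence, an arbitrary choice that the text does not specify.

```python
from collections import Counter


def modal_category(labels):
    """Most frequent label; ties go to the label encountered first."""
    return Counter(labels).most_common(1)[0][0]


def strategy0(dyslipidemic, gender=None):
    """Classify every subject with the modal dyslipidemia category of the
    whole region, or of their own gender stratum if gender is given."""
    if gender is None:
        mode = modal_category(dyslipidemic)
        return [mode] * len(dyslipidemic)
    modes = {
        g: modal_category([y for y, gg in zip(dyslipidemic, gender) if gg == g])
        for g in set(gender)
    }
    return [modes[g] for g in gender]


# Hypothetical toy data (not study data): 1 = dyslipidemic, 0 = not.
status = [0, 0, 1, 1, 1, 0]
sex = ["F", "F", "F", "M", "M", "M"]
print(strategy0(status, sex))  # -> [0, 0, 0, 1, 1, 1]
```

With gender stratification, women here take the modal category 0 and men the modal category 1, mirroring the gender-specific "no model" classification.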

Strategy 1: linear regression
Additive linear models, $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + e$, where Y = TC/HDL-C ratio, $\{X_1, X_2, \ldots, X_k\}$ ($k \le 6$) = a subset of the predictor variables {WHR, BMI, Gender, Age, Smoking, HBP}, and e = Gaussian error with constant variance, were fitted. TC/HDL-C, WHR, BMI, and Age were analyzed as continuous variables, while Gender, Smoking, and HBP were analyzed as binary variables. An individual with estimated Y ≥ 5.0 was classified as dyslipidemic, or classified as not dyslipidemic otherwise.
Including all the predictor variables was termed the full model, while including only {WHR, BMI, Gender} was termed the reduced model. Both types of model were fitted separately by region. In addition, for women and men separately, {WHR, BMI, Age, Smoking, HBP} "full" and {WHR, BMI} "reduced" models also were fitted.
Neither formal predictor variable selection procedures nor models with product interactions between predictor variables were employed. We simply wished to magnify any differences and facilitate comparisons between the algebraic linear regression (Strategy 1) and the non-algebraic regression tree models (Strategy 3).
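The Strategy 1 rule (fit an additive linear model to the continuous TC/HDL-C ratio, then classify predicted values at 5.0) can be sketched with NumPy least squares. This is an illustrative reimplementation, not the SAS/S-Plus code used in the study; the function names and toy data are hypothetical.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns (beta_0, ..., beta_k)."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta


def classify_linear(beta, X, cutoff=5.0):
    """1 = dyslipidemic if the predicted TC/HDL-C ratio >= cutoff, else 0."""
    X1 = np.column_stack([np.ones(len(X)), X])
    y_hat = X1 @ beta
    return (y_hat >= cutoff).astype(int)


# Hypothetical noiseless toy data: y = 1 + 2*x1 + 3*x2.
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]], dtype=float)
y = 1 + 2 * X[:, 0] + 3 * X[:, 1]
beta = fit_linear(X, y)
print(np.round(beta, 6), classify_linear(beta, X))
```

Because the toy outcome is an exact linear function, the fit recovers the coefficients and the predicted ratios 1, 3, 4, 6, 8 are dichotomized at 5.0.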

Strategy 2: logistic classification
For the same predictor variables as in Strategy 1, but with binary Y = 1 if TC/HDL-C ≥ 5.0 and Y = 0 otherwise, additive logistic models, $Y = p + e$ with $\log[p/(1-p)] = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k$, where p = probability that Y = 1 for given values of the predictors and e = binomial error term, were fitted. This model assumes that the relationship between log[p/(1-p)] and the predictor variables is linear. An individual with estimated p ≥ 0.50 was classified as dyslipidemic, or classified as not dyslipidemic otherwise.
As in Strategy 1, neither predictor variable selection nor product interactions between predictor variables were employed, so as to magnify differences and facilitate comparisons between the algebraic logistic classification (Strategy 2) and the non-algebraic classification tree models (Strategy 4).
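For Strategy 2, a maximum-likelihood logistic fit can be sketched with Newton-Raphson iterations (equivalent to iteratively reweighted least squares), followed by the p ≥ 0.50 rule. This assumes the data are not perfectly separable (otherwise the estimates diverge); the function names and toy data are hypothetical and this is not the study's software.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25, tol=1e-8):
    """Newton-Raphson maximum likelihood for log[p/(1-p)] = X1 @ beta."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        w = p * (1.0 - p)                       # binomial variance weights
        grad = X1.T @ (y - p)                   # score vector
        hess = X1.T @ (X1 * w[:, None])         # observed information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def classify_logistic(beta, X, cutoff=0.5):
    """Return (1 = dyslipidemic if estimated p >= cutoff, estimated p)."""
    X1 = np.column_stack([np.ones(len(X)), X])
    p = 1.0 / (1.0 + np.exp(-X1 @ beta))
    return (p >= cutoff).astype(int), p


# Hypothetical, non-separable toy data with one predictor.
X = np.arange(8, dtype=float).reshape(-1, 1)
y = np.array([0, 0, 1, 0, 1, 0, 1, 1], dtype=float)
beta = fit_logistic(X, y)
labels, p = classify_logistic(beta, X)
print(beta, labels)
```

Writing the Newton step explicitly keeps the sketch free of any default penalty that some library logistic routines apply.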

Strategy 3: regression trees
For the same predictor variables and continuous Ys as in Strategy 1, regression tree models also were fitted. At each one-step look-ahead of the "full" tree-growing process, the Ys were examined within all possible binary splits of each predictor variable to select the best single split, i.e., the one creating the most homogeneous groups (maximal between-group mean square). This process was continued until "optimality" of the groups at the final nodes ("leaves") of the tree was achieved. In practice, the full tree tends to be overly complex and idiosyncratic with respect to the data employed to "grow" it. Thus, a common recommendation [e.g., [17]] is to "prune" the full tree backwards through further criteria based on both maximal within-leaf homogeneity of the Ys and minimal tree size, in order to produce a smaller pruned tree that is less subject to these drawbacks. It is also recommended [17] that the process be internally cross-validated, e.g., by randomly dividing the data into tenths, performing the pruning on the full tree grown with nine tenths, evaluating it on the remaining tenth of the data, and averaging the classification performance criteria (see below) over all ten 9:1 partitions of the data.
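The one-step split search described above can be sketched as an exhaustive scan over thresholds of each continuous predictor. Minimizing the pooled within-group sum of squares is equivalent to maximizing the between-group sum of squares, since the total is fixed. Midpoint thresholds and the function names are assumptions for illustration; the toy data are hypothetical.

```python
import numpy as np


def best_split(x, y):
    """Scan all binary splits x < t vs. x >= t of one continuous predictor.
    Returns (t, pooled within-group SSE) for the most homogeneous split."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best = (None, np.inf)
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # no threshold can separate tied values
        left, right = ys[:i], ys[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[1]:
            best = ((xs[i - 1] + xs[i]) / 2.0, sse)  # midpoint threshold
    return best


def best_split_all(X, y, names):
    """Best single split over every predictor column: (name, t, sse)."""
    results = [(name,) + best_split(X[:, j], y) for j, name in enumerate(names)]
    return min(results, key=lambda r: r[2])


# Hypothetical toy data: columns WHR, BMI; outcome TC/HDL-C ratio.
X = np.array([[0.70, 27], [0.75, 22], [0.80, 30],
              [0.85, 25], [0.90, 24], [0.95, 29]])
y = np.array([4.0, 4.2, 4.1, 6.0, 6.2, 5.9])
print(best_split_all(X, y, ["WHR", "BMI"]))
```

Applied recursively to each resulting group, this search grows the full tree that is then pruned.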
After following these recommendations, the estimated value of Y at each pruned tree leaf was taken to be the mean among those subjects comprising the leaf. All individuals in the leaf were classified as dyslipidemic if the estimated Y ≥ 5.0, or classified as not dyslipidemic otherwise.
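The ten-fold internal cross-validation can be sketched generically: a hypothetical fit/predict pair (for example, a tree pruned to one candidate size) is trained on nine tenths of the data and scored on the held-out tenth, and the ten misclassification rates are averaged. The sketch computes this estimate for a single candidate only; comparing it across candidate tree sizes is left out. The fixed WHR threshold and the data are hypothetical.

```python
import numpy as np


def cv_error(X, y, fit, predict, n_folds=10, seed=0):
    """Average held-out misclassification rate over n_folds random partitions."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    rates = []
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        model = fit(X[train], y[train])                             # fit on 9/10
        rates.append(np.mean(predict(model, X[test]) != y[test]))  # score on 1/10
    return float(np.mean(rates))


def rule_fit(X_train, y_train):
    """Hypothetical 'model': a fixed WHR threshold, ignoring the training data."""
    return 0.85


def rule_predict(threshold, X_test):
    return (X_test[:, 0] >= threshold).astype(int)


# Hypothetical data in which the rule is exactly right.
X = np.linspace(0.70, 1.00, 20).reshape(-1, 1)
y = (X[:, 0] >= 0.85).astype(int)
print(cv_error(X, y, rule_fit, rule_predict))  # -> 0.0
```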

Strategy 4: classification trees
For the same predictor variables and binary Ys as in Strategy 2, classification tree models also were fitted. The rationale, algorithms, and recommendations employed were similar to those for regression trees, with one important difference: an appealing recommendation [17] to employ both minimal misclassification rate (instead of maximal within-leaf homogeneity of the Ys) and minimal tree size as optimality criteria to prune the full tree backwards was followed, and internally cross-validated as described in Strategy 3.
The estimated value of Y at each pruned tree leaf was taken to be the modal category (dyslipidemic or not) among those subjects comprising the leaf. All individuals in a leaf were then classified in accord with the modal category.
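The trade-off between misclassification rate and tree size in pruning can be illustrated with the cost-complexity measure from the CART literature, R_α(T) = R(T) + α|T|, where R(T) is the misclassification rate and |T| the number of leaves. Whether the study software used exactly this form is an assumption; each leaf is labeled with its modal category, as described above. The node contents are hypothetical.

```python
from collections import Counter


def leaf_mode_and_errors(labels):
    """Modal category of a leaf and the number of subjects it misclassifies."""
    mode, count = Counter(labels).most_common(1)[0]
    return mode, len(labels) - count


def cost_complexity(leaves, alpha):
    """R_alpha(T) = misclassification rate + alpha * (number of leaves)."""
    n_total = sum(len(leaf) for leaf in leaves)
    errors = sum(leaf_mode_and_errors(leaf)[1] for leaf in leaves)
    return errors / n_total + alpha * len(leaves)


# Hypothetical node with 10 subjects (1 = dyslipidemic), before and after a split.
node = [0, 0, 0, 1, 1, 0, 1, 1, 1, 1]
split = [[0, 0, 0, 1], [1, 0, 1, 1, 1, 1]]
for alpha in (0.1, 0.3):
    keep = cost_complexity(split, alpha) < cost_complexity([node], alpha)
    print(alpha, "keep split" if keep else "prune to one leaf")
```

The split lowers the error from 4/10 to 2/10 but adds a leaf, so it survives only while α is below 0.2; larger penalties favor the smaller tree.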

Classification performance criteria
The classification performance of all models was compared in terms of five measures: (1) overall correct classification (total % agreement between observed and model-classified dyslipidemia status); (2) sensitivity (% of those with observed TC/HDL-C ≥ 5.0 who were classified as such); (3) specificity (% of those with observed TC/HDL-C < 5.0 who were classified as such); (4) positive predictive value (PPV, % of those classified as TC/HDL-C ≥ 5.0 who were observed as such); (5) negative predictive value (NPV, % of those classified as TC/HDL-C < 5.0 who were observed as such). For the Vaud-Fribourg and Ticino region-specific models, all five measures were estimated by resubstitution of the data from the same region as well as by external validation on the subjects from the other region.
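The five criteria follow directly from the 2×2 table of observed vs. model-classified status. In this sketch, undefined percentages (e.g., PPV when a Strategy 0 model classifies everyone as non-dyslipidemic) are returned as None; that convention and the toy data are assumptions.

```python
def _pct(num, den):
    """Percentage, or None when the denominator is zero."""
    return 100.0 * num / den if den else None


def performance(observed, predicted):
    """Five classification criteria for binary status (1 = TC/HDL-C >= 5.0)."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)
    return {
        "correct": _pct(tp + tn, len(pairs)),
        "sensitivity": _pct(tp, tp + fn),
        "specificity": _pct(tn, tn + fp),
        "PPV": _pct(tp, tp + fp),
        "NPV": _pct(tn, tn + fn),
    }


# Hypothetical example: 3 observed dyslipidemic, 5 not.
obs = [1, 1, 1, 0, 0, 0, 0, 0]
pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(performance(obs, pred))
```

For resubstitution, `predicted` comes from the model applied to its own region; for external validation, from the model applied to the other region.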

Descriptive comparisons of the two study samples
The predictor and outcome variables in the Vaud-Fribourg and Ticino MONICA study samples are summarized in Table 1. Switzerland has a relatively high prevalence of dyslipidemia (especially among men) compared to other countries [18]. The Ticino subjects were on average two years older, had a slightly higher TC/HDL-C ratio and thus a higher prevalence of dyslipidemia, and had more current cigarette smokers (predominantly among men) than the Vaud-Fribourg subjects. On the other hand, the distributions of WHR and BMI were similar in both regions.
The correlation matrices for both regions indicated that the bivariate relationship patterns also were similar (Table 2). WHR, BMI, and Gender had the highest correlations with the TC/HDL-C ratio (continuous or binary), with noticeable attenuation of the gender-specific correlations between WHR and TC/HDL-C. Further, the correlations between TC/HDL-C and Age, Smoking, and HBP were markedly stronger (albeit still low) among women than among men. The highest correlation (r > 0.7) among the predictor variables was between WHR and Gender (see also Table 1). The next highest was between WHR and BMI (r ≥ 0.49, overall and gender-specific). These results indicated that WHR, BMI, and Gender would probably be the most important of the predictor variables examined.
Accordingly, 3-D perspective plots of TC/HDL-C ratio vs. WHR and BMI were obtained to visualize what the anthropometric measures were expected to predict (Figures 1, 2). The irregularities in the figures are striking; i.e., the surfaces are not very "smooth". Hence, smooth predictive functions for binary classification such as the additive, algebraically specified linear regression or logistic classification models might not have been expected to perform so well. On the other hand, the non-additive, non-algebraically specified CART models might have been expected to perform relatively better.

Overall classification models
The classification performance of the overall (both genders) models, which included Gender as a predictor, is summarized in Table 3. Each pruned regression and classification tree model listed was the smallest whose classification performance was equivalent to that of any larger tree. There were only minor differences in the predictor variables retained and in the numbers of leaves between the CART models selected for the Vaud-Fribourg and Ticino samples (not shown). Likewise, the rankings of the predictor variables by their relative (nominal) statistical significance in the linear and logistic regression models differed slightly between the two samples and between model types (not shown). As expected, WHR and BMI were among the two or three most important predictor variables in all models. On the whole, the classification results for all models were consistent between the two regions. Thus, for brevity, only the resubstitution results for the Vaud-Fribourg models, with external validation on the Ticino subjects, are shown.

Table 3 notes: (a) All classified as non-dyslipidemic (modal category). (b) All classified as dyslipidemic (modal category). (c) Resubstitution estimate for Vaud-Fribourg data (n = 1,120: 572 women, 548 men). (d) Cross-validation estimate based on Ticino data (n = 1,429: 741 women, 688 men). (e) Used {WHR} only; same classifications as the 4-node, 5-node, 6-node, 7-node, and 9-node regression trees, which used {WHR, BMI}, and same variable and classifications as the 3-node regression tree (also same variable and classifications as the 2-node, full model regression tree). (f) Used {WHR, BMI} only (also same variables and classifications as the 7-node, full model classification tree).
For both genders combined, regardless of measure, classification performance was a modest 60-80% for all models, and no clear preference among the different models was discernible. Moreover, the reduced models performed nearly as well as the full models; again for brevity, only results for the reduced models are shown. Kappa measures of agreement were also calculated, indicating 70-80% classification concordance between models, with a slight tendency for the linear and logistic models on the one hand, and the CART models on the other, to agree more among themselves (75-80%) than with models of the other type (70%) (not shown otherwise). This tendency was not evident for the regression per se vs. classification per se models.
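Agreement between two models' binary classifications can be summarized both as raw percent agreement and as a chance-corrected kappa. The text does not state which kappa was used; this sketch assumes Cohen's kappa for two "raters", and the classification vectors are hypothetical.

```python
def agreement_and_kappa(a, b):
    """Percent agreement and Cohen's kappa for two binary classifications."""
    n = len(a)
    p_obs = sum(1 for x, y in zip(a, b) if x == y) / n
    pa, pb = sum(a) / n, sum(b) / n            # proportion classified as 1
    p_exp = pa * pb + (1 - pa) * (1 - pb)      # agreement expected by chance
    kappa = (p_obs - p_exp) / (1 - p_exp) if p_exp < 1 else float("nan")
    return 100.0 * p_obs, kappa


# Hypothetical classifications of 8 subjects by two models.
linear_model = [1, 1, 0, 0, 1, 0, 1, 0]
tree_model = [1, 0, 0, 0, 1, 0, 1, 1]
print(agreement_and_kappa(linear_model, tree_model))  # -> (75.0, 0.5)
```

Because half the agreement expected here would arise by chance, 75% raw agreement corresponds to a kappa of only 0.5.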
The overall classification rates in Table 3 were not uniform by gender. For Vaud-Fribourg women, the models had higher specificity and NPV but lower sensitivity and PPV; for Vaud-Fribourg men these tendencies were reversed. Apparently, this "interaction" with gender was "automatically detected" neither consistently nor particularly well by the overall tree-based models, none of which retained the Gender variable.

Gender-specific classification models
Classification performance for models fitted separately to each gender is shown in Table 4. The differences in classification rates relative to those of the corresponding overall models were at best uneven. The "improvements" of the 3-node, reduced model regression tree over the 2-node, reduced model regression tree (Table 3) for Vaud-Fribourg women notwithstanding, on balance any small to moderate gains in one measure (e.g., sensitivity) were offset by losses in another (e.g., specificity) for all types of model in both regions.
Between the two regions, the predictor variables retained by the gender-specific CART models were less consistent than those retained by the overall CART models, especially for men (not shown). These inconsistencies were due in part to the necessarily smaller gender-specific sample sizes, as well as to idiosyncrasies in the observed sample data for the two regions (Figures 1, 2).

Discussion
In another study comparing Swiss and Seychelles Islands populations [9], several indicators of central adiposity (i.e., waist circumference and WHR) worked reasonably well when employed in logistic regression models for predicting dyslipidemia, either as individual predictors or in conjunction with other variables such as those employed in the present study. The predictive value of WHR for the Swiss populations served to corroborate the findings of Reeder et al. [8] in a Canadian population in the sense that similar variables and logistic models were employed in both studies.
Both of the latter studies attempted to quantify the predictive power of anthropometric measurements as first stage population screening indicators of dyslipidemia. However, neither study was particularly thorough in choosing the statistical methodology for the predictive models. For example, the (main) dependent variable, TC/HDL-C, although continuous, was analyzed as a binary variable with additive logistic regression models. Likewise, WHR and BMI, also continuous, were coded and employed in the logistic models as so-called "action level" dichotomies [1] (e.g., WHR ≥ 0.90 for men or WHR ≥ 0.80 for women was coded as "high" WHR by gender, BMI ≥ 27 was coded as "high" BMI for both genders, and "high" was contrasted with "low" WHR or BMI in the models). Thus, we wondered if more comprehensive statistical models would have led to improved classification.
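The "action level" coding used in those earlier logistic models can be written as a small function, in contrast to the continuous WHR and BMI used in the present models. The cutpoints are those quoted above; the gender coding ("M" for men, anything else treated as women) and the function name are hypothetical.

```python
def action_levels(whr, bmi, gender):
    """Dichotomize WHR and BMI at the previously used "action levels":
    high WHR if >= 0.90 (men) or >= 0.80 (women); high BMI if >= 27."""
    whr_cut = 0.90 if gender == "M" else 0.80
    return {"high_WHR": int(whr >= whr_cut), "high_BMI": int(bmi >= 27)}


# The same WHR of 0.85 is "high" for a woman but "low" for a man.
print(action_levels(0.85, 26, "F"))  # -> {'high_WHR': 1, 'high_BMI': 0}
print(action_levels(0.85, 28, "M"))  # -> {'high_WHR': 0, 'high_BMI': 1}
```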
The present findings are based on juxtaposing the results for the very simplest additive, algebraic linear and logistic regression models vs. the non-additive, non-algebraic CART models, based on the relatively small set of predictor variables examined. They serve to some degree to indicate the limits of predictability of dyslipidemia by first-stage population screening programs based on statistical models that focus primarily or exclusively on anthropometric measurements such as WHR and BMI. The observed relationships between these measurements and our TC/HDL-C ratio-based dyslipidemia variables (continuous or binary) were at best moderately strong; hence dyslipidemia was only moderately predictable from them. Nonetheless, although their predictive power is far from perfect, first-stage population screening models such as those studied here could lead to potential cost savings. This conclusion did not seem to depend on the TC/HDL-C ≥ 5.0 cutpoint we employed to define dyslipidemia, as the data suggested that the relationship is stable under reasonable changes of the cutpoint.
Our reliance on the composite WHR and BMI measures in our models, instead of the individual waist, hip, weight, and height measurements, may not have optimally or even adequately captured the relationship between the latter variables and the TC/HDL-C ratio. However, our rationale was to investigate and attempt to improve upon the types of classification rules intended for use in population dyslipidemia screening that have been obtained in previous studies employing similar but more limited analytical approaches. BMI and WHR are routinely employed because they are directly related to clinical entities (e.g., peripheral overweight, central obesity). Moreover, the issue of partial relationship was addressed by examining models using waist circumference alone instead of WHR, but we found little difference in the results (not shown). These potential limitations notwithstanding, external validation has recently been shown to be crucial for judging the merits of any predictive model [19,20], and the external validations of the various models estimated from the two different Swiss MONICA samples did provide some evidence of their predictive stability in these populations.
The overall (both genders) sensitivities and specificities of the various predictive models for the Swiss samples in this study were comparable to those obtained using only logistic regression models with WHR and BMI as coded by Reeder et al. [8] in Canadian samples, and by Paccaud et al. [9] in samples from Switzerland and the Seychelles. However, discrepancies in these measures and reversals by gender were more pronounced in the present study. It may be that our use of continuous versions of these variables in the models led to these differences.
The forward (backward) variable selection process inherent in full (pruned) CART modeling differs in an important way from the stepwise selection procedures commonly used with linear and logistic models: a predictor variable selected for binary splitting at a given step may be "re-selected" at subsequent steps, or even "re-removed" as at previous steps. In essence, this difference is what makes tree-based models so-called "automatic interaction detectors" [21], and also why it is difficult to pre- or even post-specify tree-based models algebraically; fortunately (perhaps), it is not necessary to do so to apply them in practice. A major feature of this approach is that no assumption of linearity between Y and the predictor variables (which can be categorical (binary or polychotomous), ordinal, or continuous) is required. Tree-based models are obviously appealing because of these features.
Despite the expected advantages of CART models over their linear and logistic counterparts (also see [22]), as well as the evidently modest ability of WHR and BMI to predict dyslipidemia, we were somewhat disappointed with the comparative classification performance of the CART models for these particular data, especially because we had deliberately "handicapped" the linear and logistic modeling strategies by not applying any formal predictor variable selection methodology and by considering only strictly additive models.
On the other hand, the CART models did provide some corroboration of, and further insights regarding, the above-mentioned "action levels" for WHR and BMI employed in the logistic models of Reeder et al. [8] and Paccaud et al. [9]. For example, consider the 3-node classification tree for Vaud-Fribourg women shown in Figure 3, and the 3-node regression tree for Vaud-Fribourg men shown in Figure 4. A woman whose WHR ≥ 0.81 and (then) whose BMI ≥ 27.6 would be classified as dyslipidemic (i.e., estimated Y = 1). A man whose BMI ≥ 28.9 would immediately be classified as dyslipidemic (i.e., predicted Y = 6.73 ≥ 5.0), while a man whose BMI < 28.9 but (then) whose WHR ≥ 0.89 would also be classified as dyslipidemic (i.e., predicted Y = 5.68 ≥ 5.0). The cutpoints in these 3-node CART models are similar to the previous "action levels", but are employed somewhat differently for classification purposes depending on gender. Such details were much less apparent in the linear and logistic models.
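The two 3-node trees just described reduce to simple nested rules. The sketch encodes only the splits and leaf values stated above; it assumes that the remaining leaves, whose values are not reported here, classify as not dyslipidemic, as the description implies. The function names are hypothetical.

```python
def classify_woman(whr, bmi):
    """Figure 3 (Vaud-Fribourg women): WHR >= 0.81 and then BMI >= 27.6
    -> dyslipidemic (estimated Y = 1); other leaves assumed Y = 0."""
    return int(whr >= 0.81 and bmi >= 27.6)


def classify_man(whr, bmi):
    """Figure 4 (Vaud-Fribourg men): BMI >= 28.9 -> leaf mean 6.73; otherwise
    WHR >= 0.89 -> leaf mean 5.68; both >= 5.0, hence dyslipidemic.
    The remaining leaf is assumed to have a mean TC/HDL-C < 5.0."""
    if bmi >= 28.9:
        return 1
    return int(whr >= 0.89)


print(classify_woman(0.85, 28.0), classify_man(0.90, 25.0))  # -> 1 1
```

Unlike the action levels, the order of the splits differs by gender: WHR first for women, BMI first for men.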
Some additional improvements might have been obtained by incorporating differential costs of misclassification into the classification tree (and also the logistic) models. However, these costs are not always easy to specify. This issue can alternatively be addressed indirectly by changing the (usual default) classification cut-off point from 0.50 to (say) p_s = sample prevalence of dyslipidemia, and (in effect) classifying an individual as dyslipidemic only if their model-estimated posterior probability of being dyslipidemic exceeds their prior probability of being dyslipidemic (i.e., p_s). This latter approach was examined in the present study, but on balance the corresponding classification performance results were not much different from

Figure 3: 3-node classification tree for Vaud-Fribourg women (n = 572) (gender-specific reduced {WHR, BMI} model in Table 4).

Figure 4: 3-node regression tree for Vaud-Fribourg men (n = 548) (gender-specific reduced {WHR, BMI} model in Table 4). Ovals: interior nodes; rectangles: terminal nodes (leaves). Numbers inside nodes are estimated mean values of TC/HDL-C (+sums of squares about the mean values). Binary classification rule: TC/HDL-C ≥ 5.0, predict dyslipidemia; TC/HDL-C < 5.0, predict no dyslipidemia.
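The alternative cut-off discussed in the last paragraph of the Discussion, classifying as dyslipidemic only when the model-estimated probability exceeds the sample prevalence p_s, can be contrasted with the default 0.50 rule as follows. The probabilities and observed statuses are hypothetical.

```python
import numpy as np


def classify_default(p):
    """Usual rule: dyslipidemic if the estimated probability p >= 0.50."""
    return (np.asarray(p) >= 0.5).astype(int)


def classify_prior(p, y_observed):
    """Alternative rule: dyslipidemic only if the estimated posterior
    probability exceeds the prior, i.e. the sample prevalence p_s."""
    p_s = float(np.mean(y_observed))
    return (np.asarray(p) > p_s).astype(int), p_s


# Hypothetical sample with prevalence 0.25 and four estimated probabilities.
y_obs = [1, 0, 0, 0]
p_hat = [0.1, 0.3, 0.6, 0.2]
print(classify_default(p_hat))       # -> [0 0 1 0]
print(classify_prior(p_hat, y_obs))  # p_s = 0.25 also flags the 0.3 subject
```

Lowering the cut-off to the prevalence trades specificity for sensitivity, which acts as an implicit, unequal misclassification cost.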