Binary classification of dyslipidemia from the waist-to-hip ratio and body mass index: a comparison of linear, logistic, and CART models
- Michael C Costanza^{1} and
- Fred Paccaud^{2}
https://doi.org/10.1186/1471-2288-4-7
© Costanza and Paccaud; licensee BioMed Central Ltd. 2004
- Received: 27 October 2003
- Accepted: 06 April 2004
- Published: 06 April 2004
Abstract
Background
We sought to improve upon previously published statistical modeling strategies for binary classification of dyslipidemia for general population screening purposes based on the waist-to-hip circumference ratio and body mass index anthropometric measurements.
Methods
Study subjects were participants in WHO-MONICA population-based surveys conducted in two Swiss regions. Outcome variables were based on the total serum cholesterol to high density lipoprotein cholesterol ratio. The other potential predictor variables were gender, age, current cigarette smoking, and hypertension. The models investigated were: (i) linear regression; (ii) logistic classification; (iii) regression trees; (iv) classification trees (iii and iv are collectively known as "CART"). Binary classification performance of the region-specific models was externally validated by classifying the subjects from the other region.
Results
Waist-to-hip circumference ratio and body mass index remained modest predictors of dyslipidemia. Correct classification rates for all models were 60–80%, with marked gender differences. Gender-specific models provided only small gains in classification. The external validations provided assurance about the stability of the models.
Conclusions
There were no striking differences between either the algebraic (i, ii) vs. non-algebraic (iii, iv), or the regression (i, iii) vs. classification (ii, iv) modeling approaches. Anticipated advantages of the CART vs. simple additive linear and logistic models were less than expected in this particular application with a relatively small set of predictor variables. CART models may be more useful when considering main effects and interactions between larger sets of predictor variables.
Keywords
- Abdominal obesity
- classification and regression trees
- external validation
- dyslipidemia screening
- positive and negative predictive values
- sensitivity and specificity.
Background
Central adiposity is a predictor of cardiovascular disease (CVD) independently of other major risk factors, including body mass index (BMI) [1, 2]. Part of the relationship between central adiposity and CVD is mediated by a modification of the metabolism of insulin and lipids [3]. Dyslipidemic individuals are more frequently "centrally obese" (e.g., with a high waist-to-hip circumference ratio (WHR)) [4–6]. These observations have been made in a variety of populations from developed [7–9] and less developed countries [9]. Apart from its interest for establishing a physiopathological causal link, this predictive association suggests the possibility of employing one or more anthropometric measurements of central adiposity as a first step in population screening for dyslipidemia [8, 9]. Using inexpensive and readily obtainable anthropometric measurements instead of more costly and time-consuming wet- or even dry-chemistry laboratory cholesterol measurements is relevant even in developed countries where an emerging epidemic of CVD is occurring amidst rising health care costs.
One objective of the present study was to attempt to improve upon previous statistical strategies for detecting dyslipidemia in the general population, with specific focus on the predictive power of the anthropometric measurements WHR and BMI. A second objective was to compare the performance of four statistical modeling approaches that can be employed for binary classification: linear regression [10], logistic classification [11], and classification and regression trees (CART) [12, 13]. By "can be employed" we mean: (a) with a modest amount of effort using commercially available software (we used SAS [14] and S-Plus [15]); and (b) that it is possible to apply classification-type methods for a binary outcome to the results of regression-type methods for a continuous outcome. We also wondered how well competing methods perform in practice, as opposed to how well they are supposed to perform in theory.
Methods
Study populations and samples
Subjects participated in the World Health Organization (WHO) MONICA (MONItoring trends and determinants in CArdiovascular disease) project described in detail elsewhere [16]. Participating regions included Vaud-Fribourg and Ticino in Switzerland. Vaud and Fribourg are adjacent French-speaking cantons in the west/southwest, while Ticino is an Italian-speaking canton in the southeast. These regions had similar distributions of and correlations between the predictor and outcome variables employed in the statistical models (see Results). Accordingly, the classification performance of region-specific models was estimated by external validation on data from the other region, as well as by (biased) resubstitution.
The third independent 1992–93 MONICA surveys were used. In Vaud-Fribourg, 3,299 individuals aged 25–74 years were invited to participate, and 1,742 (53%) did so. In Ticino, 2,000 individuals aged 35–64 years were invited and 1,510 (76%) participated. Analyses in the present study were restricted to the age range 35–64 years common to both regions (Vaud-Fribourg n = 1,182, Ticino n = 1,510). In addition to WHR and BMI, the potential predictor variables examined were Gender, Age, current cigarette Smoking, and high blood pressure (HBP: diastolic BP ≥ 90 mm Hg or under hypertension treatment). Linear and logistic regression (but not CART) models require complete data on the study subjects, unless missing data imputation techniques are employed. For convenience, we excluded subjects with missing data on any of the predictor variables. This reduced the final sample sizes by 5% in Vaud-Fribourg (n = 1,120) and by 6% in Ticino (n = 1,429).
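The HBP definition and the complete-case exclusion described above can be sketched as follows. This is a minimal illustration only: the field names (`whr`, `bmi`, `hbp`, and so on) and the three records are invented here, and the sketch is not the SAS or S-Plus code used in the study.

```python
# Hypothetical field names; the survey's actual variable names are not given.
PREDICTORS = ("whr", "bmi", "gender", "age", "smoking", "hbp")


def hbp_flag(diastolic_bp, treated):
    """HBP = 1 if diastolic BP >= 90 mm Hg or under treatment, 0 if neither,
    None if that cannot be determined from the available values."""
    if treated or (diastolic_bp is not None and diastolic_bp >= 90):
        return 1
    if treated is None or diastolic_bp is None:
        return None
    return 0


def complete_cases(records):
    """Keep only subjects with no missing value on any predictor."""
    return [r for r in records
            if all(r.get(v) is not None for v in PREDICTORS)]


# Invented records for illustration only.
subjects = [
    {"whr": 0.92, "bmi": 26.5, "gender": 1, "age": 48, "smoking": 0,
     "hbp": hbp_flag(94, False)},
    {"whr": 0.78, "bmi": None, "gender": 0, "age": 51, "smoking": 1,
     "hbp": hbp_flag(80, False)},   # dropped: BMI is missing
    {"whr": 0.80, "bmi": 24.6, "gender": 0, "age": 39, "smoking": 0,
     "hbp": hbp_flag(None, True)},  # treated, so HBP = 1 without a reading
]
kept = complete_cases(subjects)
```

Tree-based methods can handle missing predictor values, so the exclusion step is only needed to keep all four modeling strategies on the same subjects.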
Statistical models
Although the total serum cholesterol to high density lipoprotein cholesterol (TC/HDL-C) ratio is a continuous variable, we assumed that assessing the dyslipidemia classification performance of a predictive model would ultimately require comparing predicted binary values of dyslipidemia status. We applied five modeling approaches (Strategies 0–4) which reflected: no model (0); algebraically specified (1, 2) vs. unspecified (3, 4) models; and regression-based (1, 3) vs. classification-based (2, 4) models. Strategies 1–4 were expected to outperform the minimal benchmark Strategy 0.
Strategy 0: modal regional prevalence of dyslipidemia (no model)
Individuals in a given region were classified as dyslipidemic or not dyslipidemic, depending on the observed modal (most frequent) dyslipidemia category either in the whole region or stratified by gender. Strategy 0 represented a "no model" approach in the sense that the additional predictor variables were ignored.
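As a minimal sketch of this benchmark, every individual receives the modal category of the group they belong to. The function name and the two toy samples below are invented for illustration.

```python
from collections import Counter


def modal_classifier(observed_status):
    """Return the most frequent class label in the sample."""
    return Counter(observed_status).most_common(1)[0][0]


# Invented toy samples: 1 = dyslipidemic, 0 = not dyslipidemic.
women = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]
men = [1, 1, 0, 1, 1, 0, 1, 0, 1, 1]

women_rule = modal_classifier(women)  # label assigned to every woman
men_rule = modal_classifier(men)      # label assigned to every man
```

With a rule of this kind, either sensitivity or specificity is necessarily 0% and the other is 100%, which is why it serves only as a floor for the other strategies.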
Strategy 1: linear regression
Additive linear models,
$$Y = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k + e,$$
where $Y$ = TC/HDL-C ratio, $\{X_1, X_2, \ldots, X_k\}$ ($k \le 6$) = a subset of the predictor variables {WHR, BMI, Gender, Age, Smoking, HBP}, and $e$ = Gaussian error with constant variance, were fitted. TC/HDL-C, WHR, BMI, and Age were analyzed as continuous variables, while Gender, Smoking, and HBP were analyzed as binary variables. An individual with estimated Y ≥ 5.0 was classified as dyslipidemic, or classified as not dyslipidemic otherwise.
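The fit-then-threshold rule of Strategy 1 can be illustrated with a small least-squares sketch. It solves the normal equations by Gauss-Jordan elimination in plain Python, which is a stand-in chosen here for self-containment and not the SAS or S-Plus routine used in the study. The coefficients in `TRUE_B` and the six (WHR, BMI) pairs are invented, and the toy outcome is generated without error so that the fit can be checked exactly.

```python
def fit_ols(X, y):
    """Least-squares coefficients [b0, b1, ..., bk] via the normal equations."""
    rows = [[1.0] + list(x) for x in X]      # prepend the intercept column
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def predict(b, x):
    """Estimated TC/HDL-C ratio for predictor values x."""
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], x))


def classify(b, x, cutoff=5.0):
    """1 = dyslipidemic if the estimated ratio is >= cutoff, else 0."""
    return 1 if predict(b, x) >= cutoff else 0


# Invented coefficients and (WHR, BMI) values, for illustration only.
TRUE_B = (-3.0, 6.0, 0.1)
X = [(0.78, 24.0), (0.80, 25.0), (0.92, 27.0),
     (0.91, 26.0), (0.85, 30.0), (0.95, 22.0)]
y = [TRUE_B[0] + TRUE_B[1] * whr + TRUE_B[2] * bmi for whr, bmi in X]

b = fit_ols(X, y)
labels = [classify(b, x) for x in [(0.95, 28.0), (0.78, 23.0)]]
```

Note that the classification uses the estimated mean ratio, so two individuals with the same predictor values always receive the same label regardless of their observed ratios.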
Including all the predictor variables was termed the full model, while including only {WHR, BMI, Gender} was termed the reduced model. Both types of model were fitted separately by region. In addition, for women and men separately, {WHR, BMI, Age, Smoking, HBP} "full" and {WHR, BMI} "reduced" models also were fitted.
Neither formal predictor variable selection procedures nor models with predictor variable product interactions were employed. We simply wished to magnify any differences and facilitate comparisons between the algebraic linear regression (Strategy 1) vs. non-algebraic regression tree models (Strategy 3).
Strategy 2: logistic classification
For the same predictor variables as in Strategy 1, but with binary Y = 1 if TC/HDL-C ≥ 5.0, Y = 0 otherwise, additive logistic models
$$\log[p/(1 - p)] = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k + e,$$
where $p$ = probability that Y = 1 for given values of the predictors and $e$ = binomial error term, were fitted. This model assumes the relationship between $\log[p/(1 - p)]$ and the predictor variables is linear. An individual with estimated p ≥ 0.50 was classified as dyslipidemic, or classified as not dyslipidemic otherwise.
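A rough sketch of Strategy 2 follows. It treats Y as a Bernoulli outcome with probability p and maximizes the log-likelihood by plain gradient ascent, a simple substitute chosen here for the iterative fitting routines of standard statistical packages. The single centered predictor and the ten outcomes are invented, as are the learning rate and iteration count.

```python
import math


def sigmoid(z):
    """Inverse of the logit: p = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def linear_predictor(b, x):
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], x))


def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Approximate maximum-likelihood coefficients by gradient ascent."""
    k = len(X[0])
    b = [0.0] * (k + 1)
    n = len(X)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for x, yi in zip(X, y):
            resid = yi - sigmoid(linear_predictor(b, x))
            grad[0] += resid
            for j, xj in enumerate(x):
                grad[j + 1] += resid * xj
        b = [bj + lr * g / n for bj, g in zip(b, grad)]
    return b


def classify(b, x, cutoff=0.50):
    """1 = dyslipidemic if the estimated probability is >= cutoff, else 0."""
    return 1 if sigmoid(linear_predictor(b, x)) >= cutoff else 0


# Invented data: one centered, rescaled predictor and a binary outcome.
X = [(-1.0,), (-0.8,), (-0.5,), (-0.2,), (0.0,),
     (0.1,), (0.4,), (0.6,), (0.9,), (1.2,)]
y = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]

b = fit_logistic(X, y)
high = classify(b, (1.0,))
low = classify(b, (-1.0,))
```

Because p ≥ 0.50 exactly when the linear predictor is ≥ 0, the default cut-off amounts to a linear decision boundary in the predictors.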
As in Strategy 1, neither predictor variable selection nor predictor variable product interactions were employed, in order to magnify differences and facilitate comparisons between the algebraic logistic classification (Strategy 2) vs. non-algebraic classification tree models (Strategy 4).
Strategy 3: regression trees
For the same predictor variables and continuous Ys as in Strategy 1, regression tree models also were fitted. At each one-step-look-ahead of the "full" tree-growing process, the Ys were examined within all possible binary splits of each predictor variable to select the best single split for creating homogeneous groups with maximal between-group mean-squared errors. This process was continued until "optimality" of the groups at the final nodes ("leaves") of the tree was achieved. In practice, the full tree tends to be overly complex and idiosyncratic with respect to the data employed to "grow" it. Thus, a common recommendation [e.g., 17] is to "prune" the full tree backwards through further criteria based on both maximal within-leaf homogeneity of the Ys and minimal tree size in order to produce a smaller pruned tree that is less subject to these drawbacks. It is also recommended [17] that the process be internally cross-validated, e.g., by randomly dividing the data into tenths, performing the pruning on the full tree grown with nine tenths and evaluating it on the remaining tenth of the data, and averaging the classification performance criteria (see below) from all ten 9:1 partitions of the data.
After following these recommendations, the estimated value of Y at each pruned tree leaf was taken to be the mean among those subjects comprising the leaf. All individuals in the leaf were classified as dyslipidemic if the estimated Y ≥ 5.0, or classified as not dyslipidemic otherwise.
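The single-split search at the heart of the tree-growing step can be sketched for one predictor as below. The sketch minimizes the total within-group sum of squared deviations, which for a single binary split is equivalent to maximizing the between-group sum of squares; it omits recursion, pruning, and cross-validation entirely. The WHR and TC/HDL-C values are invented.

```python
def sse(values):
    """Sum of squared deviations from the group mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, left_mean, right_mean) for the best binary split."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between identical predictor values
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (cost, threshold,
                    sum(left) / len(left), sum(right) / len(right))
    return best[1:]


# Invented WHR values and TC/HDL-C ratios, for illustration only.
whr = [0.76, 0.79, 0.82, 0.84, 0.90, 0.93, 0.95, 0.97]
ratio = [3.8, 4.1, 4.0, 4.3, 5.6, 5.9, 5.4, 6.1]

threshold, left_mean, right_mean = best_split(whr, ratio)
# Each leaf is labelled by comparing its mean ratio with the 5.0 cut-point.
leaf_labels = [1 if m >= 5.0 else 0 for m in (left_mean, right_mean)]
```

A full regression tree repeats this search within each resulting group and over every predictor, which is what allows the same variable to be split more than once.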
Strategy 4: classification trees
For the same predictor variables and binary Ys as in Strategy 2, classification tree models also were fitted. The rationale, algorithms, and recommendations employed were similar to those for regression trees, with one important difference. An appealing recommendation [17] to employ both minimal misclassification rate (instead of maximal within-leaf homogeneity of the Ys) and minimal tree size optimality criteria to prune the full tree backwards was followed and internally cross-validated as described in Strategy 3.
The estimated value of Y at each pruned tree leaf was taken to be the modal category (dyslipidemic or not) among those subjects comprising the leaf. All individuals in a leaf were then classified in accord with the modal category.
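The modal-category labelling and the ten-fold internal cross-validation can be sketched together with a one-split classification tree. This is a simplified illustration under stated assumptions: the split is chosen by minimal misclassification count, no pruning is performed, and cross-validation is used here only to estimate the misclassification rate. The data, the fold assignment, and the random seed are invented.

```python
import random
from collections import Counter


def mode(labels):
    """Most frequent label (first encountered wins a tie)."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(x, y):
    """One-split classification tree: (threshold, left_label, right_label)."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        l_lab, r_lab = mode(left), mode(right)
        errors = (sum(lab != l_lab for lab in left)
                  + sum(lab != r_lab for lab in right))
        if best is None or errors < best[0]:
            best = (errors, (pairs[i - 1][0] + pairs[i][0]) / 2, l_lab, r_lab)
    return best[1:]


def predict(stump, xi):
    threshold, l_lab, r_lab = stump
    return l_lab if xi < threshold else r_lab


def cv_error(x, y, k=10, seed=0):
    """Average held-out misclassification rate over k random folds."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    rates = []
    for held_out in folds:
        train = [i for i in idx if i not in held_out]
        stump = fit_stump([x[i] for i in train], [y[i] for i in train])
        wrong = sum(predict(stump, x[i]) != y[i] for i in held_out)
        rates.append(wrong / len(held_out))
    return sum(rates) / k


# Invented data: 20 WHR values, dyslipidemic above about 0.835, one exception.
whr = [round(0.74 + 0.01 * i, 2) for i in range(20)]
status = [1 if i >= 10 else 0 for i in range(20)]
status[3] = 1  # a dyslipidemic subject with a low WHR

stump = fit_stump(whr, status)
error_rate = cv_error(whr, status)
```

The resubstitution error of the fitted stump is 1 in 20 here, while the cross-validated estimate can only be equal or larger, illustrating why resubstitution is described as biased.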
Classification performance criteria
The classification performance of all models was compared in terms of five measures: (1) overall correct classification (total % agreement between observed and model-classified dyslipidemia status); (2) sensitivity (% with observed TC/HDL-C ≥ 5.0 and classified as such); (3) specificity (% with observed TC/HDL-C < 5.0 and classified as such); (4) positive predictive value (PPV, % classified as TC/HDL-C ≥ 5.0 and observed as such); (5) negative predictive value (NPV, % classified as TC/HDL-C < 5.0 and observed as such). For the Vaud-Fribourg and Ticino region-specific models, all five classification performance measures were estimated by resubstitution of the data from the same region as well as by external validation on the subjects from the other region.
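The five measures can be computed from the observed and model-classified statuses as sketched below. The function name is invented, and reporting an undefined ratio (empty denominator) as 0 is an assumption made here to mirror how the "no model" rows of the tables read. The toy sample of 100 women with 22 dyslipidemic is also invented, chosen to be close to the Vaud-Fribourg prevalence.

```python
def performance(observed, classified):
    """Five classification measures, in percent (1 = dyslipidemic)."""
    pairs = list(zip(observed, classified))
    tp = sum(1 for o, c in pairs if o == 1 and c == 1)
    tn = sum(1 for o, c in pairs if o == 0 and c == 0)
    fp = sum(1 for o, c in pairs if o == 0 and c == 1)
    fn = sum(1 for o, c in pairs if o == 1 and c == 0)

    def pct(num, den):
        # Assumption: an undefined ratio is reported as 0.
        return 100.0 * num / den if den else 0.0

    return {
        "total_correct": pct(tp + tn, len(pairs)),
        "sensitivity": pct(tp, tp + fn),
        "specificity": pct(tn, tn + fp),
        "ppv": pct(tp, tp + fp),
        "npv": pct(tn, tn + fn),
    }


# Invented sample: 22 of 100 women dyslipidemic, all classified as not
# dyslipidemic, as Strategy 0 would do when "not dyslipidemic" is modal.
observed = [1] * 22 + [0] * 78
classified = [0] * 100
result = performance(observed, classified)
```

For external validation the same function would simply be applied to the other region's subjects, classified with the model fitted on the first region.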
Results
Descriptive comparisons of the two study samples
Table 1. Comparisons of Swiss MONICA samples (ages 35–64 yrs).
Study Variable | Vaud-Fribourg ^{ a } | Ticino ^{ b } |
---|---|---|
Male Gender | 48.9% | 48.2% |
Age (yrs) ^{ c } | 47.8 ± 8.5 | 49.5 ± 8.2 |
Women | 47.9 ± 8.5 | 49.7 ± 8.3 |
Men | 47.8 ± 8.4 | 49.2 ± 8.1 |
TC/HDL-C ratio ^{ c } | 4.9 ± 1.7 | 5.1 ± 1.8 |
Women | 4.2 ± 1.3 | 4.4 ± 1.6 |
Men | 5.7 ± 1.8 | 5.8 ± 1.9 |
Dyslipidemia ^{ d } | 41.6% | 44.4% |
Women | 22.4% | 25.9% |
Men | 61.7% | 64.4% |
WHR ^{ c } | 0.85 ± 0.09 | 0.85 ± 0.08 |
Women | 0.78 ± 0.05 | 0.80 ± 0.06 |
Men | 0.92 ± 0.06 | 0.91 ± 0.05 |
BMI (kg/m ^{ 2 } ) ^{ c } | 25.6 ± 4.0 | 26.0 ± 4.3 |
Women | 24.6 ± 4.2 | 25.4 ± 4.9 |
Men | 26.5 ± 2.6 | 26.6 ± 3.4 |
Current Cigarette Smoking | 25.6% | 31.1% |
Women | 24.7% | 26.5% |
Men | 26.6% | 36.2% |
Hypertension ^{ e } | 21.7% | 22.9% |
Women | 14.7% | 17.0% |
Men | 29.0% | 29.2% |
Table 2. Correlations among study variables in two Swiss MONICA samples (ages 35–64 yrs).
Study Variable | WHR | BMI | Age | Current Smoking | Hypertension ^{ d } | Gender |
---|---|---|---|---|---|---|
TC/HDL-C ratio | 0.53 ^{a}/0.49 ^{b} | 0.41/0.36 | 0.14/0.09 | 0.11/0.13 | 0.19/0.20 | 0.43/0.38 |
Women | 0.37/0.42 | 0.36/0.41 | 0.27/0.30 | 0.15/0.14 | 0.20/0.23 | - |
Men | 0.32/0.27 | 0.36/0.34 | 0.06/-0.07 | 0.08/0.07 | 0.09/0.09 | - |
Dyslipidemia ^{ c } | 0.46/0.48 | 0.36/0.35 | 0.13/0.13 | 0.09/0.10 | 0.15/0.19 | 0.40/0.39 |
Women | 0.37/0.35 | 0.34/0.35 | 0.24/0.29 | 0.13/0.08 | 0.18/0.22 | - |
Men | 0.21/0.27 | 0.27/0.32 | 0.07/0.03 | 0.06/0.05 | 0.02/0.09 | - |
WHR | | 0.53/0.49 | 0.19/0.21 | 0.03/0.10 | 0.24/0.24 | 0.77/0.72 |
Women | | 0.51/0.52 | 0.29/0.32 | 0.05/0.02 | 0.20/0.28 | - |
Men | | 0.61/0.53 | 0.31/0.34 | 0.00/0.05 | 0.13/0.12 | - |
BMI | | | 0.23/0.23 | -0.08/-0.07 | 0.27/0.26 | 0.24/0.13 |
Women | | | 0.31/0.31 | -0.08/-0.11 | 0.28/0.33 | - |
Men | | | 0.15/0.13 | -0.10/-0.04 | 0.21/0.17 | - |
Age | | | | -0.12/-0.09 | 0.18/0.19 | -0.00/-0.03 |
Women | | | | -0.15/-0.08 | 0.20/0.26 | - |
Men | | | | -0.09/-0.10 | 0.17/0.15 | - |
Current Smoking ^{ d } | | | | | -0.02/-0.04 | 0.02/0.11 |
Women | | | | | -0.03/-0.06 | - |
Men | | | | | -0.01/-0.06 | - |
Overall classification models
Table 3. Classification performance of overall (both genders) reduced {WHR, BMI, Gender} models for Vaud-Fribourg, with cross-validation on Ticino subjects.
(Strategy) Fitted Model | Total % Correct | Sensitivity | Specificity | + Predictive Value (PPV) | - Predictive Value (NPV) |
---|---|---|---|---|---|
Classifications of both genders | | | | | |
(0) No Model ^{ a } | 58 ^{c} (56) ^{d} | 0 (0) | 100 (100) | 0 (0) | 58 (56) |
(1) Linear Regression | 71 (72) | 73 (78) | 69 (68) | 63 (66) | 78 (79) |
(2) Logistic Classification | 71 (72) | 63 (67) | 77 (77) | 66 (70) | 74 (74) |
(3) 2-Node Reg. Tree ^{ e } | 72 (70) | 58 (56) | 82 (82) | 69 (71) | 73 (70) |
(4) 7-Node Class. Tree ^{ f } | 74 (71) | 70 (68) | 77 (73) | 69 (67) | 78 (74) |
Classifications of women only | | | | | |
(0) No Model ^{ a } | 78 ^{c} (74) ^{d} | 0 (0) | 100 (100) | 0 (0) | 78 (74) |
(1) Linear Regression | 78 (75) | 26 (39) | 94 (88) | 54 (53) | 81 (80) |
(2) Logistic Classification | 78 (76) | 13 (27) | 96 (94) | 52 (60) | 79 (79) |
(3) 2-Node Reg. Tree ^{ e } | 78 (76) | 8 (16) | 98 (97) | 59 (65) | 79 (77) |
(4) 7-Node Class. Tree ^{ f } | 81 (76) | 41 (48) | 93 (85) | 63 (53) | 84 (82) |
Classifications of men only | | | | | |
(0) No Model ^{ b } | 62 ^{c} (64) ^{d} | 100 (100) | 0 (0) | 62 (64) | 0 (0) |
(1) Linear Regression | 63 (69) | 91 (95) | 18 (22) | 64 (69) | 54 (70) |
(2) Logistic Classification | 64 (68) | 81 (84) | 35 (38) | 67 (71) | 54 (57) |
(3) 2-Node Reg. Tree ^{ e } | 65 (65) | 77 (74) | 46 (48) | 70 (72) | 55 (50) |
(4) 7-Node Class. Tree ^{ f } | 67 (65) | 81 (77) | 44 (44) | 70 (71) | 59 (51) |
For both genders combined, regardless of measure, classification performance was a modest 60–80% for all models, and no clear preference among different models was discernible. Moreover, the reduced models performed nearly as well as the full models. Again for brevity, only results for the reduced models are shown. Kappa measures of agreement were also calculated, indicating 70–80% classification concordance between models, with a slight tendency for the linear and logistic models on the one hand, vs. CART models on the other, to agree more among themselves (75–80%) than with models of the other type (70%) (not shown otherwise). This tendency was not evident for the regression per se vs. classification per se models.
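Agreement between the classifications produced by two models can be summarized as sketched below. The text does not say which kappa statistic was used, so this sketch assumes Cohen's kappa for two binary classifications; the two label vectors are invented.

```python
def cohen_kappa(a, b):
    """Cohen's kappa for two binary (0/1) classifications of the same subjects.

    Undefined (division by zero) if both raters use a single identical class.
    """
    n = len(a)
    observed_agreement = sum(x == y for x, y in zip(a, b)) / n
    pa1 = sum(a) / n  # proportion classified dyslipidemic by model A
    pb1 = sum(b) / n  # proportion classified dyslipidemic by model B
    chance_agreement = pa1 * pb1 + (1 - pa1) * (1 - pb1)
    return (observed_agreement - chance_agreement) / (1 - chance_agreement)


# Invented classifications of ten subjects by two hypothetical models.
model_a = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
model_b = [1, 0, 0, 0, 1, 0, 1, 1, 0, 1]

kappa = cohen_kappa(model_a, model_b)
```

In this toy case the raw agreement is 80% but kappa is lower, because half of that agreement would be expected by chance alone.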
The overall classification rates in Table 3 were not uniform by gender. For Vaud-Fribourg women, the models had higher specificity and NPV, but lower sensitivity and PPV; for Vaud-Fribourg men these tendencies were reversed. Apparently, this "interaction" by gender was not "automatically detected" consistently nor particularly well by the overall tree-based models, none of which retained the Gender variable.
Gender-specific classification models
Table 4. Classification performance of gender-specific reduced {WHR, BMI} predictive models.
(Strategy) Fitted Model | Total % Correct | Sensitivity | Specificity | + Predictive Value (PPV) | - Predictive Value (NPV) |
---|---|---|---|---|---|
Model based on Vaud-Fribourg women (n = 572), cross-validated on Ticino women (n = 741) | | | | | |
(0) No Model ^{ a } | 78 ^{c} (74) ^{d} | 0 (0) | 100 (100) | 0 (0) | 78 (74) |
(1) Linear Regression | 78 (76) | 19 (33) | 95 (91) | 53 (55) | 80 (79) |
(2) Logistic Classification | 78 (75) | 19 (32) | 95 (91) | 53 (54) | 80 (79) |
(3) 3-Node Reg. Tree ^{ d } | 80 (75) | 40 (45) | 92 (86) | 59 (52) | 84 (82) |
(4) 3-Node Class. Tree ^{ e } | 81 (75) | 38 (44) | 93 (86) | 62 (53) | 84 (82) |
Model based on Vaud-Fribourg men (n = 548), cross-validated on Ticino men (n = 688) | | | | | |
(0) No Model ^{ b } | 62 ^{c} (64) ^{d} | 100 (100) | 0 (0) | 62 (64) | 0 (0) |
(1) Linear Regression | 63 (69) | 88 (91) | 24 (30) | 65 (70) | 55 (65) |
(2) Logistic Classification | 64 (68) | 86 (88) | 29 (33) | 66 (70) | 55 (60) |
(3) 3-Node Reg. Tree ^{ d } | 65 (66) | 78 (76) | 45 (47) | 70 (72) | 56 (52) |
(4) 5-Node Class. Tree ^{ f } | 68 (67) | 78 (78) | 51 (47) | 72 (73) | 59 (54) |
There were more inconsistencies in the predictor variables retained by the gender-specific CART models compared to the overall CART models between the two regions, especially for men (not shown). These inconsistencies were due in part to the necessarily smaller gender-specific sample sizes, as well as to idiosyncrasies in the observed sample data for the two regions (Figures 1, 2).
Discussion
In another study comparing Swiss and Seychelles Islands populations [9], several indicators of central adiposity (i.e., waist circumference and WHR) worked reasonably well when employed in logistic regression models for predicting dyslipidemia, either as individual predictors or in conjunction with other variables such as those employed in the present study. The predictive value of WHR for the Swiss populations served to corroborate the findings of Reeder et al. [8] in a Canadian population in the sense that similar variables and logistic models were employed in both studies.
Both of the latter studies attempted to quantify the predictive power of anthropometric measurements as first stage population screening indicators of dyslipidemia. However, neither study was particularly thorough in choosing the statistical methodology for the predictive models. For example, the (main) dependent variable, TC/HDL-C, although continuous, was analyzed as a binary variable with additive logistic regression models. Likewise, WHR and BMI, also continuous, were coded and employed in the logistic models as so-called "action level" dichotomies [1] (e.g., WHR ≥ 0.90 for men or WHR ≥ 0.80 for women was coded as "high" WHR by gender, BMI ≥ 27 was coded as "high" BMI for both genders, and "high" was contrasted with "low" WHR or BMI in the models). Thus, we wondered if more comprehensive statistical models would have led to improved classification.
The present findings are based on juxtaposing the results for the very simplest additive, algebraic, linear and logistic regression vs. the non-additive, non-algebraic CART models based on the relatively small set of predictor variables examined. They serve to some degree to indicate the limits of predictability of dyslipidemia by first stage population screening programs based on statistical models which focus primarily or exclusively on anthropometric measurements such as WHR and BMI. The observed relationships between the latter and our TC/HDL-C ratio-based dyslipidemia continuous or binary variables were at best moderately strong, hence dyslipidemia was only moderately predictable therefrom. Nonetheless, although their predictive power is far from perfect, even the models for first stage population screening purposes such as those studied here could lead to potential cost savings. This conclusion did not seem to depend on the TC/HDL-C ≥ 5.0 cutpoint we employed to define dyslipidemia, as the data suggested that the relationship is stable within the limits of a reasonable change.
Our reliance on the composite WHR and BMI measures in our models instead of the individual waist, hip, weight, and height measurements may not have optimally or even adequately captured the relationship between the latter variables and the TC/HDL-C ratio. However, our rationale was to investigate and attempt to improve upon the types of classification rules intended for use in population dyslipidemia screening that have been obtained in previous studies employing similar but more limited analytical approaches. BMI and WHR are routinely employed because they are directly related to clinical entities (i.e., peripheral overweight, central obesity, etc.). Moreover, the issue of partial relationship was addressed by examining models using waist circumference alone instead of WHR, but we found little difference in the results (not shown). These potential limitations notwithstanding, external validation has recently been shown to be crucial for judging the merits of any predictive model [19, 20]. The external validations of the various models estimated from the two different Swiss MONICA samples did provide some evidence of their predictive stability in these populations.
The overall (both genders) sensitivities and specificities of the various predictive models for the Swiss samples in this study were comparable to those obtained using only logistic regression models with WHR and BMI as coded by Reeder et al. [8] in Canadian samples, and by Paccaud et al. [9] in samples from Switzerland and the Seychelles. However, discrepancies in these measures and reversals by gender were more pronounced in the present study. It may be that our use of continuous versions of these variables in the models led to these differences.
The forward (backward) variable selection process inherent in full (pruned) CART modeling differs in an important way from the stepwise selection procedures that are commonly used with linear and logistic models. That is, a predictor variable selected for binary splitting at a given step may be "re-selected" at subsequent steps, or even "re-removed" as at previous steps. In essence, this difference is what makes tree-based models so-called "automatic interaction detectors" [21], and also why it is difficult to pre- or even post-specify tree-based models algebraically, but fortunately (perhaps) it is not necessary to do so to apply them in practice. A major feature of this approach is that no assumption of linearity between Y and the predictor variables (which can be categorical (binary or polychotomous), ordinal, or continuous) is required. Tree-based models are obviously appealing because of these features.
Despite the expected advantages of CART models over their linear and logistic counterparts (also see [22]), as well as the evidently modest ability of WHR and BMI to predict dyslipidemia, we were somewhat disappointed with the comparative classification performance of the CART models for these particular data, especially because we had deliberately "handicapped" the linear and logistic modeling strategies by not applying any formal predictor variable selection methodology and by considering only strictly additive models.
Some additional improvements might have been obtained by incorporating differential costs of misclassification into the classification-tree (also logistic) models. However, these costs are not always easy to specify. This issue can alternatively be addressed indirectly by changing the (usual default) classification cut-off point from 0.50 to (say) p_{s} = sample prevalence of dyslipidemia, and (in effect) classifying an individual as dyslipidemic only if their model-estimated posterior probability of being dyslipidemic exceeds their prior probability of being dyslipidemic (i.e., p_{s}). This latter approach was examined in the present study, but on balance the corresponding classification performance results were not much different from those based on the usual 0.50 cut-off point (not shown otherwise). This was due at least in part to the fact that the observed values of p_{s} (see Table 1) were not close to the extremes of 0 or 1. Of course, changing the cut-off point in this manner simply implies trade-offs between sensitivity and specificity, which may or may not be warranted depending on the actual costs of misclassification.
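The effect of moving the cut-off from 0.50 to the sample prevalence can be sketched as follows. The four estimated probabilities are invented; the prevalence of 0.416 is the overall Vaud-Fribourg dyslipidemia prevalence from Table 1. The sketch uses ≥ at the cut-off, as in Strategy 2.

```python
def classify_with_cutoff(probabilities, cutoff=0.50):
    """1 = dyslipidemic if the estimated probability is >= cutoff, else 0."""
    return [1 if p >= cutoff else 0 for p in probabilities]


# Invented model-estimated probabilities for four individuals.
estimated_p = [0.30, 0.45, 0.55, 0.70]
sample_prevalence = 0.416  # Vaud-Fribourg, both genders (Table 1)

default_labels = classify_with_cutoff(estimated_p)
prevalence_labels = classify_with_cutoff(estimated_p, sample_prevalence)
```

Lowering the cut-off below 0.50 reclassifies borderline individuals as dyslipidemic, which can only raise sensitivity and lower specificity; when the prevalence is close to 0.50, few individuals fall between the two cut-offs and the results change little.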
Conclusions
At least for binary prediction of dyslipidemia from waist-to-hip ratio and body mass index in the context of the relatively small set of other predictor variables examined, the simple additive logistic models obtained in previous studies were about as effective as the more comprehensive statistical models investigated here. Indeed, for the data at hand, perhaps even an old standby such as linear discriminant analysis [23], the forerunner of logistic classification, would have sufficed. In all fairness, CART models may be of more value when much larger sets of predictor variable main effects and interactions than the one examined in this study are considered in the statistical modeling process.
References
- Han TS, Van Leer EM, Seidell JC, Lean MJ: Waist circumference action levels in the identification of cardiovascular risk factors: prevalence study in a random sample. BMJ. 1995, 311: 1401-1405.
- Reeder BA, Senthiselvan A, Despres JP, Angel A, Liu L, Wand H, Rabkin SW: The association of cardiovascular disease risk factors with abdominal obesity in Canada. Canadian Heart Health Surveys Research Group. CMAJ. 1997, 157: S39-S45.
- Shetterly SM, Marshall JA, Baxter J, Hamman RF: Waist-hip-ratio measurement location influences associations with measures of glucose and lipid metabolism. The San Luis Valley Diabetes Study. Ann Epidemiol. 1993, 3: 295-299.
- Bjorntorp P: Regional patterns of fat distribution. Ann Int Med. 1985, 103: 994-995.
- Seidell JC, Cigolini M, Charzewska J, Ellsinger BM, di Base G: Fat distribution in European women: a comparison of anthropometric measurements in relation to cardiovascular risk factors. Int J Epidemiol. 1990, 19: 303-308.
- Pouliot MC, Després JP, Lemieux S, Moorjani S, Bouchard C, Tremblay A, Nadeau A, Lupien PJ: Waist circumference and abdominal sagittal diameter: best simple anthropometric indexes of abdominal visceral adipose tissue accumulation and related cardiovascular risk in men and women. Am J Cardiol. 1994, 73: 460-468.
- Houmard JA, Wheeler WS, McCammon MR, Holbert D, Israel RG, Barakat HA, Wells JM, Truitt N, Hamad SF: An evaluation of waist to hip ratio measurement methods in relation to lipid and carbohydrate metabolism in men. Int J Obes. 1991, 15: 181-188.
- Reeder BA, Liu L, Horlick L: Selective screening for dyslipidemia in a Canadian population. J Clin Epidemiol. 1996, 49: 217-222. 10.1016/0895-4356(95)00063-1.
- Paccaud F, Schlüter-Fasmeyer V, Wietlisbach V, Bovet P: Dyslipidemia and abdominal obesity: An assessment in three general populations. J Clin Epidemiol. 2000, 53: 393-400. 10.1016/S0895-4356(99)00184-5.
- Chambers JM: Linear models. In Statistical Models in S. Edited by: Chambers JM, Hastie TJ. 1992, Wadsworth & Brooks/Cole, Pacific Grove, CA, 4: 95-144.
- Hastie TJ, Pregibon D: Generalized linear models. In Statistical Models in S. Edited by: Chambers JM, Hastie TJ. 1992, Wadsworth & Brooks/Cole, Pacific Grove, CA, 6: 195-248.
- Breiman L, Friedman JH, Olshen RA, Stone CJ: Classification and Regression Trees. 1984, Wadsworth, Belmont, CA.
- Clark LA, Pregibon D: Tree-based models. In Statistical Models in S. Edited by: Chambers JM, Hastie TJ. 1992, Wadsworth & Brooks/Cole, Pacific Grove, CA, 9: 377-420.
- SAS Institute Inc: SAS OnlineDoc®, Version 8, Cary, North Carolina, USA. 1999.
- Insightful Corp: S-PLUS 2000 Guide to Statistics, Seattle, WA. 1999.
- World Health Organization MONICA Project Principal Investigators: The MONICA Project (MONItoring trends and determinants in CArdiovascular disease): a major international collaboration. J Clin Epidemiol. 1988, 41: 105-114. 10.1016/0895-4356(88)90084-4.
- Venables WN, Ripley BD: Modern Applied Statistics with S-Plus. 1999, Springer, NY, 327-3
- Wietlisbach V, Paccaud F, Rickenbach M, Gutzwiller F: Trends in cardiovascular risk factors (1984–1993) in a Swiss region: results of the three population surveys. Prev Med. 1997, 26: 523-533. 10.1006/pmed.1997.0167.
- Terrin N, Schmid CH, Griffith JL, D'Agostino RB, Selker HP: External validity of predictive models: A comparison of logistic regression, classification trees, and neural networks. J Clin Epidemiol. 2003, 56: 721-729. 10.1016/S0895-4356(03)00120-3.
- Bleeker SE, Moll HA, Steyerberg EW, Donders ART, Derksen-Lubsen G, Grobbee DE, Moons KGM: External validation is necessary in prediction research: A clinical example. J Clin Epidemiol. 2003, 56: 826-832. 10.1016/S0895-4356(03)00207-5.
- Sonquist JA, Morgan JN: The detection of interaction effects: A report on a computer program for the selection of optimal combinations of explanatory variables. Monograph 35, University of Michigan, Ann Arbor: Survey Research Center Institute for Social Research. 1964.
- Cook EF, Goldman L: Empiric comparison of multivariate analytic techniques: advantages and disadvantages of recursive partitioning analysis. J Chron Dis. 1984, 37: 721-731.
- Fisher RA: The use of multiple measurements in taxonomic problems. Ann Eugenics. 1936, 7: 179-188.
Pre-publication history
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2288/4/7/prepub
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.