Against quantiles: categorization of continuous variables in epidemiologic research, and its discontents
 Caroline Bennette^{1} and
 Andrew Vickers^{2}
DOI: 10.1186/1471-2288-12-21
© Bennette and Vickers; licensee BioMed Central Ltd. 2012
Received: 8 August 2011
Accepted: 29 February 2012
Published: 29 February 2012
Abstract
Background
Quantiles are a staple of epidemiologic research: in contemporary epidemiologic practice, continuous variables are typically categorized into tertiles, quartiles and quintiles as a means to illustrate the relationship between a continuous exposure and a binary outcome.
Discussion
In this paper we argue that this approach is highly problematic and present several potential alternatives. We also discuss the perceived drawbacks of these newer statistical methods and the possible reasons for their slow adoption by epidemiologists.
Summary
The use of quantiles is often inadequate for epidemiologic research with continuous variables.
Background
Epidemiology is often introduced using examples in which both exposure and outcome are considered in binary terms: research participants are defined as having, say, lung cancer or not, and being smokers or not, and the proportion of smokers is then compared between cases and controls. Many exposures, however, are inherently continuous. Indeed, in the classic case-control study on smoking and lung cancer[1], Doll and Bradford-Hill report results both for cases and controls in terms of proportion of smokers and by "amount of tobacco consumed", grouping into several different categories such as 1-4, 15-24 or 50+ cigarettes per day. In contemporary epidemiologic practice, it is more customary to group continuous variables into quantiles (most often tertiles, quartiles or quintiles) based on the exposure's distribution. In one recent study, for example, researchers examining the link between dietary fat and breast cancer grouped fat intake into quintiles. They reported that women in the highest quintile of fat intake were 11% more likely to get breast cancer than women in the lowest quintile[2]. As another example, surgeon annual caseload was found to be significantly associated with the survival of patients after an acute myocardial infarction[3]. The authors reported that the 30-day mortality rate was 13.5% for physicians in the lowest quartile of volume (5 or fewer cases per year) compared to 11.8% for physicians in the highest quartile (more than 24 cases annually).
A number of researchers have commented on the disadvantages of categorization in epidemiologic studies[4]. Many associations can be tested using linear models, and practicable alternative methods for handling nonlinear relationships have been broadly developed and validated in recent years. Yet despite these methodological advancements and calls for the abandonment of percentile-based categorization [4, 5], the epidemiologic community continues to rely heavily on the use of quantiles as a primary means of analyzing and presenting results. For example, in a recent issue of The American Journal of Epidemiology (October 2009, volume 170, number 8), four of six papers with a continuous exposure used some form of percentile-based categorization; only two kept the variable as continuous.
Quantiles appear intuitively appealing to epidemiologists as they can be thought of in terms of low, medium and high risk groups. Moreover, the association between exposure and outcome can be described in terms of a relative risk between these groups. However, these perceived benefits are outweighed by several important problems that arise when a continuous variable is categorized, particularly if data-dependent quantiles are used to form categories. Here we summarize the previous research on the topic and address possible concerns about the use of alternative statistical approaches.
Discussion
Analysis
Categorization of continuously distributed exposure variables is associated with three problems: first, it involves multiple hypothesis testing with pairwise comparisons of quantiles; second, it requires an unrealistic step-function of risk that assumes homogeneity of risk within groups, leading to both a loss of power and inaccurate estimation; and third, it leads to difficulty comparing results across studies due to the data-driven cut points used to define categories.
Multiple testing
Investigators often use the lowest quartile or quintile as the reference category and test whether subsequently higher categories are associated with increased risk of an outcome. For example, a recent study examined whether height was associated with risk of Alzheimer's disease [6]. To test this hypothesis, the authors grouped height into quartiles and separately tested whether the proportion of subjects with Alzheimer's disease was significantly different between the lowest and each of the three highest quartiles; a p value was obtained for each of the three comparisons. As is well known, the chance of a false positive result is increased by multiple comparisons. In the study above, one of three comparisons among men was found to be statistically significant (that between men in the highest and lowest quartile of height) and was subsequently reported in the abstract; the two nonsignificant results were not mentioned. In another example, researchers reported a positive association between triglycerides and risk of endometrial cancer despite only showing that those in the highest quartile had higher risk compared to those in the lowest; no significant differences were seen for the other quartiles. Moreover, an association was found for only one of the four exposures analyzed, all of which were grouped into quartiles [7]. Naturally, this problem can be circumvented by conducting an overall test of significance. Yet the use of multiple groups in quantile-based analysis surely exacerbates the tendency toward multiple testing, and this is clearly what is seen in the literature [6, 7].
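To make the inflation of false positives concrete, the following minimal sketch computes the chance of at least one spurious "significant" comparison when each higher category is tested against the reference at a nominal alpha of 0.05. It assumes the tests are independent, which is a simplification: comparisons that share a reference category are correlated, so the true figure will differ somewhat. The function name is illustrative only.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive among n_tests independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Number of pairwise comparisons against the lowest category.
for label, n_comparisons in [("tertiles", 2), ("quartiles", 3), ("quintiles", 4)]:
    rate = familywise_error_rate(n_comparisons)
    print(f"{label}: {n_comparisons} comparisons, error rate about {rate:.3f}")
```

Under this assumption, three quartile comparisons carry roughly a 14% chance of at least one false positive rather than the nominal 5%.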
Homogeneity of risk within categories
The assumption that risk is homogeneous within categories is often inappropriate, especially if the exposure distribution is skewed. Take, for example, a recent study reporting that arsenic levels were not associated with risk of bladder cancer [8]. Using levels derived from toenail clippings, the investigators categorized men into quartiles based on the distribution of arsenic in the controls. Arsenic levels in the first three quartiles ranged from 0.014 to 0.161 μg/g; levels in the highest quartile ranged from 0.161 to 17.5 μg/g. By grouping arsenic exposure in this manner, the authors implicitly assume that 0.17 and 17 μg/g of arsenic have identical effects on bladder cancer risk, a 100-fold difference in exposure. The assumption that risk does not vary within categories naturally has implications for statistical power. Ignoring intracategory variation and ordering means throwing away information, and is prone to reduce a study's power to detect an association, particularly when, as is common, the exposure distribution is asymmetric[5].
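The step-function assumption can be illustrated with a short sketch. Only the upper bound of the third quartile (0.161 μg/g) is taken from the arsenic example above; the two lower cut points are hypothetical placeholders, since they are not reported here, and the function name is illustrative.

```python
from bisect import bisect_right

# The top cut point (0.161 ug/g) comes from the text above. The two lower cut
# points are hypothetical placeholders; the text does not report them.
CUT_POINTS = [0.05, 0.10, 0.161]


def quartile(arsenic_ug_per_g: float) -> int:
    """Return the quartile (1 to 4) that an exposure value falls into."""
    return bisect_right(CUT_POINTS, arsenic_ug_per_g) + 1


low, high = 0.17, 17.0
# Both values are assigned to the same category, so a categorical model
# gives them the same estimated risk.
print(quartile(low), quartile(high))
# Fold difference in exposure that the category conceals.
print(round(high / low))
```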
Comparison between studies
Where categorization is based on quantiles, it becomes difficult to compare results across different studies. For example, two recent papers described the effect of hospital volume on mortality rates after a surgical treatment for colon cancer. One study, using data from the Surveillance, Epidemiology and End Results-Medicare linked database, found a significant association between hospital volume and 30-day survival[9]; the other, using data from hospitals in Ontario, failed to find this association[10]. It is difficult to reconcile these inconsistent findings because the measure of exposure (quartile of hospital volume) was dependent on the data set used for analysis. In the first study, the lowest quartile of hospital volume was defined as 57 or fewer cases over a 6-year period whereas for the second study, the lowest quartile included hospitals performing fewer than 61 cases in a 3-year period, more than a twofold difference in volume. This would be analogous to comparing two studies on the association between poverty and health, when poverty was defined as an annual income of $22,000 in one and $47,000 in the other.
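The "more than twofold" figure follows from putting both cut points on a common per-year scale, as in this small worked calculation. The variable names are illustrative, and the second threshold is treated as 61 cases although the text says "fewer than 61".

```python
def annual_volume(cases: float, years: float) -> float:
    """Convert a case count over a study period into cases per year."""
    return cases / years


# Upper bound of the lowest volume quartile in each study, per year.
first_study_cutoff = annual_volume(57, 6)    # 57 or fewer cases over 6 years
second_study_cutoff = annual_volume(61, 3)   # fewer than 61 cases over 3 years

ratio = second_study_cutoff / first_study_cutoff
print(f"{first_study_cutoff:.1f} vs {second_study_cutoff:.1f} cases per year")
print(f"ratio of cut points: {ratio:.2f}")
```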
Alternatives to categorization
The natural approach to determining an association between a continuously distributed exposure and a binary outcome is simply to analyze the exposure variable in its raw, continuous state using a regression model. Basic regression approaches, such as logistic regression, assume a linear association between exposure and the log odds of outcome, but there exist straightforward methods to model nonlinear relationships. In the case of a single explanatory variable and outcome, locally weighted regression ("lowess") is a robust modeling method; where adjustment for covariates is required, several alternative curve-fitting strategies exist. Among them, splines and fractional polynomials are easily implemented in multivariable regression[11, 12].
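As an illustration of what "adding splined terms" involves, the sketch below builds the covariates for one common spline variant, a restricted (natural) cubic spline in its truncated power form, which is constrained to be linear beyond the outermost knots. This is one possible choice rather than the specific method of any study cited here; the knot locations and the function name are hypothetical, and fitting the model itself would still require a regression routine, which is not shown.

```python
def rcs_basis(x: float, knots: list[float]) -> list[float]:
    """Return [x, s_1(x), ..., s_{k-2}(x)] for a restricted cubic spline.

    Uses the truncated power form, scaled by the squared knot range so that
    every term is on roughly the scale of x.
    """
    t = sorted(knots)
    if len(t) < 3:
        raise ValueError("at least three knots are required")

    def cube_plus(u: float) -> float:
        # Truncated cubic: zero to the left of the knot.
        return max(u, 0.0) ** 3

    last_gap = t[-1] - t[-2]
    scale = (t[-1] - t[0]) ** 2
    terms = [float(x)]
    for tj in t[:-2]:
        s = (cube_plus(x - tj)
             - cube_plus(x - t[-2]) * (t[-1] - tj) / last_gap
             + cube_plus(x - t[-1]) * (t[-2] - tj) / last_gap)
        terms.append(s / scale)
    return terms


# Hypothetical knots for a hypothetical exposure; four knots give one linear
# term plus two nonlinear terms to enter into the regression model.
example_knots = [1.0, 2.0, 3.0, 4.0]
for value in (0.5, 2.5, 5.0):
    print(value, rcs_basis(value, example_knots))
```

Each returned list would be entered as a set of covariates in place of the single raw exposure; below the first knot the nonlinear terms are zero, so the fitted curve there is a straight line.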
Perceived problems with nonlinear modeling
Both cubic splines and fractional polynomials are easy to implement with most statistical software (one simply adds splined terms or transformed variables into the regression model) and both provide several important benefits: a more realistic estimation of the exposure-risk relationship and the ability to test directly for nonlinearity. However, despite these advantages, several perceived drawbacks appear to have limited their widespread adoption by epidemiologists.
The first argument against nonlinear modeling of continuous variables is that doing so fails to provide a parsimonious description of the exposure-disease relationship, and this hinders communication of the results to the public and other scientists. Yet this argument holds both for nonlinear modeling and the use of categorization: when quartiles or quintiles are used to describe a relationship between an exposure and an outcome, the relationship is described by three or four separate estimates. In the study of height and Alzheimer's disease, for example, the investigators categorized height into quartiles and therefore obtained three separate estimates of association[6]. The choice to report only one comparison is arguably an inappropriate way to simplify their findings; indeed, it ignores the data from 50% of the study participants.
Another argument against regression techniques involving nonlinear terms is that the resulting models are prone to overfit [14]. While the increased flexibility of fractional polynomial or spline regression models may create spurious dips or inflection points, the random variability associated with the estimation of each quantile can similarly lead to spurious findings. A good example of this effect comes from a study of sex hormones and prostate cancer risk. Using the lowest quartile as the reference category, the odds ratios for successively higher estradiol quartiles were reported as 0.53 (95% Confidence Interval [CI]: 0.33, 0.85), 0.40 (95% CI: 0.23, 0.70) and 0.56 (95% CI: 0.32, 0.98). As a result, the authors concluded that the association between estradiol and prostate cancer risk was nonlinear[15]. Yet while central estimates across quantiles do not follow a linear trend, the confidence intervals around each estimate are wide, and it is hard to tell exactly how much evidence of nonlinearity is provided by the data. With formal nonlinear modeling, conversely, evidence of nonlinearity can be tested directly, by a joint test on the nonlinear terms.
Approaches using splines or fractional polynomials to model nonlinearity are also open to criticism for being overly sensitive to the placement of knots or choice of polynomial terms. Although it is no doubt true that, in some cases, the exposure-risk relationship is sensitive to decisions about modeling, we would argue that categorization is comparably sensitive to the choice of cut points. For example, a study by Filardo et al. found that the association between body mass index [BMI] and in-hospital mortality after coronary artery bypass graft surgery was strongly influenced by the way in which BMI was categorized[16]. A model that included cubic splines revealed a strong nonmonotonic relationship in which patients with very low and very high BMI were at increased risk; on the other hand, findings were inconsistent when BMI was categorized, depending on whether tertiles, quartiles or quintiles were used. Clearly, use of nonlinear modeling is not invulnerable to poor analytic practice, such as failure to consider overly influential outlying observations. As ever, good modeling depends on close collaboration with those who have good content knowledge and who can therefore assess whether a curve or subset of observations makes sense, on use of appropriate regression diagnostics, and on sensitivity analysis to determine how curves change with alternative model specifications or removal of subsets of observations.
A final argument against the use of nonlinear modeling to predict risk concerns application to case-control studies. Where the risk is fixed by design (such as a mean 25% risk in a case-control study with 3:1 matching), applying a regression model to a data set will lead to a misestimation of risk. However, this can be easily remedied by recalibration. A simple approach is to add a constant (sometimes described as a Bayes factor) to the linear prediction[17]. Alternatively, imputation approaches can be used to estimate levels of the exposure in controls not subject to sampling[18].
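One common form of such a joint test is a likelihood ratio test comparing a model with only the linear term against a model that also includes the nonlinear terms. The sketch below assumes that form; the text does not specify which joint test is meant, and a Wald test is another option. The log-likelihood values are hypothetical placeholders that would in practice come from fitted models, and the function names are illustrative.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite Poisson-type sum.
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: normal tail plus a finite series.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


def likelihood_ratio_test(loglik_linear: float, loglik_nonlinear: float,
                          n_nonlinear_terms: int) -> tuple[float, float]:
    """Joint test that all nonlinear terms are zero; returns (statistic, p)."""
    statistic = 2.0 * (loglik_nonlinear - loglik_linear)
    return statistic, chi2_sf(statistic, n_nonlinear_terms)


# Hypothetical log-likelihoods for a linear model and a spline model with
# two additional nonlinear terms.
stat, p_value = likelihood_ratio_test(-520.4, -516.1, 2)
print(f"LR statistic = {stat:.2f}, p = {p_value:.4f}")
```

A single p value for nonlinearity replaces the informal reading of whether quantile-specific estimates "look" linear.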
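The recalibration by a constant can be sketched as follows. This assumes one standard choice of constant, the difference between the log odds of the outcome in the target population and in the sampled data; whether this matches the exact adjustment used in the cited work is an assumption. The 25% sample risk follows the 3:1 matching example above, while the 2% population risk and the function names are hypothetical.

```python
import math


def logit(p: float) -> float:
    """Log odds of a probability."""
    return math.log(p / (1.0 - p))


def expit(z: float) -> float:
    """Inverse of the logit: convert log odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def recalibrated_risk(linear_predictor: float, sample_risk: float,
                      population_risk: float) -> float:
    """Shift a case-control model's linear predictor to the population scale."""
    offset = logit(population_risk) - logit(sample_risk)
    return expit(linear_predictor + offset)


# A subject whose uncorrected predicted risk equals the design-fixed 25% is
# mapped onto the (hypothetical) 2% population risk.
uncorrected = logit(0.25)
print(f"{recalibrated_risk(uncorrected, 0.25, 0.02):.3f}")
```

Because only the intercept is shifted, the estimated shape of the exposure-risk curve on the log odds scale is unchanged.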
Summary
We are far from the first to argue against the categorization of continuous variables[4, 5] or advocate for the use of nonlinear modeling of continuous variables[11, 12, 19]. Yet categorization, particularly by use of quantiles, remains extremely common in the epidemiologic literature. We hypothesize that this is largely due to historical precedent. Nonlinear models are computationally intensive, and would have been extremely difficult to implement in the early years of modern epidemiology. Doll and Bradford-Hill had little choice but to categorize the continuous variable of smoking; however, with modern statistical methods and computing, there is no need to follow blindly in their wake.
We are not advocating for the complete abandonment of categorization. In fact, we often use quantiles in the preliminary assessment of exposure-outcome relationships [13, 20]. But we go on to model these relationships directly, using nonlinear terms. Doing so allows us to avoid implausible assumptions that risk does not vary within categories, and report results based on our entire sample for clinically sensible cut points that can be compared between different studies[13, 20]. We feel that there are very few, if any, circumstances in which the use of quantiles would be the preferred method of reporting results. As such, we encourage other investigators to abandon use of categorization as a principal analysis in epidemiologic research.
Abbreviations
 CI: Confidence interval
 BMI: Body mass index
 PSA: Prostate-specific antigen.
References
 1. Doll R, Hill AB: Smoking and carcinoma of the lung; preliminary report. Br Med J. 1950, 2 (4682): 739-748. 10.1136/bmj.2.4682.739.
 2. Thiebaut AC, Kipnis V, Chang SC, Subar AF, Thompson FE, Rosenberg PS, Hollenbeck AR, Leitzmann M, Schatzkin A: Dietary fat and postmenopausal invasive breast cancer in the National Institutes of Health-AARP Diet and Health Study cohort. J Natl Cancer Inst. 2007, 99 (6): 451-462. 10.1093/jnci/djk094.
 3. Tu JV, Austin PC, Chan BT: Relationship between annual volume of patients treated by admitting physician and mortality after acute myocardial infarction. JAMA. 2001, 285 (24): 3116-3122. 10.1001/jama.285.24.3116.
 4. Rothman K, Greenland S, Lash T: Modern Epidemiology. 2008, Lippincott Williams & Wilkins, 3rd edition.
 5. Greenland S: Avoiding power loss associated with categorization and ordinal scores in dose-response and trend analysis. Epidemiology. 1995, 6 (4): 450-454. 10.1097/00001648-199507000-00025.
 6. Petot GJ, Vega U, Traore F, Fritsch T, Debanne SM, Friedland RP, Lerner AJ: Height and Alzheimer's disease: findings from a case-control study. J Alzheimers Dis. 2007, 11 (3): 337-341.
 7. Lindemann K, Vatten LJ, Ellstrom-Engh M, Eskild A: Serum lipids and endometrial cancer risk: results from the HUNT-II study. Int J Cancer. 2009, 124 (12): 2938-2941. 10.1002/ijc.24285.
 8. Michaud DS, Wright ME, Cantor KP, Taylor PR, Virtamo J, Albanes D: Arsenic concentrations in prediagnostic toenails and the risk of bladder cancer in a cohort study of male smokers. Am J Epidemiol. 2004, 160 (9): 853-859. 10.1093/aje/kwh295.
 9. Schrag D, Cramer LD, Bach PB, Cohen AM, Warren JL, Begg CB: Influence of hospital procedure volume on outcomes following surgery for colon cancer. JAMA. 2000, 284 (23): 3028-3035. 10.1001/jama.284.23.3028.
 10. Simunovic M, Rempel E, Theriault ME, Coates A, Whelan T, Holowaty E, Langer B, Levine M: Influence of hospital characteristics on operative death and survival of patients after major cancer surgery in Ontario. Can J Surg. 2006, 49 (4): 251-258.
 11. Greenland S: Dose-response and trend analysis in epidemiology: alternatives to categorical analysis. Epidemiology. 1995, 6 (4): 356-365. 10.1097/00001648-199507000-00005.
 12. Royston P: A strategy for modelling the effect of a continuous covariate in medicine and epidemiology. Stat Med. 2000, 19 (14): 1831-1847. 10.1002/1097-0258(20000730)19:14<1831::AID-SIM502>3.0.CO;2-1.
 13. Vickers AJ, Bianco FJ, Serio AM, Eastham JA, Schrag D, Klein EA, Reuther AM, Kattan MW, Pontes JE, Scardino PT: The surgical learning curve for prostate cancer control after radical prostatectomy. J Natl Cancer Inst. 2007, 99 (15): 1171-1177. 10.1093/jnci/djm060.
 14. Weinberg CR: How bad is categorization?. Epidemiology. 1995, 6 (4): 345-347. 10.1097/00001648-199507000-00002.
 15. Gann PH, Hennekens CH, Ma J, Longcope C, Stampfer MJ: Prospective study of sex hormone levels and risk of prostate cancer. J Natl Cancer Inst. 1996, 88 (16): 1118-1126. 10.1093/jnci/88.16.1118.
 16. Filardo G, Hamilton C, Hamman B, Ng HK, Grayburn P: Categorizing BMI may lead to biased results in studies investigating in-hospital mortality after isolated CABG. J Clin Epidemiol. 2007, 60 (11): 1132-1139. 10.1016/j.jclinepi.2007.01.008.
 17. Ulmert D, Cronin AM, Bjork T, O'Brien MP, Scardino PT, Eastham JA, Becker C, Berglund G, Vickers AJ, Lilja H: Prostate-specific antigen at or before age 50 as a predictor of advanced prostate cancer diagnosed up to 25 years later: a case-control study. BMC Med. 2008, 15: 66.
 18. Vickers AJ, Gupta A, Savage CJ, Pettersson K, Dhalin A, Bjartell A, Manjer J, Scardino PT, Ulmert D, Lilja H: A panel of kallikrein marker predicts prostate cancer in a large, population-based cohort followed for 15 years without screening. Cancer Epidemiol Biomarkers Prev. 2011, 20 (2): 255-61. 10.1158/1055-9965.EPI-10-1003.
 19. Royston P, Sauerbrei W: Building multivariable regression models with continuous covariates in clinical epidemiology - with an emphasis on fractional polynomials. Methods Inf Med. 2005, 44 (4): 561-571.
 20. Vickers AJ, Savage CJ, Hruza M, Tuerk I, Koenig P, Martinez-Pineiro L, Janetschek G, Guillonneau B: The surgical learning curve for laparoscopic radical prostatectomy: a retrospective cohort study. Lancet Oncol. 2009, 10 (5): 475-480. 10.1016/S1470-2045(09)70079-8.
Prepublication history
 The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2288/12/21/prepub
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.