 Technical advance
 Open Access
 Published:
Semiparametric estimation of the attributable fraction when there are interactions under monotonicity constraints
BMC Medical Research Methodology volume 20, Article number: 236 (2020)
Abstract
Background
The population attributable fraction (PAF) is the fraction of disease cases in a sample that can be attributed to an exposure. Estimating the PAF often involves the estimation of the probability of having the disease given the exposure while adjusting for confounders. In many settings, the exposure can interact with confounders. Additionally, the exposure may have a monotone effect on the probability of having the disease, and this effect is not necessarily linear.
Methods
We develop a semiparametric approach for estimating the probability of having the disease and, consequently, for estimating the PAF, controlling for the interaction between the exposure and a confounder. We use a tensor product of univariate Bsplines to model the interaction under the monotonicity constraint. The model fitting procedure is formulated as a quadratic programming problem, and, thus, can be easily solved using standard optimization packages. We conduct simulations to compare the performance of the developed approach with the conventional Bsplines approach without the monotonicity constraint, and with the logistic regression approach. To illustrate our method, we estimate the PAF of hopelessness and depression for suicidal ideation among elderly depressed patients.
Results
The proposed estimator exhibited better performance than the other two approaches in the simulation settings we tried. The estimated PAF attributable to hopelessness is 67.99% with 95% confidence interval: 42.10% to 97.42%, and is 22.36% with 95% confidence interval: 12.77% to 56.49% due to depression.
Conclusions
The developed approach is easy to implement and supports flexible modeling of possible nonlinear relationships between a disease and an exposure of interest.
Background
The population attributable fraction (PAF) is the fraction of disease cases in a sample that can be attributed to the exposure. The PAF is an important measure of the public health impact of an exposure on disease burden, and thus it is useful to prioritize public health interventions [1, 2]. The maximum likelihood method is commonly used to estimate the PAF. However, this approach to estimation requires a correct model for the probability of disease given the exposure and other covariates subject to the ignorable treatment assignment assumption [3] (to be reviewed in the next section). Logistic regression is typically used to model the probability of disease given the exposure and the other covariates [1, 4, 5].
The PAF of an exposure must fall between 0 and 1 by definition. A zero PAF indicates that the disease risk is irrelevant of the exposure levels. In contrast, a higher PAF value indicates a stronger association between disease risk and the exposure level. In the extreme, a PAF equal to 1 implies no disease risk when there is no exposure. If the disease risk is higher in the absence of the exposure than in the presence of the exposure, the PAF then has no proper meaning. Thus, the definition of the PAF itself also suggests a monotone relationship between the disease risk and the exposure level (a justification is presented at the end of the next section). Incorporating the monotonicity assumption into the estimation of the PAF provides performance gains when there are no other covariates [6].
In many research fields, the exposure is thought to have a monotone effect on the probability of having the disease, i.e., the probability is a monotone function of the exposure. For instance, a doseresponse curve is often thought to be nondecreasing [7, 8]. One such example is the probability of suicidal ideation, which is thought to be an increasing function of both hopelessness and depression [9–12]. Hence, it is desirable to model the probability of suicidal ideation under the monotonicity constraint of both hopelessness and depression.
This situation also presents an analytic challenge. When there are other covariates, they can interact with the exposure, for instance, the interaction of two drugs [13]. In our example, hopelessness is a system of negative expectations concerning one’s future life and can cause depression. On the other hand, current depression can influence one’s hopelessness towards the future [14, 15]. Thus, an interaction between hopelessness and depression can present to have a joint effect on suicidal ideation.
These examples highlight, how in many studies, the effect of the exposure can be complicated and is not necessarily linear, including the common analysis of drug interactions [16]. We bring together several past innovations to propose a novel analytic solution. [17] study the PAF when there are joint effects or interactions. [18] develop an approach of estimating logistic regression models with interactions and monotonicity constraints. The authors [18] apply the approach to estimating the PAF and achieve substantially more accurate estimates in some settings than the usual approach which uses logistic regression without monotonicity constraints.
Semiparametric approaches have been applied to estimate relative risk functions [19], to calculate odds ratios [20], and to estimate effect measures in the presence of interactions [21]. Herein, we use Bsplines [22] (see also [23, 24] and references therein), to develop a semiparametric approach to estimate the PAF in the presence of interactions with confounding. The model fitting procedure is formulated as a well studied quadratic programming problem, and, thus, can be easily solved using standard optimization packages. We implement the approach using the R function solve.QP in the package quadprog [25, 26]. After a simulation study, we illustrate our new method by examining hopelessness and depression for suicidal ideation among elderly depressed patients from the PROSPECT (Prevention of Suicide in Primary Care Elderly: Collaborative Trial) study [27]. Specifically, we model the interaction between hopelessness and depression under the monotonicity constraint.
Methods
We develop the semiparametric approach to estimate PAF accounting for interactions under the monotonicity constraint using Bsplines in the last two subsections. We compare the performance of the following three approaches: the approach we developed (monB), the conventional Bsplines (conB) approach without the monotonicity constraint but with the same knots, and the logistic regression approach (logit). Comparisons are made through simulation studies and a case study of estimating the PAF for suicidal ideation attributable to hopelessness or depression. To save computation time, we use a small number of basis functions, quadratic Bsplines with knots placed at quantiles of the distribution of unique predictor values [28, P. 24]. We use quartiles {0, 1/4, 1/2, 3/4 and 1} as the knot locations.
Simulations
Let Y denote presence (1) or absence (0) of the outcome, Z the exposure, X a confounder, and V an additional covariate. The two interacting covariates Z and X are simulated independently from the Uniform [0,1] distribution where a 0 value is always part of the simulated z. The additional covariate V is simulated from a Bernoulli distribution with p=1/2. The outcome is simulated from a Bernoulli distribution with the following probability functions, A: 0.1+0.4z+0.3x−0.2xz+0.2v; B: 2(0.1+2 log(z/2+1))x/3+0.3v; and C: \(0.8\sqrt {z+0.1}x^{2}+0.1v\). Shapes of the models at a fixed v are provided in Figure S1 of the supplementary material.
We examined 1000 simulations for each sample size 100 and 200. We compared the absolute value of the bias, the variance, and the mean squared error (MSE) of estimating the PAF attributable to the exposure Z. The true PAF values from these models are respectively 0.3, 0.4404, and 0.4616. They are calculated from (4) where the integrals are computed using R functions integral and integral2 [29].
Illustrative case study
In the PROSPECT study, we focus on suicidal ideation four months after the beginning of the study. We consider the 592 patients in the study with no missing data. An event was observed if the score for suicidal ideation is greater than zero [27]. Following [30], we use the Beck Hopelessness Scale [31, BHS] to measure hopeless, and the Beck Depression Inventory score [12, BDI] to measure depression. The BHS and the BDI range from 0 to 19 and from 0 to 17, respectively, where a higher value means more hopelessness or depression.
Figure 1 shows the sample average of suicidal ideation by BDI and BHS scores, while Fig. 2 shows the corresponding patient frequency. Many patients with low BDI and BHS scores experienced no suicidal ideation. Overall, the risk of suicidal ideation increases with BDI and BHS scores, though the patient frequency decreases. The figures also show that high BHS scores are associated with high BDI scores. The PAF for hopelessness is the proportion of suicide ideation that would be prevented if all patients’ hopelessness was reduced to 0 on the BHS scale, while keeping BDI fixed. Similarly, the PAF for depression is the proportion of suicide ideation that would be prevented if all patients’ depression was reduced to 0 on the BDI scale, while BHS fixed.
Semiparametric estimation of the probability
Suppose Y is distributed in the exponential family with mean μ [32]. The logit link function connects μ with the exposure and the confounder through a smooth function f as log[μ/(1−μ)]=f(z,x). [18] models f(z,x) parametrically assuming linearity as f(z,x)=β_{0}+β_{1}z+β_{2}x+β_{3}x z and estimates the coefficients under the constraint that β_{1}z+β_{3}x z≤β_{1}z^{∗}+β_{3}x z^{∗} when z≤z^{∗} at every x. We use a flexible semiparametric approach to model a possible nonlinear f(z,x) by linear combinations of Bsplines basis functions [22] under the constraint f(z,x)≤f(z^{∗},x) when z≤z^{∗} at every x. Let \(\boldmath {\psi }(z)=\left (\psi _{1}(z),\ldots,\psi _{P}(z)\right)'\) be a set of Bspline basis functions. Let \(\boldmath {\phi }(x)=\left (\phi _{1}(x),\ldots,\phi _{Q}(x)\right)'\) be another set of Bspline basis functions. We model f(z,x) as
where B is the unknown coefficient matrix. Using Kronecker product ⊗ and the vectorization operator vec described in Section S1 of the supplementary material, we further obtain
The maximum likelihood estimate of β without the constraint can be viewed as a modified iteratively weighted least squares problem [33, 34]. Derivation in Section S2 of the supplementary material shows that the constraint f(z,x)≤f(z^{∗},x) when z≤z^{∗} at every x can be expressed as Aβ≥0, where the matrix A has a special pattern. The derivation in Section S3 of the supplementary material shows that the estimation procedure can be expressed as a quadratic programming problem. The approach is also capable of finding the estimate under the additional constraint that f(z,x) is monotone in x.
Estimation of the PAF
To be general, let X be a vector of measured covariates. Let Y_{0} denote what the presence or absence would be if the exposure were to be eliminated. Let P be the probability measure. In particular, let P(Y_{0}=1) be the “hypothetical probability of disease in the same population but with all exposure eliminated” [35, 36]. The PAF for the exposure Z is the proportion of disease that would be eliminated if Z=0,
To identify the PAF based on observed data, we make the assumption that all confounders of the diseaseexposure relationship are measured and contained in X. We assume consequently that the ignorable treatment assignment [3] holds: P(Y_{0}=1X=x)=P(Y_{0}=1X=x,Z=0)=P(Y=1X=x,Z=0). Under this assumption, the PAF can be written as
which is equal to the expression (2.2) for the PAF in [36]. The PAF is also written as
where P(·Y=1) is the conditional probability given Y=1 in the subpopulation of people with the disease [1, 5, 37]. We provide a proof of the equivalence between (4) and (5) in Section S4 of the supplementary material.
From a random sample of the population of size n, i=1,…,n, an estimate of the PAF then follows as
where \(I_{\{Y_{i}=1\}}\) is an indicator function [18]. Justification of the convergence of the estimate (6) to the PAF in (5) is provided by [6] in the scenario of no other covariates. A similar proof adjusting for the X can also be derived. Thus, an accurate estimate of the probability of the disease would provide a reasonable estimate of the PAF. The defined PAF in (3) as a proportion has no practical meaning if P(Y_{0}=1)>P(Y=1). Similarly in (6), occurrences of \(\hat {P}\left (Y_{i}=1Z_{i}=0,\textbf {X}_{i}=\textbf {x}_{i}\right)>\hat {P}\left (Y_{i}=1Z_{i}=z_{i},\textbf {X}_{i}=\textbf {x}_{i}\right)\) can result in an estimated PAF out of the [0,1] range. Hence, an estimate of the probability with the monotonicity constraint ensures a reasonable estimate of the PAF.
Results
Simulation
Table 1 summarizes the proportion of the times that the estimated PAF was within the [0,1] range for the three approaches. As discussed, the monotonicity constraint ensures the estimated PAF is within [0,1]. The lack of a constraint results in PAF estimates that are sometimes out of the range using the logistic regression and the conventional Bsplines approaches.
Table 2 shows the performance of the three approaches of estimating the PAF. In the comparison, results from the conventional Bsplines approach are not shown due to its large scale. Instead, we show the results from this approach by censoring the original estimate at 0 if it is negative, or at 1 if it is bigger than 1. Similarly, we obtain the censored estimate from the logistic regression approach. In general, the censored estimate improves the original estimate. Overall, the developed approach outperformed the other approaches in the settings that we examined.
Case study
The estimated probability of suicidal ideation is shown in Fig. 3 separately by the three approaches. Both the logistic regression and the developed approach demonstrate a monotone relationship. Fitted probabilities numerically 0 or 1 occurred using the conventional Bsplines approach. Table 3 summarizes the estimated PAF by the three approaches. The logistic regression produces lower PAF attributable to BHS and higher PAF attributable to BDI. Due to the numerical instability, the conventional Bsplines estimated PAF is out of the [0,1] range. We examine the numerical instability in the “Discussion” section.
We use the bootstrap method to obtain the 95% confidence intervals of the estimates. The lower bound is the 2.5% quantile of the 1000 bootstrap estimates and the upper bound is the 97.5% quantile. The lower bound of the logistic regression estimated PAF attributable to BDI is negative. In fact, 28 of the estimated PAF’s are negative with the minimum being −32.45%. Among the 1000 estimates, two of the estimated PAF attributable to BHS are negative with the minimum being −4.27%.
Discussion
We used Bsplines to develop a semiparametric estimate of the probability of an event under the monotonicity constraint and accounting for interactions. The approach is solved as a quadratic programming problem and can be easily implemented using the statistical software R. Using a boosting technique [38, 39] implement a similar approach to estimate β under the same constraint Aβ≥0 to solve the problem defined in equation (3) of the supplementary material. In the settings that have been tried, a comparison of fitting a univariate generalized linear model under the monotonicity constraint showed that the boosting algorithm is computationally intensive [23]. A summary of other approaches in the literature for estimation when there are monotonicity constraints and interactions is provided in Section S8 of the supplementary material.
Throughout our study, we placed the knots at the quartiles of the distribution of the unique predictors. We observed similar performance of the conventional Bsplines approach and the developed approach when the knots are placed at the tertiles {0, 1/3, 2/3, 1}. In the simulation study, Tables S1 and S2 of the supplementary material show that overall the developed approach has better performance. In the case study, using the developed approach, the estimated PAF attributable to BHS was 67.66% (43.41%,95.13%), and the estimated PAF attributable to BDI was 23.51% (10.92%,55.62%) (Table S3). Similar numerical instability was observed using the conventional Bsplines approach (Figure S2 and Table S3). The estimated PAF is also out of the defined range between 0 and 1 using the conventional approach when knots are placed at the quantiles {0, 1/2, 1} (Table S3). Performance of the developed approach is robust to the knots placement.
Conclusions
We developed a semiparametric estimate of the probability of an event using Bsplines. Our approach can model a monotone relationship between the response and covariates, and can account for interactions. We applied the approach to estimate the PAF, and compared the performance of the estimator with the logistic regression approach and the conventional approach without the monotonicity constraint. Simulation studies showed that the developed estimator outperforms the other two approaches.
Availability of data and materials
The R code to implement the developed approach is available in the supplementary material. The data that support the findings of this study are available from Martha Bruce, but are not publicly available. Data are however available from the authors upon reasonable request and with permission of Martha Bruce.
Abbreviations
 BDI:

Beck depression inventory
 BHS:

Beck hopelessness scale
 conB:

Conventional Bsplines approach without the monotonicity constraint
 logit:

Logistic regression
 monB:

Developed monotone Bsplines approach
 PAF:

Population attributable fraction.
References
Deubner DC, Wilkinson WE, Helms MJ, Tyroler HA, Hames CG. Logistic model estimation of death attributable to risk factors for cardiovascular disease in Evans County, Georgia. Am J Epidemiol. 1980; 112(1):135–43.
Rothman KJ, Greenland S. Causation and causal inference In: Rothman KJ, Greenland S, editors. Modern Epidemiology. Philadelphia, Pennsylvania: Lippincott, Williams & Wilkins: 1998.
Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983; 70(1):41–55.
Bruzzi P, Green SB, Byar DP, Brinton LA, Schairer C. Estimating the population attributable risk for multiple risk factors using casecontrol data. Am J Epidemiol. 1985; 122(5):904–14.
Greenland S, Drescher K. Maximum likelihood estimation of the attributable fraction from logistic models. Biometrics. 1993; 49(3):865–72.
Wang W, Small D. A comparative study of parametric and nonparametric estimates of the attributable fraction for a semicontinuous exposure. Int J Biostat. 2012; 8(1):1–22.
Kelly C, Rice J. Monotone smoothing with application to doseresponse curves and the assessment of synergism. Biometrics. 1990; 46(4):1071–85.
Kong M, Eubank RL. Monotone smoothing with application to doseresponse curve. Commun Stat Simul Comput. 2006; 35:991–1004.
Beck AT. Depression: Clinical, Experimental, and Theoretical Aspects. New York: Hoeber Medical Division, Harper & Row; 1967, p. 370.
Wetzel RD. Hopelessness, depression, and suicide intent. Arch Gen Psychiatr. 1976; 33(9):1069–73.
Beck AT, Steer RA, Kovacs M, Garrison B. Hopelessness and eventual suicide: a 10year prospective study of patients hospitalized with suicidal ideation. Am J Psychiatr. 1985; 142(5):559–63.
Beck AT, Brown G, Berchick RJ, Stewart BL, Steer RA. Relationship between hopelessness and ultimate suicide: A replication with psychiatric outpatients. Am J Psychiatr. 1990; 147(2):190–5.
Kong M, Lee JJ. A generalized response surface model with varying relative potency for assessing drug interaction. Biometrics. 2006; 62:986–95.
Stotland E. The Psychology of Hope. San Francisco, California: JosseyBass; 1969, p. 284.
Beck AT, Kovacs M, Weissman A. Hopelessness and suicidal behavior: an overview. J Am Med Assoc. 1975; 234(11):1146–9.
Kong M, Lee JJ. A semiparametric response surface model for assessing drug interaction. Biometrics. 2008; 64:396–405.
Hössjer O, Kockum I, Alfredsson L, Hedström AK, Olsson T, Lekman M. A general framework for and new normalization of attributable proportion. Epidemiol Methods. 2017; 6(1):1–25.
Traskin M, Wang W, Ten Have TR, Small DS. Efficient estimation of the attributable fraction when there are monotonicity constraints and interactions. Biostatistics. 2013; 14(1):173–88.
Zhao LP, Kristal AR, White E. Estimating relative risk functions in casecontrol studies using a nonparametric logistic regression. Am J Epidemiol. 1996; 144(6):598–609.
Figueiras A, CadarsoSuárez C. Application of nonparametric models for calculating odds ratios and their confidence intervals for continuous exposures. Am J Epidemiol. 2001; 154(3):264–75.
CadarsoSuárez C, RocaPardiñas J, Figueiras A. Effect measures in nonparametric regression with interactions between continuous exposures. Stat Med. 2006; 25(4):603–21.
de Boor C. A Practical Guide to Splines. New York: Springer; 1978.
Wang W, Small D. Monotone Bspline smoothing for a generalized linear model response. Am Stat. 2015; 69(1):28–33.
Perperoglou A, Sauerbrei W, Abrahamowicz M, Schmid M. A review of spline function procedures in R. BMC Med Res Methodol. 2019; 19(1):46.
Goldfarb D, Idnani A. Dual and primaldual methods for solving strictly convex quadratic programs In: Hennart JP, editor. Numerical Analysis. Berlin: Springer: 1982. p. 226–39.
Goldfarb D, Idnani A. A numerically stable dual method for solving strictly convex quadratic programs. Math Program. 1983; 27:1–33.
Bruce ML, Ten Have TR, Reynolds CF, Katz II, Schulberg HC, Mulsant BH, Brown GK, McAvay GJ, Pearson JL, Alexopoulos GS. Reducing suicidal ideation and depressive symptoms in depressed older primary care patients: a randomized controlled trial. J Am Med Assoc. 2004; 291(9):1081–91.
Hastie TJ, Tibshirani RJ. Generalized Additive Models. Boca Raton, Florida, USA: Chapman & Hall/CRC; 1990.
Borchers HW. Pracma: Practical Numerical Math Functions. 2019. R package version 2.2.9. https://CRAN.Rproject.org/package=pracma. Accessed 15 Dec 2019.
Cheung YB, Law CK, Chan B, Liu KY, Yip PS. Suicidal ideation and suicidal attempts in a populationbased study of Chinese people: risk attributable to hopelessness, depression, and social factors. J Affect Disord. 2006; 90(23):193–9.
Beck AT, Steer RA. Manual for the Beck hopelessness scale. San Antonio, TX: Psychological Corporation; 1988.
McCullagh P, Nelder JA. Generalized Linear Models, 2nd edn. Boca Raton, Florida: Chapman & Hall; 1989.
O’Sullivan F, Yandell BS, Raynor WJJ. Automatic smoothing of regression functions in generalized linear models. J Am Stat Assoc. 1986; 81(393):96–103.
Eilers PHC, Marx BD. Flexible smoothing with bsplines and penalties. Stat Sci. 1996; 11:89–121.
Benichou J. A review of adjusted estimators of attributable risk. Stat Methods Med Res. 2001; 10:195–216.
Sjölander A, Vansteelandt S. Doubly robust estimation of attributable fractions. Biostatistics. 2011; 12:112–21.
Benichou J, Gail MH. Variance calculations and confidence intervals for estimates of the attributable risk based on logistic models. Biometrics. 1990; 46(4):991–1003.
Hofner B, Mayr A, Robinzonov N, Schmid M. Modelbased boosting in R: a handson tutorial using the R package mboost. Comput Stat. 2014; 29:3–35.
Hofner B, Kneib T, Hothorn T. A unified framework of constrained regression. Stat Comput. 2016; 26:1–14.
Acknowledgements
The authors thank the two anonymous reviewers whose comment helped improving the overall quality of the manuscript.
Funding
Michael O. Harhay was supported by grant R00 HL141678 from the National Heart, Lung, and Blood Institute (NHLBI) of the US National Institutes of Health (NIH). The funding body had no role in study design, data collection and analysis, or decision to publish.
Author information
Authors and Affiliations
Contributions
All the authors made substantial contributions to the manuscript. WW conducted the data analysis and simulation studies, and drafted the manuscript. DSS and MOH reviewed it and revised it. All three authors approved the manuscript to be submitted.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Data from the PROSPECT (Prevention of Suicide in Primary Care Elderly: Collaborative Trial) study [27] was used for this study. The research protocol received full review and approval from the institutional review board of each of the 3 universities participating in the PROSPECT trial and written informed consent was obtained for all participants.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Additional file
The supplementary material contains the following sections: S1, a review of the Kronecker product and the vectorization operator; S2 and S3, technical details of the developed approach; S4, proof of the equivalence between Eqs. (4) and (5); S5, a figure of the models used in the simulation studies; S6, additional simulation results where the knots are placed at the tertiles; S7, additional data analysis results where the knots are placed at the tertiles and at the quantiles {0, 1/2, 1}; S8 literature review of relevant statistical methods; and S9, R code to implement the developed approach.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Wang, W., Small, D.S. & Harhay, M.O. Semiparametric estimation of the attributable fraction when there are interactions under monotonicity constraints. BMC Med Res Methodol 20, 236 (2020). https://doi.org/10.1186/s12874020011184
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s12874020011184
Keywords
 Attributable fraction
 Bsplines
 Interaction
 Monotonicity constraint
 Quadratic programming