 Research article
 Open Access
 Open Peer Review
Sharp bounds on sufficient-cause interactions under the assumption of no redundancy
BMC Medical Research Methodology volume 17, Article number: 71 (2017)
Abstract
Background
Sufficient-cause interaction is a type of interaction that has received much attention recently. However, the sufficient component cause model on which sufficient-cause interaction is based is non-identifiable, so estimating the interaction parameters from the model is mathematically impossible.
Methods
In this paper, I derive bounding formulae for sufficient-cause interactions under the assumption of no redundancy.
Results
Two real data sets are used to demonstrate the method (R code provided). The proposed bounds are sharp, and they are sharper than previous bounds.
Conclusions
Sufficient-cause interactions can be quantified by setting bounds on them.
Background
A common aim of many observational studies is to identify risk factors for disease. Once risk factors have been identified, researchers are often interested in knowing whether any two factors can interact in causing the disease. ‘Sufficient-cause interaction’ (also referred to as ‘synergism’, ‘causal co-action’, ‘causal mechanistic interaction’, or simply ‘mechanistic interaction’) is a type of interaction that has received much attention recently [1–11] and is based on Rothman’s sufficient component cause model [12, 13]. The model posits that disease can be caused through any one of many different mechanisms or pathways. A mechanism/pathway requires several different component causes to operate; hence it is also called a ‘causal pie’. If two factors participate in the same causal pie, then a sufficient-cause interaction can be said to exist between them.
If the monotonicity assumption is not imposed [14–18], the sufficient component cause model in its general form is over-parameterized and non-identifiable. That is, the total number of model parameters exceeds the total degrees of freedom the data can offer. For example, with two binary risk factors the data offer at most four degrees of freedom (four different exposure profiles), but the model has a total of nine parameters, each corresponding to one of the nine possible causal-pie classes (one ‘all-unknown’ class unrelated to either factor, two main-effect classes for each factor, and four two-factor interaction classes). [If the monotonicity assumption is imposed on the two factors, the number of causal-pie classes reduces to four (one ‘all-unknown’ class unrelated to either factor, one main-effect class for each factor, and one two-factor interaction class), and the model becomes identifiable.] Researchers have recently found ways to circumvent the non-identifiability problem and have developed methods to test for sufficient-cause interactions without imposing the monotonicity assumption [1–11]. However, it is mathematically impossible to estimate the interaction parameters from a truly non-identifiable sufficient component cause model. At best, bounds can be set.
In this paper, I derive bounding formulae for sufficient-cause interactions under the assumption of no redundancy [6–11, 19]. R code for all computations is provided for convenience, and the method is demonstrated with two real datasets. The proposed bounds will also be shown to be sharp and to be sharper than previous bounds [20].
Methods
Notations and definitions
This paper closely follows the notation used in previous studies [6–11]. Here, we are interested in the relationship between two exposures and a binary outcome (e.g., disease/no disease). We assume a population is studied from time 0 to T. The two exposures (\( X_1 \) and \( X_2 \)) can have arbitrarily many levels (\( L_1 \ge 2 \) and \( L_2 \ge 2 \) levels, respectively). We assume that a person’s exposure profile does not change over time during the study period and is represented by profile = \( x_1, x_2 \), with \( x_1 \in \{1,\dots,L_1\} \) and \( x_2 \in \{1,\dots,L_2\} \). We assume that there is no loss to follow-up and no competing death during the study period. Let D = 1 represent disease occurrence in (0, T), and D = 0 otherwise. We assume D is known but the exact time of disease occurrence, if any, is unknown to researchers. (D is a binary outcome within a defined period, not a time-to-event outcome.) It is assumed that there is no confounding, selection bias or measurement error in the study, so that the associations between the two exposures and the disease reflect the genuine causal effects of the exposures on the disease.
While there are only \( L_1 \times L_2 \) exposure profiles, there are \( (L_1 + 1) \times (L_2 + 1) \) different causal-pie classes: one all-unknown class, \( L_1 + L_2 \) main-effect classes, and \( L_1 \times L_2 \) interaction classes. (Figure 1 in Lee’s paper [7] depicts the (2 + 1) × (2 + 1) = 9 causal-pie classes for two binary exposures.) The causal-pie classes can be represented by class = \( c_1, c_2 \), with \( c_1 \in \{\ast, 1, \dots, L_1\} \) and \( c_2 \in \{\ast, 1, \dots, L_2\} \). Here we introduce a null notation *: for k = 1, 2, a class contains “\( X_k = c_k \)” as one of its component causes if \( c_k \ne \ast \), and does not involve \( X_k \) at all if \( c_k = \ast \). For example, the all-unknown class involving neither \( X_1 \) nor \( X_2 \) is represented by class = *, *; the main-effect classes are represented by class = \( c_1 \), * with \( c_1 \ne \ast \) for \( X_1 \)-only classes, and class = *, \( c_2 \) with \( c_2 \ne \ast \) for \( X_2 \)-only classes; and the interaction classes are represented by class = \( c_1, c_2 \) with \( c_1 \ne \ast \) and \( c_2 \ne \ast \).
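As an illustration of the * notation, the following minimal Python sketch enumerates every class = \( c_1, c_2 \) label for exposures with \( L_1 \) and \( L_2 \) levels and tallies the three class types. The function names and the use of the string "*" are choices made for this sketch only; they are not taken from the paper's R code.

```python
from itertools import product

STAR = "*"  # null notation: the class does not involve that exposure


def causal_pie_classes(L1, L2):
    """All classes (c1, c2) with c1 in {*, 1..L1} and c2 in {*, 1..L2}."""
    levels1 = [STAR] + list(range(1, L1 + 1))
    levels2 = [STAR] + list(range(1, L2 + 1))
    return list(product(levels1, levels2))


def class_type(c):
    """Label a class as all-unknown, main-effect, or interaction."""
    c1, c2 = c
    if c1 == STAR and c2 == STAR:
        return "all-unknown"
    if c1 == STAR or c2 == STAR:
        return "main-effect"
    return "interaction"


def class_counts(L1, L2):
    """Number of classes of each type for exposures with L1 and L2 levels."""
    counts = {"all-unknown": 0, "main-effect": 0, "interaction": 0}
    for c in causal_pie_classes(L1, L2):
        counts[class_type(c)] += 1
    return counts


if __name__ == "__main__":
    print(len(causal_pie_classes(2, 2)), class_counts(2, 2))
```

For two binary exposures, `causal_pie_classes(2, 2)` returns nine classes, split 1/4/4 into all-unknown, main-effect and interaction classes, matching the (2 + 1) × (2 + 1) count above.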
The sufficient component cause model is partly deterministic and partly stochastic. The presence of risk factor(s) alone is not sufficient for the disease. Only when all unknown components (complement causes) also appear can the sufficient cause become complete and the disease occur. We let \( U_{c_1,c_2} = 1 \) represent the arrival of the unknown components of the class = \( c_1, c_2 \) causal-pie class in (0, T), and \( U_{c_1,c_2} = 0 \) otherwise, for \( c_1 \in \{\ast, 1, \dots, L_1\} \) and \( c_2 \in \{\ast, 1, \dots, L_2\} \).
Cumulative disease risk, cumulative completion risk, and relative prevalence
Let \( \mathrm{Risk}^{\mathrm{profile}=x_1,x_2} \) denote the cumulative disease risk in (0, T) for people in the population with profile = \( x_1, x_2 \), that is, \( \Pr(D = 1 \mid X_1 = x_1, X_2 = x_2) \). Let \( \mathrm{Risk}_{\mathrm{class}=i,j} \) denote the cumulative completion risk in (0, T) for a specific class = i, j sufficient-cause interaction, that is, \( \Pr(U_{ij} = 1) \) for the specific \( i \in \{1,\dots,L_1\} \) and \( j \in \{1,\dots,L_2\} \). Let \( \mathrm{Risk}_{\mathrm{class}=\mathrm{int}} \) denote the cumulative completion risk in (0, T) for the global sufficient-cause interaction (sufficient-cause interaction regardless of class), that is, \( \Pr\left[\bigcup_{i\in\{1,\dots,L_1\},\, j\in\{1,\dots,L_2\}} (U_{ij} = 1)\right] \). Let \( \mathrm{Risk}_{\mathrm{class}=\mathrm{any}} \) denote the cumulative completion risk over (0, T) for any class (all-unknown, main-effect, or interaction), that is, \( \Pr\left[\bigcup_{i\in\{\ast,1,\dots,L_1\},\, j\in\{\ast,1,\dots,L_2\}} (U_{ij} = 1)\right] \), or, equivalently, the proportion of non-immune persons in the study population during the study period. (An immune person is one who will not contract the disease during the study period, no matter which exposure profile he/she might, contrary to fact, assume.)
If the disease is rare, we would expect the above cumulative completion risks (or period prevalences, since these are defined for subjects in the study population over the study period) to be close to 0. To be informative about interactions for rare diseases, we follow Sjölander et al.’s suggestion [20] and define the relative prevalence (RP) for the specific sufficient-cause interactions: \( \mathrm{RP}_{\mathrm{class}=i,j} = \frac{\mathrm{Risk}_{\mathrm{class}=i,j}}{\mathrm{Risk}^{\mathrm{profile}=i,j}}, \) for the specific \( i \in \{1,\dots,L_1\} \) and \( j \in \{1,\dots,L_2\} \). In addition, we define a relative prevalence for the global sufficient-cause interaction: \( \mathrm{RP}_{\mathrm{class}=\mathrm{int}} = \frac{\mathrm{Risk}_{\mathrm{class}=\mathrm{int}}}{\mathrm{Risk}_{\mathrm{class}=\mathrm{any}}}. \) Note that the specific and global RPs have different denominators.
The no-redundancy assumption
The no-redundancy assumption is a Poisson-like assumption which dictates that, for each and every subject in the population, there can be at most one arrival event of the unknown components (at most one class of sufficient causes can be completed) in a sufficiently short time interval [19]. In other words, there are at most \( (L_1 + 1) \times (L_2 + 1) + 1 \) causal response types in a very short time interval: each of the \( (L_1 + 1) \times (L_2 + 1) \) types corresponds to exactly one causal-pie class, plus one additional type for the immune. The table in Lee’s paper [6] enumerates all (2 + 1) × (2 + 1) + 1 = 10 causal response types for two binary exposures under the no-redundancy assumption. By comparison, the conventional potential outcome model (without the no-redundancy assumption) would have a total of \( 2^{L_1 \times L_2} \) causal response types, i.e., \( 2^{2 \times 2} = 16 \) for two binary exposures.
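The two counts of causal response types can be checked with a few lines of Python. The helpers below are hypothetical illustrations that encode only the two formulas stated in the preceding paragraph.

```python
def response_types_no_redundancy(L1, L2):
    """One response type per causal-pie class plus one immune type."""
    return (L1 + 1) * (L2 + 1) + 1


def response_types_potential_outcomes(L1, L2):
    """Unrestricted potential-outcome model: each of the L1 * L2 exposure
    profiles can independently lead to D = 0 or D = 1."""
    return 2 ** (L1 * L2)


if __name__ == "__main__":
    for L1, L2 in [(2, 2), (3, 3)]:
        print(L1, L2,
              response_types_no_redundancy(L1, L2),
              response_types_potential_outcomes(L1, L2))
```

For three-level exposures the gap is already large: 17 response types under no redundancy versus \( 2^9 = 512 \) without it.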
The no-redundancy assumption is relatively weak: it can hold even if there is strong dependency among the arrival events. Note that no redundancy is specified only with respect to an infinitesimally short time interval. It says nothing about the entire follow-up period and can therefore hold even for non-rare diseases (diseases with high \( \mathrm{Risk}^{\mathrm{profile}=i,j} \) for \( i \in \{1,\dots,L_1\} \) and \( j \in \{1,\dots,L_2\} \)). Several sufficient-cause interaction tests have previously been developed under this assumption [6–11].
Bounds on sufficient-cause interactions under the no-redundancy assumption
In Additional file 1, I derive the bounds on sufficient-cause interactions under the no-redundancy assumption. For the specific sufficient-cause interactions, the bounds are (LB in superscript for lower bound; UB for upper bound):
and
respectively, for the specific \( i \in \{1,\dots,L_1\} \) and \( j \in \{1,\dots,L_2\} \). For the global sufficient-cause interaction, the bounds are:
and
respectively. [\( \mathrm{Risk}^{\mathrm{LB}}_{\mathrm{class}=\mathrm{int}} \) involves the use of ‘contrast coefficients’. The contrast coefficients for \( X_1 \), \( (u_1, \dots, u_{L_1}) \), contain an equal number of ‘+1’ and ‘−1’ elements if \( L_1 \) is even; if \( L_1 \) is odd, they contain exactly one ‘0’, with the remaining elements split equally between ‘+1’ and ‘−1’. The contrast coefficients for \( X_2 \), \( (v_1, \dots, v_{L_2}) \), are constructed similarly.]
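A minimal Python sketch of the construction rule for contrast coefficients, assuming that any ordering of the prescribed +1, −1 (and, for odd L, 0) entries is admissible. The sketch only lists the vectors; how they enter \( \mathrm{Risk}^{\mathrm{LB}}_{\mathrm{class}=\mathrm{int}} \) is given in Additional file 1 and is not reproduced here. The helper name `contrast_coefficients` is hypothetical.

```python
from itertools import permutations


def contrast_coefficients(L):
    """Distinct vectors of length L built by the rule in the text:
    equal numbers of +1 and -1 if L is even; one 0 plus equal numbers
    of +1 and -1 if L is odd."""
    if L < 2:
        raise ValueError("each exposure must have at least two levels")
    half = L // 2
    base = [1] * half + [-1] * half + ([0] if L % 2 == 1 else [])
    # permutations() repeats tuples when base has repeated values,
    # so deduplicate with a set before sorting
    return sorted(set(permutations(base)))


if __name__ == "__main__":
    for L in (2, 3, 4):
        vecs = contrast_coefficients(L)
        print(L, len(vecs), vecs[:3])
```

Brute-force permutation is adequate for the small numbers of exposure levels typical of epidemiologic data, but the number of permutations grows factorially with L.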
When both exposures are binary, the lower bound formulae simplify considerably. Formula (1) becomes
for i, j ∈ {1, 2}. Formula (5) becomes
where \( \mathrm{PRISM} = \frac{\left(1 - \mathrm{Risk}^{\mathrm{profile}=2,1}\right) \times \left(1 - \mathrm{Risk}^{\mathrm{profile}=1,2}\right)}{\left(1 - \mathrm{Risk}^{\mathrm{profile}=2,2}\right) \times \left(1 - \mathrm{Risk}^{\mathrm{profile}=1,1}\right)} \) is the ‘peril ratio index of synergy based on multiplicativity’ [7].
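For two binary exposures coded 1 (reference) and 2, the following Python sketch estimates the profile risks from hypothetical cohort counts (not the data analysed in this paper), computes PRISM, and checks an identity that follows directly from the definitions: with contrast coefficients u = v = (+1, −1), the product \( \prod_i \prod_j (1 - \mathrm{Risk}^{\mathrm{profile}=i,j})^{u_i v_j} \) equals 1/PRISM, and with u = (−1, +1), v = (+1, −1) it equals PRISM. Function names are illustrative only.

```python
def cumulative_risks(cases, totals):
    """Risk^{profile=x1,x2} estimated as cases / subjects in each profile."""
    return {p: cases[p] / totals[p] for p in totals}


def prism(r):
    """PRISM for two binary exposures coded 1 (reference) and 2."""
    num = (1 - r[(2, 1)]) * (1 - r[(1, 2)])
    den = (1 - r[(2, 2)]) * (1 - r[(1, 1)])
    return num / den


def contrast_product(r, u, v):
    """prod_i prod_j (1 - Risk^{profile=i,j}) ** (u_i * v_j)."""
    out = 1.0
    for i, ui in enumerate(u, start=1):
        for j, vj in enumerate(v, start=1):
            out *= (1 - r[(i, j)]) ** (ui * vj)
    return out


if __name__ == "__main__":
    # Hypothetical cohort counts, 200 subjects per profile
    cases = {(1, 1): 20, (2, 1): 40, (1, 2): 50, (2, 2): 100}
    totals = {p: 200 for p in cases}
    r = cumulative_risks(cases, totals)
    print("PRISM =", prism(r))
    print("1/PRISM via contrasts:", contrast_product(r, (1, -1), (1, -1)))
```

PRISM equals 1 exactly when \( (1 - \mathrm{Risk}^{\mathrm{profile}=2,2})(1 - \mathrm{Risk}^{\mathrm{profile}=1,1}) = (1 - \mathrm{Risk}^{\mathrm{profile}=2,1})(1 - \mathrm{Risk}^{\mathrm{profile}=1,2}) \), which is the null value used for the binary global test in the Discussion.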
Case-control study for rare diseases
For a rare disease with exceedingly low risks, we have \( 1 - \frac{1 - \mathrm{Risk}^{\mathrm{profile}=i,j}}{\left(1 - \mathrm{Risk}^{\mathrm{profile}=i',j}\right) \times \left(1 - \mathrm{Risk}^{\mathrm{profile}=i,j'}\right)} \approx \mathrm{Risk}^{\mathrm{profile}=i,j} - \mathrm{Risk}^{\mathrm{profile}=i',j} - \mathrm{Risk}^{\mathrm{profile}=i,j'} \) for \( (i' \ne i) \in \{1,\dots,L_1\} \) and \( (j' \ne j) \in \{1,\dots,L_2\} \); \( 1 - \prod_{i=1}^{L_1}\prod_{j=1}^{L_2}\left(1 - \mathrm{Risk}^{\mathrm{profile}=i,j}\right)^{u_i \times v_j} \approx \sum_{i=1}^{L_1}\sum_{j=1}^{L_2} u_i \times v_j \times \mathrm{Risk}^{\mathrm{profile}=i,j} \); and \( 1 - \prod_{i=1}^{L_1}\prod_{j=1}^{L_2}\left(1 - \mathrm{Risk}^{\mathrm{profile}=i,j}\right) \approx \sum_{i=1}^{L_1}\sum_{j=1}^{L_2}\mathrm{Risk}^{\mathrm{profile}=i,j} \). Therefore, the lower bounds on the relative prevalence of sufficient-cause interactions are approximately
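The three rare-disease approximations can be checked numerically. The Python sketch below evaluates each exact expression next to its approximation; the helper names are hypothetical and the risks are made-up values, not data from this paper.

```python
import math
from itertools import product


def specific_exact(r, i, j, ip, jp):
    """1 - (1 - R_ij) / ((1 - R_i'j) * (1 - R_ij'))."""
    return 1 - (1 - r[(i, j)]) / ((1 - r[(ip, j)]) * (1 - r[(i, jp)]))


def specific_approx(r, i, j, ip, jp):
    """Rare-disease approximation R_ij - R_i'j - R_ij'."""
    return r[(i, j)] - r[(ip, j)] - r[(i, jp)]


def contrast_exact(r, u, v):
    """1 - prod_i prod_j (1 - R_ij) ** (u_i * v_j)."""
    cells = product(range(1, len(u) + 1), range(1, len(v) + 1))
    return 1 - math.prod((1 - r[(i, j)]) ** (u[i - 1] * v[j - 1]) for i, j in cells)


def contrast_approx(r, u, v):
    """sum_i sum_j u_i * v_j * R_ij."""
    cells = product(range(1, len(u) + 1), range(1, len(v) + 1))
    return sum(u[i - 1] * v[j - 1] * r[(i, j)] for i, j in cells)


def total_exact(r):
    return 1 - math.prod(1 - x for x in r.values())


def total_approx(r):
    return sum(r.values())


if __name__ == "__main__":
    # Hypothetical rare-disease risks for two binary exposures
    r = {(1, 1): 0.001, (2, 1): 0.002, (1, 2): 0.0015, (2, 2): 0.006}
    print(specific_exact(r, 2, 2, 1, 1), specific_approx(r, 2, 2, 1, 1))
    print(contrast_exact(r, (1, -1), (1, -1)), contrast_approx(r, (1, -1), (1, -1)))
    print(total_exact(r), total_approx(r))
```

With non-rare risks the approximation can be poor: for risks of 0.1, 0.2, 0.25 and 0.5 in profiles (1, 1), (2, 1), (1, 2) and (2, 2), the exact specific expression for (2, 2) is 1/6 ≈ 0.167, whereas the approximation gives 0.05.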
for the specific \( i \in \{1,\dots,L_1\} \) and \( j \in \{1,\dots,L_2\} \), and
where \( \mathrm{OR}^{\mathrm{profile}=i,j} = \frac{\mathrm{Odds}^{\mathrm{profile}=i,j}}{\mathrm{Odds}^{\mathrm{profile}=1,1}} = \frac{\mathrm{Risk}^{\mathrm{profile}=i,j}}{1 - \mathrm{Risk}^{\mathrm{profile}=i,j}} \Big/ \frac{\mathrm{Risk}^{\mathrm{profile}=1,1}}{1 - \mathrm{Risk}^{\mathrm{profile}=1,1}} \) is the odds ratio comparing the profile = i, j subjects with the profile = 1, 1 subjects. These bounds are functions of odds ratios and can therefore be estimated directly from a case-control study conducted in the study population.
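The odds ratios in these bounds can be computed from case-control counts because, when controls are sampled without regard to exposure, the exposure odds ratio comparing cases with controls equals the disease odds ratio in the source population. The Python sketch below illustrates this with a hypothetical source population from which all cases and a 10% sample of non-cases are taken; the function names are illustrative only.

```python
def odds_ratios_from_counts(cases, controls, ref=(1, 1)):
    """Exposure odds ratios OR^{profile=p} relative to profile ref."""
    ref_odds = cases[ref] / controls[ref]
    return {p: (cases[p] / controls[p]) / ref_odds for p in cases}


def odds_ratios_from_risks(r, ref=(1, 1)):
    """OR^{profile=p} = [R_p / (1 - R_p)] / [R_ref / (1 - R_ref)]."""
    ref_odds = r[ref] / (1 - r[ref])
    return {p: (r[p] / (1 - r[p])) / ref_odds for p in r}


if __name__ == "__main__":
    # Hypothetical source population: 10,000 people per profile
    risks = {(1, 1): 0.01, (2, 1): 0.02, (1, 2): 0.03, (2, 2): 0.08}
    n = 10_000
    cases = {p: round(n * r) for p, r in risks.items()}
    # all cases, plus a 10% sample of non-cases as controls
    controls = {p: round(0.1 * (n - cases[p])) for p in risks}
    print(odds_ratios_from_counts(cases, controls))
    print(odds_ratios_from_risks(risks))
```

Both printed dictionaries agree, because the constant control sampling fraction cancels in every ratio.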
When both exposures are binary, the bounds reduce to
for the specific i, j ∈ {1, 2}, and
where \( \mathrm{RERI} = \mathrm{OR}^{\mathrm{profile}=2,2} - \mathrm{OR}^{\mathrm{profile}=2,1} - \mathrm{OR}^{\mathrm{profile}=1,2} + 1 \) is the ‘relative excess risk due to interaction’ expressed in terms of odds ratios [1–5].
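The link between RERI and the rare-disease approximation of the contrast sum can be seen by taking u = v = (+1, −1) and dividing by \( \mathrm{Risk}^{\mathrm{profile}=1,1} \): \( (\mathrm{Risk}^{\mathrm{profile}=2,2} - \mathrm{Risk}^{\mathrm{profile}=2,1} - \mathrm{Risk}^{\mathrm{profile}=1,2} + \mathrm{Risk}^{\mathrm{profile}=1,1}) / \mathrm{Risk}^{\mathrm{profile}=1,1} \) is RERI on the risk-ratio scale, and for a rare disease odds ratios approximate risk ratios. A short Python check with hypothetical risks (helper names are illustrative):

```python
def odds_ratios_from_risks(r, ref=(1, 1)):
    """OR^{profile=p} relative to the reference profile."""
    ref_odds = r[ref] / (1 - r[ref])
    return {p: (r[p] / (1 - r[p])) / ref_odds for p in r}


def reri(or_):
    """RERI = OR22 - OR21 - OR12 + 1 for binary exposures coded 1 and 2."""
    return or_[(2, 2)] - or_[(2, 1)] - or_[(1, 2)] + 1


def binary_contrast_over_ref(r):
    """(R22 - R21 - R12 + R11) / R11: the u = v = (+1, -1) contrast sum
    scaled by the reference-profile risk."""
    return (r[(2, 2)] - r[(2, 1)] - r[(1, 2)] + r[(1, 1)]) / r[(1, 1)]


if __name__ == "__main__":
    # Hypothetical rare-disease risks (risk ratios 2, 3 and 8)
    r = {(1, 1): 0.001, (2, 1): 0.002, (1, 2): 0.003, (2, 2): 0.008}
    print("RERI (odds ratios):", reri(odds_ratios_from_risks(r)))
    print("scaled contrast   :", binary_contrast_over_ref(r))
```

Here the scaled contrast is 4 (the risk-ratio RERI, 8 − 2 − 3 + 1), while the odds-ratio RERI is slightly larger (about 4.05) because the odds ratios slightly exceed the risk ratios.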
Additional file 2 presents two functions written in R: ‘bounds.cohort’ for cohort data and ‘bounds.cscn’ for case-control data. Given the data as an argument, each function outputs the various bounds on sufficient-cause interactions. The functions also automatically perform 10,000 bootstrap replications to calculate a 95% lower confidence limit for each lower bound and a 95% upper confidence limit for each upper bound.
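The bootstrap step can be sketched in Python as a one-sided percentile bootstrap. This is not a translation of ‘bounds.cohort’: the resampling scheme (subjects resampled with replacement within each exposure profile, so profile sizes stay fixed) is an assumption, and because the proposed bound formulas are not reproduced here, the demonstration plugs in the Sjölander et al. lower bound quoted in the Discussion. Any other bound function can be passed as `stat`; all names and counts are hypothetical.

```python
import random


def sjolander_lb_22(r):
    """Sjölander et al.'s lower bound for the (2, 2) class, binary exposures."""
    return max(r[(2, 2)] - r[(2, 1)] - r[(1, 2)], 0.0)


def bootstrap_lower_limit(cases, totals, stat, reps=10_000, alpha=0.05, seed=2017):
    """One-sided percentile bootstrap lower confidence limit for stat.

    Assumed scheme: subjects are resampled with replacement within each
    exposure profile, keeping profile sizes fixed.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        r_star = {}
        for p, n in totals.items():
            p_hat = cases[p] / n
            # resampling n binary outcomes with replacement = n Bernoulli(p_hat) draws
            d_star = sum(rng.random() < p_hat for _ in range(n))
            r_star[p] = d_star / n
        draws.append(stat(r_star))
    draws.sort()
    # approximately the alpha-quantile of the bootstrap distribution
    return draws[int(alpha * reps)]


if __name__ == "__main__":
    # Hypothetical cohort counts, 50 subjects per profile
    cases = {(1, 1): 5, (2, 1): 10, (1, 2): 10, (2, 2): 40}
    totals = {p: 50 for p in cases}
    point = sjolander_lb_22({p: cases[p] / totals[p] for p in cases})
    lcl = bootstrap_lower_limit(cases, totals, sjolander_lb_22, reps=2_000)
    print(f"lower bound = {point:.3f}, 95% lower confidence limit = {lcl:.3f}")
```

An upper confidence limit for an upper bound would use the (1 − alpha) quantile of the sorted draws instead.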
Results
Example 1. A cohort study of hypertension risk
The data from a cohort study of hypertension risk (taken directly from Example 3 in Zou’s paper [21]) are analyzed here as an example. The cohort study assesses the effects of body mass index (BMI, coded as 1 if BMI ≥ 25 kg/m² and 0 otherwise) and age (coded as 1 if age ≥ 40 years and 0 otherwise) on hypertension (coded as 1 if diastolic blood pressure ≥ 90 mmHg and 0 otherwise). We assume that there is no confounding, selection bias or measurement error in the study and that follow-up is 100% complete.
Table 1 presents the bounds and their 95% bootstrapped confidence limits for sufficient-cause interactions between BMI and age. The lower bounds for the (high BMI, old age)-specific sufficient-cause interaction are greater than zero (0.0411 for the cumulative completion risk; 0.1509 for the relative prevalence) but do not achieve statistical significance (as judged from their 95% lower confidence limits, which are both zero). For the global sufficient-cause interaction, the lower bounds are 0.0830 (cumulative completion risk) and 0.1758 (relative prevalence), and both are significantly greater than zero. The upper bound for the cumulative completion risk of the global sufficient-cause interaction is 0.4718, with a 95% upper confidence limit of 0.4993.
Example 2. A case-control study on lung cancer risk
Zhang et al.’s case-control data (taken directly from Table 4 in reference [22]) are analyzed here as the second example. The study examines the gene-gene interaction between two DNA base excision repair genes on lung cancer risk: the ADPRT (adenosine diphosphate ribosyltransferase) Val762Ala polymorphism and the XRCC1 (X-ray repair cross-complementing group 1) Arg366Gln polymorphism (both having three genotypes). The rare-disease assumption is invoked here; for lung cancer, this assumption is tenable. In addition, we assume gene-environment independence [10], so that unmeasured environmental factors, whatever they may be, cannot confound the genetic effects of the two studied genes.
Table 2 presents the lower bounds and their 95% lower confidence limits for sufficient-cause interactions between these two genes. The lower bound of the relative prevalence for the (ADPRT = Ala/Ala, XRCC1 = Gln/Gln)-specific sufficient-cause interaction is greater than zero (0.5221) but does not achieve statistical significance. The lower bound of the relative prevalence for the global ADPRT-XRCC1 interaction is 0.2471 and is significantly greater than zero (as judged from its 95% lower confidence limit of 0.0784).
Discussion
Public health researchers have long sought a way to quantify sufficient-cause interactions using only the observational data at hand. Because of the non-identifiability problem, a sufficient-cause interaction can be tested but, unfortunately, not estimated. A test provides only a very limited piece of information (whether or not a sufficient-cause interaction is statistically significant), which falls far short of quantification. By setting bounds on sufficient-cause interactions (as demonstrated in the two examples in this paper), we can finally quantify such interactions, if not exactly.
Additional file 3 shows that the bounding formulae presented in this paper produce ‘sharp’ bounds, i.e., bounds that are attainable. Previously, Sjölander et al. [20] derived an assumption-free lower bound for the cumulative completion risk of the specific class = i, j sufficient-cause interaction (which they called ‘weak’ sufficient-cause interaction). In the notation of this paper, their bound is \( \max_{(i' \ne i) \in \{1,\dots,L_1\},\ (j' \ne j) \in \{1,\dots,L_2\}} \left\{ \mathrm{Risk}^{\mathrm{profile}=i,j} - \mathrm{Risk}^{\mathrm{profile}=i',j} - \mathrm{Risk}^{\mathrm{profile}=i,j'},\ 0 \right\}. \) Additional file 4 shows that a sharper lower bound can be achieved.
In this paper, the lower bound formulae also provide an avenue for testing specific sufficient-cause interactions: if the bootstrapped 95% lower confidence limit for a particular lower bound is greater than zero, then the corresponding sufficient-cause interaction can be declared present. Alternatively, one can rely on the lower bound for the global sufficient-cause interaction: if its bootstrapped 95% lower confidence limit is greater than zero, then some sufficient-cause interaction (between certain levels of the two factors) must be present. When both exposures are binary, this global test reduces to testing PRISM = 1 against PRISM ≠ 1 in cohort studies [7], and RERI = 0 against RERI ≠ 0 in case-control studies.
The assumption of no confounding is a strong one. To alleviate the problem, the data can be stratified by the confounders (if these are identified and measured in the study), and separate bounds on sufficient-cause interactions can be set for each of the resulting strata using the formulae proposed in this paper. Further work is warranted to develop stratified bounding methods for sufficient-cause interactions when the total number of strata is large and the average stratum size is small (the sparse-data scenario), and when some of the stratifying variables also interact with the two exposures of concern (sufficient-cause interactions involving more than two variables).
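For comparison, the Sjölander et al. bound can be computed for exposures with any number of levels by searching over all \( i' \ne i \) and \( j' \ne j \). The Python sketch below implements only that bound (the helper name is hypothetical), and the three-level risks in the demonstration are made-up values.

```python
from itertools import product


def sjolander_lower_bound(r, i, j, L1, L2):
    """max over i' != i, j' != j of {R_ij - R_i'j - R_ij', 0}.

    r maps (x1, x2), x1 in 1..L1 and x2 in 1..L2, to Risk^{profile=x1,x2}.
    """
    best = 0.0  # the 0 inside max{., 0}
    for ip, jp in product(range(1, L1 + 1), range(1, L2 + 1)):
        if ip == i or jp == j:
            continue
        best = max(best, r[(i, j)] - r[(ip, j)] - r[(i, jp)])
    return best


if __name__ == "__main__":
    # Hypothetical risks for two three-level exposures
    r = {(a, b): 0.1 + 0.05 * (a - 1) + 0.05 * (b - 1) + 0.1 * (a - 1) * (b - 1)
         for a in range(1, 4) for b in range(1, 4)}
    for i, j in [(3, 3), (2, 2), (1, 1)]:
        print((i, j), round(sjolander_lower_bound(r, i, j, 3, 3), 4))
```

For the (3, 3) class the maximum is attained with i′ = 1 and j′ = 1, giving 0.7 − 0.2 − 0.2 = 0.3.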
Conclusions
This study provides bounding formulae for sufficient-cause interactions under the assumption of no redundancy. The bounds are sharp, and they are sharper than previous bounds. Sufficient-cause interactions cannot be estimated, but they can be quantified using the bounds presented in this study.
Abbreviations
ADPRT: Adenosine diphosphate ribosyltransferase
BMI: Body mass index
LB: Lower bound
PRISM: Peril ratio index of synergy based on multiplicativity
RERI: Relative excess risk due to interaction
RP: Relative prevalence
UB: Upper bound
XRCC1: X-ray repair cross-complementing group 1
References
 1.
VanderWeele TJ, Robins JM. The identification of synergism in the sufficient-component-cause framework. Epidemiology. 2007;18:329–39.
 2.
VanderWeele TJ, Robins JM. Empirical and counterfactual conditions for sufficient cause interactions. Biometrika. 2008;95:49–61.
 3.
VanderWeele TJ. Sufficient cause interactions and statistical interactions. Epidemiology. 2009;20:6–13.
 4.
VanderWeele TJ. Sufficient cause interactions for categorical and ordinal exposures with three levels. Biometrika. 2010;97(3):647–59.
 5.
VanderWeele TJ, Knol MJ. Remarks on antagonism. Am J Epidemiol. 2011;173:1140–7.
 6.
Lee WC. Testing synergisms in a no-redundancy sufficient-cause rate model. Epidemiology. 2013;24(1):174–5.
 7.
Lee WC. Assessing causal mechanistic interactions: a peril ratio index of synergy based on multiplicativity. PLoS ONE. 2013;8(6):e67424.
 8.
Lee WC. Estimation of a common effect parameter from follow-up data when there is no mechanistic interaction. PLoS ONE. 2014;9:e86374.
 9.
Lin JH, Lee WC. Testing for mechanistic interactions in long-term follow-up studies. PLoS ONE. 2015;10:e0121638.
 10.
Lee WC. Testing for sufficient-cause gene-environment interactions under independence and Hardy-Weinberg equilibrium assumptions. Am J Epidemiol. 2015;182(1):9–16.
 11.
Lee WC. Excess relative risk as an effect measure in case-control studies of rare diseases. PLoS ONE. 2015;10(4):e0121141.
 12.
Rothman KJ. Causes. Am J Epidemiol. 1976;104:587–92.
 13.
Rothman KJ, Greenland S, Lash TL, editors. Modern Epidemiology. 3rd ed. Philadelphia: Lippincott; 2008.
 14.
Greenland S, Brumback B. An overview of relations among causal modelling methods. Int J Epidemiol. 2002;31(5):1030–7.
 15.
Liao SF, Lee WC. Weighing the causal pies in case-control studies. Ann Epidemiol. 2010;20(7):568–73.
 16.
Suzuki E, Yamamoto E, Tsuda T. On the link between sufficient-cause model and potential-outcome model. Epidemiology. 2011;22(1):131–2.
 17.
Suzuki E, Yamamoto E, Tsuda T. On the relations between excess fraction, attributable fraction, and etiologic fraction. Am J Epidemiol. 2012;175(6):567–75.
 18.
Lee WC. Completion potentials of sufficient component causes. Epidemiology. 2012;23(3):446–53.
 19.
Gatto NM, Campbell UB. Redundant causation from a sufficient cause perspective. Epidemiol Perspect Innov. 2010;7:5.
 20.
Sjölander A, Lee W, Källberg H, Pawitan Y. Bounds on sufficient-cause interaction. Eur J Epidemiol. 2014;29:813–20.
 21.
Zou GY. On the estimation of additive interaction by use of the four-by-two table and beyond. Am J Epidemiol. 2008;168:212–24.
 22.
Zhang X, Miao X, Liang G, Hao B, Wang Y, Tan W, Li Y, Guo Y, He F, Wei Q, Lin D. Polymorphisms in DNA base excision repair genes ADPRT and XRCC1 and risk of lung cancer. Cancer Res. 2005;65:722–6.
Acknowledgements
Not applicable.
Funding
This work was partly supported by grants from the Ministry of Science and Technology, Taiwan (MOST 105-2314-B-002-049-MY3). No additional external funding was received for this study. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Availability of data and materials
The dataset supporting the conclusions of this article is included within the article and the Additional files.
Authors’ contributions
This is a single-author paper by WCL.
Competing interests
The author declares that he has no competing interests.
Consent for publication
Not applicable.
Ethics approval and consent to participate
Not applicable.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional files
Additional file 1:
Derivations of the bounding formulas. (PDF 272 kb)
Additional file 2:
R code. (PDF 151 kb)
Additional file 3:
A proof that the bounds are sharp. (PDF 179 kb)
Additional file 4:
A proof that the bounds are sharper than previous bounds. (PDF 286 kb)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Keywords
 Sufficient component cause model
 Epidemiologic methods
 Causal inference
 Interaction
 Identifiability