 Research Article
 Open Access
 Published:
Incorporating sampling weights into robust estimation of Cox proportional hazards regression model, with illustration in the MultiEthnic Study of Atherosclerosis
BMC Medical Research Methodology volume 20, Article number: 62 (2020)
Abstract
Background
Cox proportional hazards regression models are used to evaluate associations between exposures of interest and timetoevent outcomes in observational data. When exposures are measured on only a sample of participants, as they are in a casecohort design, the sampling weights must be incorporated into the regression model to obtain unbiased estimating equations.
Methods
Robust Cox methods have been developed to better estimate associations when there are influential outliers in the exposure of interest, but these robust methods do not incorporate sampling weights. In this paper, we extend these robust methods, which already incorporate influence weights, so that they also accommodate sampling weights.
Results
Simulations illustrate that in the presence of influential outliers, the association estimate from the weighted robust method is closer to the true value than the estimate from traditional weighted Cox regression. As expected, in the absence of outliers, the use of robust methods yields a small loss of efficiency. Using data from a casecohort study that is nested within the MultiEthnic Study of Atherosclerosis (MESA) longitudinal cohort study, we illustrate differences between traditional and robust weighted Cox association estimates for the relationships between immune cell traits and risk of stroke.
Conclusions
Robust weighted Cox regression methods are a new tool to analyze timetoevent data with sampling, e.g. casecohort data, when exposures of interest contain outliers.
Background
Cox proportional hazards regression models [1] are widely used for analysis of timetoevent data. Modifications of traditional Cox models have been developed to accommodate several important scenarios, including data sampled from a bigger population of interest and data containing influential outliers. In the context of data sampling, estimates can be weighted by the inverse sampling probability [2]. To reduce the impact of violation of model assumptions, several robust methods have been proposed [3–6]. One robust method focuses on robustness to variation in proportional hazards over time [5, 7], and incorporates sampling weights. However, a related robust method that focuses on robustness to influential outliers [3, 8, 9] does not incorporate sampling weights.
One context in which sampling weights play an important role is the casecohort design, a strategy used to maximize power for a primary outcome of interest and, at the same time, facilitate the analysis of multiple secondary outcomes. When measurement of an exposure of interest in all members of a cohort is not feasible, measuring it in a random ‘cohort’ of participants, plus all additional ‘cases’ who experience the primary outcome can be an efficient study design [10]. Several methods exist for analyzing casecohort data, but one relatively simple one involves use of inverse sampling weights [11]. Accounting for the sampling scheme is crucial in obtaining unbiased estimates that reflect populationlevel associations between exposure and outcome.
One example of an ongoing casecohort study in which outliers play an important role is a study of immune cell traits analyzed in a subcohort of the MultiEthnic Study of Atherosclerosis (MESA) longitudinal cohort study [12] that includes all cases of angina and myocardial infarction (MI). A number of lymphocyte and monocyte subsets were measured in this subcohort, using methods similar to those used by Tracy et al. [13] and Olson et al. [14], with the goal of evaluating associations not only with the primary outcomes of interest (angina and MI), but also with a range of secondary outcomes. As shown by Tracy et al. [13], the immune cell subsets often have skewed distributions. Although Cox models do not require covariates to be normally distributed, the chance that outliers are influential increases when covariate distributions are skewed. If all exposure values have a consistent association with the outcome of interest, then the outlying values do not bias the association of interest. However, if some of the exposure values are outliers due to a separate biological process, then within these outliers there can be an induced association with the outcome of interest that is not causal, and thus biases estimation of the true association of interest. We assume that such a structure exists in the population, rather than being induced by the sampling process. Currently, no method is available to both incorporate the sampling weights and provide robustness in the presence of extreme outliers.
In this paper we extend Bednarski’s partial likelihood method that provides robustness to influential outliers so that it can also incorporate sampling weights. In the “Methods” section we describe our modification to this robust Cox regression method. In the “Simulations” section we illustrate via simulations that this robust method has less bias than traditional weighted Cox regression when a subset of the participants have exposure values that are different from the rest, for reasons that are unrelated to the event of interest, i.e. when there are influential outliers with a different underlying association with outcome. In the “Application” section we evaluate the association between the immune cell traits and stroke in the MESA casecohort sample to illustrate practical differences between traditional and robust weighted Cox regressions. In the “Discussion” section we compare our weighted robust Cox regression method to alternative estimators.
Methods
Methods for fitting Cox models that incorporate weighting by inverse sampling probabilities are wellestablished [2]. However the use of such weighting in combination with robust modeling methods is not consistently implemented. For example, sampling weights are implemented in one robust method that focuses on robustness to variation in proportional hazards (PH) over time [5, 7], but not in another that is more robust to influential outliers [3, 8, 9]. These two methods incorporate a similar approach to robustness, i.e. constructing an estimator with minimum variance subject to a bound on the bias in local neighborhoods of the Cox model, but each considers a slightly different family of estimators. Both lead to partial likelihood estimators that achieve robustness via weighting, so that when sampling weights also exist, the implementations must incorporate both types of weights.
Here we will focus on extending Bednarski’s method, implemented in the R package coxrobust, to generate a new R package coxrobustw that incorporates sampling weights. In simulations and data analysis, we include comparisons to Sasieni and Schemper’s methods, implemented in the R package coxphw. We will refer to Bednarski’s methods as ‘outlierrobust’ and to Sasieni and Schemper’s methods as ‘PHrobust’ due to their focus on robustness to different types of influence on Cox model inferences.
We assume that we have observed data on n people, indexed by i, each of which has the following values: observed event time t_{i}=min(T_{i},C_{i}) where \(T_{i} \in \mathbb {R}^{+}\) is a potential event time and \(C_{i} \in \mathbb {R}^{+}\) is a potential censoring time so that t_{i}∈[0,T_{i}),z_{i}=observed covariate vector, with \(z_{i} \in \mathbb {R}^{j}\) where j is the number of covariates, and Δ_{i}=\(\mathbbm {1}_{\left [{T_{i}<C_{i}}\right ]}\), i.e. 0 for censored observations and 1 for observed events. Core quantities in implementation of partial likelihood estimation for Cox models include S^{(0)}(β,t),S^{(1)}(β,t),S^{(2)}(β,t), and \(\bar {z}(\beta,t)\) [15]. Table 1 specifies these quantities for traditional Cox models, Cox models with influence weights as in coxrobust, and Cox models with both coxrobust’s influence weights and sampling weights.
Cox partial likelihood estimation uses the score estimating equation \(\sum \limits _{i=1}^{n} \left [ z_{i}  \bar {z}(\beta,t_{i}) \right ] = 0\) [1, 16]. The outlierrobust estimator in the coxrobust package uses the modified estimating equation \(\sum \limits _{i=1}^{n} A(t_{i}, z_{i}) \left [ z_{i}  \bar {z}_{r}(\beta,t_{i}) \right ] = 0\), where A(t,z) is a smooth nonnegative map which is zero for large values of t and/or β^{′}z. Note that \(\bar {z}\) now has the subscript r to indicate that it is a robust version of the weighted mean of the covariate vector, incorporating the function A, as defined in Table 1. A takes as an input the covariate vector z which could contain one or more covariates with influential outliers. As noted by Minder and Bednarski [8], the "doubletrimming" accomplished by using A in both parts of the equation leads to the Fisherconsistency of the estimator, so it targets the Cox model parameters. This specific map A is desirable because \(\hat {\beta }\) is then Fréchet differentiable yielding a consistent and asymptotically normal estimator of β for infinitesimal extensions of the Cox model [3] [lemmas 4.2 & 4.3], the definition of outlierrobustness used by Bednarski.
The outlierrobust estimator described in the previous paragraph relies on mathematical details in Bednarski’s 1993 paper [3]. Highlights from that paper are included here for clarity. Bednarski’s outlierrobust estimator relies on writing the Cox score estimating equation in terms of the empirical distribution function F_{n}(t,c,z) of the sample (T_{1},C_{1},Z_{1}),...,(T_{n},C_{n},Z_{n}):
This equation is modified with a class \(\mathcal {A}\) of smooth functions from \(\mathbb {R}^{+} \times \mathbb {R}^{j} \rightarrow \mathbb {R}^{+}\) to give a modified class of regression parameter estimators, defined by the vector equation L(F_{n},β,A)=
for A in \(\mathcal {A}\). Bednarski uses functions A yielding Fréchet differentiable functionals L(F_{n},β,A) that give Fisherconsistent estimators of Cox model parameters (Lemma 3.1 in [3]).
Bednarski [3] goes on to specify the conditions that are necessary for \(\sqrt {n}\)consistency, Fréchet differentiability, and asymptotic normality (Theorems 4.14.3) in the case without censoring, which can be extended to incorporate censoring. For B a closed set in \(\mathbb {R}^{j}\) containing an open neighborhood of the true parameter β_{0}, F the true distribution of t and z from the Cox model distribution, \(\mathcal {A}^{*}\) a class of functions from \(\mathbb {R}^{+} \times \mathbb {R}^{j} \rightarrow \mathbb {R}^{+}\), A_{0} a nonnegative continuous function of time with bounded support S_{b}=[a,b], and \(\mathcal {A}\) a class of functions \(\{A_{0} A; A \in \mathcal {A}^{*}\}\), the following conditions need to hold:
(A1) For all \(A \in \mathcal {A}^{*}\) and \(w \in S_{b}, \int A(w,z) \mathbbm {1}_{\left [ {t \geq w} \right ] } dF(t,z) > \epsilon \) for some ε>0.
(A2) All the functions from \(\mathcal {A}^{*}\) vanish outside a bounded set, they are absolutely continuous and have jointly bounded variation. The set \(\mathcal {A}^{*}\) is compact for the supremum norm on \(C(\mathbb {R}^{j}\times \mathbb {R}^{+})\), i.e. the space of continuous functions from \(\mathbb {R}^{j} \times \mathbb {R}^{+}\) to \(\mathbb {R}^{+}\).
(A3) The following functions of variables \((w,y) \in \mathbb {R}^{+} \times \mathbb {R}^{j}\):
$$\begin{array}{*{20}l} {A(w,y) \frac{\int A(w,z) z \mathbbm{1}_{\left[ {t \geq w} \right] } \exp(\beta_{0}'z) dF(t,z)}{\int A(w,z) \mathbbm{1}_{\left[ {t \geq w} \right] } \exp(\beta_{0}'z) dF(t,z)}} \end{array} $$have jointly bounded variation for \(A \in \mathcal {A}\) and β∈B.
In the case when censoring is present, indicators \(\mathbbm {1}_{\left [ {t \geq w} \right ] }\) become \(\mathbbm {1}_{\left [ {(a \wedge t) \geq w} \right ] }\) and the inner integration is with respect to F(t,a,z). The function A(w,y) is multiplied in the outer integral by \(\mathbbm {1}_{\left [w \leq c\right ]}\) and the integral is with respect to F(w,c,y).
Specifically, in the coxrobust implementation, the map A is A_{β,M}(t,z)=M−min(M,t exp(β^{′}z)), where M is an order statistic in the sample t_{1} exp(β^{′}z_{1}),...,t_{n} exp(β^{′}z_{n}). The class of functions \(\mathcal A^{*}\) is the set {A_{β,M}:β∈B,M≤M^{∗}}, where M^{∗} is some fixed upper bound for M. In the default implementation, M is the 95^{th} percentile, but the percentile is a modifiable input to the R function. A and β are estimated iteratively, with three iterations leading to convergence in the scenarios they examined, and thus three iterations implemented in the coxrobust package [3]. To incorporate sampling weights, we added a step after the estimation of A and β that incorporates the sampling weights w [11] into the estimating equation \(\sum \limits _{i=1}^{n} w_{i} A(t_{i}, z_{i}) \left [ z_{i}  \bar {z}_{wr}(\beta,t_{i}) \right ] = 0\), with the details of the modified weighted mean covariate vector \(\bar {z}_{wr}\) given in Table 1 [15]. The reason for adding sampling weights after iteration is so that the influence weights reflect influence due to outliers, rather than due to large sampling weights.
In addition to deriving a consistent estimator of β, Bednarski [3] also derives an influence function that can be used to approximate the estimator’s variance, both at the model and at small departures from it. The existence of this influence function relies on sufficient smoothness of A, as discussed in section 5 of his 1993 paper. The resulting variance estimate for the specific choice of A implemented in coxrobust is:
where \(I_{r}(\hat {\beta })\) is the observed information matrix that incorporates outlier downweighting:
and \(r_{i}(\hat {\beta })\) is a residual for the ith subject:
This specific formulation of the variance estimate does not apply to all possible specifications of A, but does apply to the one chosen by Bednarski for the coxrobust package and implemented in this paper.
The variance estimate for the new coxrobustw algorithm incorporates the sampling weights w into the jackknife variance estimate in Eq. (1), i.e.
where \(I_{wr}(\hat {\beta })\) is the observed information matrix that incorporates both sampling weights and outlier downweighting:
and \(r_{i}^{wr}(\hat {\beta })\) is a residual for the ith subject:
The new R package coxrobustw implements this robust estimator that incorporates sampling weights, using the modified score equation and variance estimate detailed above. As in the original package, the algorithm uses three iterations and a default M of the 95^{th} percentile. The package is publicly available at https://github.com/csitlani/coxrobustw.
Results
Simulations
We conducted simulations to illustrate the utility of this new weighted robust Cox regression procedure. For both a complete population, and a casecohort sample from that population, we compared traditional Cox regression model estimates to robust Cox regression model estimates using our new package coxrobustw (‘outlierrobust’) and the existing coxphw that is robust to departures from proportional hazards (‘PHrobust’). For the casecohort sample, the weighted versions of all three methods were used, with weights being the inverse of the sampling probability. Two versions of the outlierrobust method were included, with the truncation parameter M set to either the 90^{th} or the 95^{th} percentile.
We generated time to event data by specifying the hazard ratio associated with a oneunit difference in exposure x (HRx), which was incorporated into the scale parameter of a Weibull distribution, i.e. scale = 1000× exp(− ln(HRx)×x) and shape = 1. The censoring time also had a Weibull distribution, with scale = 2 and shape = 1, and the observed time was set to be the minimum of the censoring time and the time to event. We generated exposure data from a normal distribution, with mean and variance described below, but truncated the values at 0 and 100 to mirror the type of exposure data available in the MESA immune cell trait project. Immune cell traits were analyzed as a percentage of their parent population, e.g. Th1 cells were analyzed as a percentage of CD4+ cells. Contamination was subsequently added by changing the mean of the exposure distribution for a fixed portion of the observations. For example, we simulated an exposure with mean 12 and standard deviation (SD) 8, from which the survival data were generated based on an assumed HRx of 1.25. Then for a portion of the observations, we replaced the exposure data with data from a normal distribution with higher mean, but still SD 8. The scenarios in Table 2 include no contamination, 5% contamination with mean 24, 5% contamination with mean 36, and 10% contamination with mean 24. We evaluated the methods both on the full sample of n=6000 people, and on the casecohort sample that was generated by keeping all people who experienced an event, plus a random sample of size 600 from those who did not experience an event. The average size of the casecohort sample was 1080, and the average percent of contaminated observations in the sample matched the specified level of contamination, regardless of whether or not a weighted percentage was calculated. All simulations were conducted in R version 3.2.3 [17], and were repeated one thousand times for each setup.
The results in Table 2 show that both the traditional Cox model and the robust versions provide essentially unbiased estimates when no contamination is present, i.e. the mean coefficient estimate is approximately log(1.25)=0.223. As expected, the traditional model is more efficient than any of the robust methods when modeling assumptions are satisfied. However, when contamination is present, all methods are biased toward the null. The traditional Cox model and the PHrobust method are substantially more biased than the outlierrobust method because they do not incorporate methods to detect and minimize the influence of outliers. The outlierrobust version, on the other hand, uses the map A specified in the “Methods” section and implemented in the coxrobust package, along with its truncation parameter M, to minimize the influence of the contaminated observations.
The true parameter value is not recovered by the outlierrobust method in part because the accuracy of outlier detection is not consistently high. Outlier detection metrics are not straightforward, due to the use of A both at the observation level and in contributions to the weighted covariate mean \(\bar {z}_{wr}\) at each event time. However, averaging over event times and simulations, for the three contamination scenarios considered in this paper, the percentage of contaminated observations correctly identified as such varies from 30 to 95%. Likewise, the percentage of correctly discarded observations varies from 24 to 83%. Outlier detection was similar in the population and in the sample. Comparing choice of truncation parameter, higher sensitivity for detecting contaminated observations corresponds to lower percentage of correctly discarded observations. On balance, using the 90^{th} percentile as the truncation parameter in the outlierrobust method yields similar, but slightly less biased, estimates of association, when compared to using the 95^{th} percentile. The (unweighted) population results are qualitatively similar to the weighted results for the casecohort sample. The weighted results use the new coxrobustw package.
Application
To illustrate the use of robust weighted Cox methods in data obtained from human subjects, we analyzed a secondary outcome in the MESA casecohort study of immune cell traits. Specifically, we examined occurrence of stroke as the outcome event and 17 immune cell traits postulated to be associated with cardiovascular disease as the exposures of interest. The immune cell traits were quantified as percent of total immune cells, or percent of a subset of immune cells. Table 3 describes the specific measures that were used. Based on a review of the literature, often in animal models, the lymphocyte subsets cluster into four groups: 1) high levels of proinflammatory cells; 2) high levels of profibrotic cells; 3) high levels of antiinflammatory and antifibrotic cells; and 4) high levels of proinflammatory cells that mark chronic use of adaptive immunity. All clusters are thought to increase cardiovascular risk except for the third, which is thought to decrease it [18–21]. The primary goal of this analysis was to illustrate the use of the new statistical method, rather than to draw key conclusions about the associations between the immune cell traits and stroke events.
The entire MESA cohort is a racially diverse cohort of 6814 adults between the ages of 45 and 84 years enrolled between 2000 and 2002 from six field centers across the United States. The MESA protocol has been approved by the Institutional Review Boards of all collaborating institutions, and all participants gave informed consent. Cryopreserved blood samples from the baseline visit were assayed at the University of Vermont to measure lymphocyte and monocyte subsets, using methods similar to those used by Tracy et al. [13] and Olson et al. [14]. From participants who had two vials of cryopreserved cells, a random cohort of 765 participants was sampled, along with all additional cases of MI and angina, for a total sample size of 1200 participants. Participants were followed for stroke outcomes through 2015. In order to ensure that estimates can be generalized to the MESA population, the sampling design necessitates use of sampling weights in statistical models, even for secondary outcomes such as stroke. The weighted mean age of participants included in this analysis was 62 years, and 54% were male. The sample was 39% White, 28% Black, 21% Hispanic, and 12% Chinese American. Stroke events occurred in 6% of the sample (N=70), which corresponds to a weighted rate of 4.6% in the population.
A number of the immune cell traits have outliers, implying the potential usefulness of robust methods that minimize their influence on association estimates. Summary data provided in Table 4 illustrate that the maximum value is often several SDs or more above the mean.
Analyses were performed using several methods: traditional Cox models, traditional Cox models after Winsorizing the exposure at 4 SDs from the mean [22], and both weighted robust methods (outlierrobust coxrobustw and PHrobust coxphw). The truncation parameter M was set at the 95^{th} percentile, with sensitivity analyses performed using the 90^{th} percentile. Confidence intervals based on sandwich variance estimates were used throughout to account for the inverseprobability of sampling weights. Separate models were fitted for each of the immune cell traits, without adjustment for other traits. A conservative approach of Bonferroni correction [23] was used to account for the 17 immune cell traits. Estimated hazard ratios are per SD of the percent of each immune cell type.
Due to the small number of stroke events, we included limited adjustment for covariates. Specifically, baseline age, gender, and race/ethnicity (White, Black, Hispanic, Chinese American) were included as adjustment variables in the regression models. Based on previous analyses of stroke in the MESA data [24], we ran sensitivity analyses that included additional covariates such as season of blood draw, systolic blood pressure, cardiovascular medications (antihypertensives and statins), smoking, education (via an indicator of having attained a bachelor’s degree or higher), lowdensity lipoprotein cholesterol, total cholesterol, diabetes, and body mass index.
After correction for multiple testing, there were no significant associations between stroke and the immune cell traits (Figure 1). Given that there were only 70 stroke cases, the power to find an association was small, so the lack of clinically important conclusions is not surprising, but our focus is on the comparative results across methods. Consistent with our simulations, traditional Cox methods and the PHrobust method gave estimates that were more similar to each other than to the outlierrobust method. Traditional methods using Winsorization were quite similar to those without Winsorization, and were thus different from the outlierrobust method. This difference was not surprising, given the different levels of truncation in each method (95^{th} percentile versus ±4 SDs) and the incorporation of both exposure value and timetoevent in the outlierrobust method versus just exposure value in Winsorization. Notably, when the outlierrobust method differed from traditional Cox methods, it most often gave point estimates further from the null, consistent with the idea that the outliers may be the result of an unrelated process that leads to attenuated association estimates obtained with nonrobust methods. The outlierrobust method generated wider confidence intervals than the traditional Cox method, which is to be expected given the added robustness to influential outliers. Results were similar when additional adjustment covariates were added to the models or the truncation parameter M was set to the 90^{th} percentile.
Discussion
This paper extends Cox proportional hazards regression methods that are robust to outliers in exposure data, so that they also incorporate sampling weights. When outliers are not causally related to the outcome of interest in the same way that other exposure values are, the outlierrobust method provides a less biased estimate of the true association than traditional methods. One application for this weighted outlierrobust method is in a casecohort sample where the exposure of interest contains outliers; we provided such an example in a MESA casecohort study where immune cell traits were measured. No significant associations with stroke events were found using traditional Cox models. Both in the scenario we simulated and in the illustrative dataset, larger associations were most often seen using the outlierrobust method, which supports the idea that traditional methods may underestimate associations. That said, the relative results will depend on the specific contamination model, and cannot be generalized to all possible scenarios based on the illustrations we provide.
Although normality is not required for covariates in Cox models, some departures from normality, such as skewness, lead to an increased chance of influential outliers. In cases of contamination such as those described in this paper, the skewness can reflect a source of bias in estimation of the exposureoutcome association. One method to minimize this bias is the outlierrobust one we have described. Alternative methods for analyzing exposure data that are not normally distributed include artificially truncating or Winsorizing [22], as well as transforming the data. We have shown in our application that Winsorizing the exposure data at 4 SDs from the mean generally does not substantially change the estimates. In simulation data not shown, Winsorizing at 2, 3, or 4 SDs from the mean still resulted in a more biased association estimate than using the weighted robust method, and the corresponding variance estimate did not account for the modification to the data. Logtransformation would make the exposure distribution less skewed, while maintaining reasonable interpretation; however, it would not incorporate the idea that the outliers are there for an external reason, and thus are not related to the outcome in the same way that other observations are. Consideration of alternatives emphasizes the idea that the source of the outliers is important, and the choice of method may depend on the reason outliers exist.
Specifically, different approaches might be warranted if the outliers are the result of a separate biological process, rather than being technical artifacts. For example, if a participant has a damaged blood sample or there is a technical malfunction of the flow cytometer used to obtain immune cell traits, then omitting the incorrect data is likely warranted. Truncation may be a better option for less welldefined technical artifacts that are recognized not to be plausible true values. On the other hand, in the case where the outliers are the real product of a biological process, for example if a participant has an undetected human immunodeficiency virus infection which has led to a low (or even zero) T helper type 1 (Th1) cell count, then outlierrobust methods, such as the one proposed here, are most appropriate.
The type of sampling would also affect the most appropriate use of weighting in a robust Cox regression approach. This paper focused on outcomebased sampling, given covariates that have outlying values that are in some sense wrong (atypical for the individual, assay errors, etc). These covariates were not used to choose the subsample; in fact, they were only measured on the subsample. We would expect that the true values of the covariates for these individuals would be related to the sampling weights, because the true values would be related to risk. However, conditional on risk, the outlying values would not be related to the sampling weights. Because sampling is not based on the outlying covariates, there is no harm in detecting outliers based on the sample, rather than reweighting to the full cohort. There is potentially harm in detecting outliers after reweighting, because the outlier threshold will be excessively sensitive to values in the reference subcohort. Under other sampling schemes it might well be preferable to modify the current procedure to include sampling weights in the influence iteration.
Conclusions
Adding sampling weights to robust Cox regression methods provides a new tool to analyze timetoevent data with sampling, e.g. casecohort data, when exposures of interest contain outliers. A readily available R package facilitates implementation of this new method.
Availability of data and materials
The data that support the findings of this study cannot be publicly posted due to identifiability concerns. They are however available upon reasonable request from the data coordinating center for the MESA Cohort study (https://www.uwchscc.org/) or, eventually, via BioLINCC (https://biolincc.nhlbi.nih.gov/home/).
Abbreviations
 BioLINCC:

Biologic Specimen and Data Repository Information Coordinating Center
 DSMB:

Data and Safety Monitoring Board
 FWA:

Federalwide Assurance
 IRB:

Institutional Review Board
 MESA:

MultiEthnic Study of Atherosclerosis
 MI:

myocardial infarction
 NCATS:

National Center for Advancing Translational Sciences
 NHLBI:

National Heart, Lung, and Blood Institute
 NK:

Natural Killer
 PH:

proportional hazards
 Treg:

T regulatory cells
 Th1:

T helper type 1
 Th17:

T helper type 17
 Th2:

T helper type 2
References
 1
Cox D. Regression Models and Life Tables. J R Stat Soc Series B Stat Methodol. 1972; 34(2):187–220.
 2
Therneau T, Grambsch P. Modeling Survival Data: Extending the Cox Model. New York: Springer; 2000.
 3
Bednarski T. Robust Estimation in Cox’s Regression Model. Scand Stat Theory Appl. 1993; 20(3):213–225.
 4
Sasieni P. Maximum weighted partial likelihood estimates for the Cox model. J Am Stat Assoc. 1993; 88(421):144–152.
 5
Schemper M, Wakounig S, Heinze G. The estimation of average hazard ratios by weighted Cox regression. Stat Med. 2009; 28(19):2473–89.
 6
Farcomeni A, Viviani S. Robust estimation for the Cox regression model based on trimming. Biom J. 2011; 53(6):956–73.
 7
Sasieni P. Some new estimators for Cox regression. Ann Stat. 1993; 21(4):1721–59.
 8
Minder C, Bednarski T. A Robust Method for Proportional Hazards Regression. Stat Med. 1996; 15(10):1033–1047.
 9
Bednarski T, Nowak M. Robustness and efficiency of Sasienitype estimators in the Cox model. J Stat Plan Infer. 2003; 115(1):261–72.
 10
Prentice R. A casecohort design for epidemiologic cohort studies and disease prevention trials. Biometrika. 1986; 73(1):1–11.
 11
Therneau T, Li H. Computing the Cox model for case cohort designs. Lifetime Data Anal. 1999; 5(2):99–112.
 12
Bild D, Bluemke D, Burke G, Detrano R, Roux AD, Folsom A, Greenland P, Jacob Jr D, Kronmal R, Liu K, Nelson J, O’Leary D, Saad M, Shea S, Szklo M, Tracy R. MultiEthnic Study of Atherosclerosis: objectives and design. Am J Epidemiol. 2002; 156(9):871–81.
 13
Tracy R, Doyle M, Olson N, Huber S, Jenny N, Sallam R, Psaty B, Kronmal R. THelper Type 1 Bias in Healthy People Is Associated With Cytomegalovirus Serology and Atherosclerosis: The MultiEthnic Study of Atherosclerosis. J Am Heart Assoc. 2013; 2(3):000117.
 14
Olson N, Doyle M, Jenny N, Huber S, Psaty B, Kronmal R, Tracy R. Decreased naive and increased memory CD4(+) T cells are associated with subclinical atherosclerosis: the multiethnic study of atherosclerosis. PLoS One. 2013; 8(8):71498.
 15
Binder D. Fitting Cox’s proportional hazards models from survey data. Biometrika. 1992; 79(1):139–47.
 16
Lin D, Wei L. The Robust Inference for the Cox Proportional Hazards Model. J Am Stat Assoc. 1989; 84(408):1074–8.
 17
R Core Team. R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing; 2015.
 18
Hansson G, Hermansson A. The immune system in atherosclerosis. Nat Immunol. 2011; 12(3):204–12.
 19
Iwasaki A, Medzhitov R. Control of adaptive immunity by the innate immune system. Nat Immunol. 2015; 16(4):343–53.
 20
Jaipersad A, Lip G, Silverman S, Shantsila E. The role of monocytes in angiogenesis and atherosclerosis. J Am Coll Cardiol. 2014; 63(1):1–11.
 21
Lahoute C, Herbin O, Mallat Z, Tedgui A. Adaptive immunity in atherosclerosis: mechanisms and future targets. Nat Rev Cardiol. 2011; 8(6):348–58.
 22
Dixon W, Yuen K. Trimming and winsorization: A review. Statistiche Hefte. 1974; 15(23):157–70.
 23
Dunn O.Multiple Comparisons Among Means. J Am Stat Assoc. 1961; 56(293):52–64.
 24
Reina S, Llabre M, Allison M, Wilkins J, Mendez A, Arnan M, Schneiderman N, Sacco R, Carnethon M, Delaney J. HDL cholesterol and stroke risk: The MultiEthnic Study of Atherosclerosis. Atherosclerosis. 2015; 243(1):314–9.
Acknowledgements
The authors thank the other investigators, the staff, and the participants of the MESA study for their valuable contributions. A full list of participating MESA investigators and institutions can be found at http://www.mesanhlbi.org. The authors also thank the reviewers for comments which greatly improved this paper.
Funding
The collection and analysis of the immune cell data for the MESA subcohort was supported by NHLBI grants R01HL120854 and R01HL134505. Collection of other MESA data was funded by NHLBI contracts HHSN268201500003I, N01HC95159, N01HC95160, N01HC95161, N01HC95162, N01HC95163, N01HC95164, N01HC95165, N01HC95166, N01HC95167, N01HC95168 and N01HC95169, and by NCATS grants UL1TR000040, UL1TR001079, and UL1TR001420. NCO’s participation in the immune cell substudy was supported by NHLBI grant R00HL129045. None of the funding organizations were involved in the analysis, interpretation of data, or writing of the manuscript.
Author information
Affiliations
Contributions
CMS developed the methods, produced the modified R package, ran the simulations, conducted analyses of MESA data, and wrote the first draft of the manuscript. TL, BM, and KMR contributed to the development of the methods and revision of the manuscript. NCO, MFD, SAH, and RPT generated the MESA immune cell data and revised the manuscript. BMP and RPT designed and secured funding for the immune cell study. BMP and JACD contributed to the interpretation of the data and revision of the manuscript. JACD coordinated the procurement of the immune cell data and combined it with other MESA data. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
The MESA protocol has been approved by the Institutional Review Boards of all collaborating institutions: Wake Forest University (IRB00008492 under FWA00001435), Columbia University (IRB00002973 under FWA00002636), Johns Hopkins University (IRB00001656 under FWA00005752), University of Minnesota (IRB00000438 under FWA00000312), Northwestern University (IRB00005003 under FWA00001549), University of California Los Angeles (IRB00000172 under FWA00004642), and University of Washington (IRB00005647 under FWA00006878). All participants gave written informed consent.
Consent for publication
Not applicable.
Competing interests
BMP serves on the Steering Committee for the Yale Open Data Access Project funded by Johnson & Johnson and on the DSMB of a clinical trial of a device funded by the manufacturer (Zoll LifeCor). The remaining authors have no competing interests to declare.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Sitlani, C.M., Lumley, T., McKnight, B. et al. Incorporating sampling weights into robust estimation of Cox proportional hazards regression model, with illustration in the MultiEthnic Study of Atherosclerosis. BMC Med Res Methodol 20, 62 (2020). https://doi.org/10.1186/s12874020009459
Received:
Accepted:
Published:
Keywords
 Cox regression
 Sampling weights
 Casecohort design
 Robust regression
 Immune cell traits