Skip to main content
  • Research Article
  • Open access
  • Published:

Risk-adjusted CUSUM control charts for shared frailty survival models with application to hip replacement outcomes: a study using the NJR dataset



Continuous monitoring of surgical outcomes after joint replacement is needed to detect which brands’ components have a higher than expected failure rate and are therefore no longer recommended to be used in surgical practice. We developed a monitoring method based on cumulative sum (CUSUM) chart specifically for this application.


Our method entails the use of the competing risks model with the Weibull and the Gompertz hazard functions adjusted for observed covariates to approximate the baseline time-to-revision and time-to-death distributions, respectively. The correlated shared frailty terms for competing risks, corresponding to the operating unit, are also included in the model. A bootstrap-based boundary adjustment is then required for risk-adjusted CUSUM charts to guarantee a given probability of the false alarm rates. We propose a method to evaluate the CUSUM scores and the adjusted boundary for a survival model with the shared frailty terms. We also introduce a unit performance quality score based on the posterior frailty distribution. This method is illustrated using the 2003-2012 hip replacement data from the UK National Joint Registry (NJR).


We found that the best model included the shared frailty for revision but not for death. This means that the competing risks of revision and death are independent in NJR data. Our method was superior to the standard NJR methodology. For one of the two monitored components, it produced alarms four years before the increased failure rate came to the attention of the UK regulatory authorities. The hazard ratios of revision across the units varied from 0.38 to 2.28.


An earlier detection of failure signal by our method in comparison to the standard method used by the NJR may be explained by proper risk-adjustment and the ability to accommodate time-dependent hazards. The continuous monitoring of hip replacement outcomes should include risk adjustment at both the individual and unit level.

Peer Review reports


Continuous monitoring of healthcare, and increasingly, social care across various providers is an important task of the healthcare regulator, such as the Care Quality Commission (QCC) in the UK. Additionally, a number of professional bodies and registers take on the same function for their clinical discipline. For instance, in regards to joint replacement, surgeon and operating unit-level outcomes are compiled by National Joint Registry for England and Wales (NJR). Methods of continuous monitoring of production quality have been initially developed and employed in quality control in the industry [1]. One of the most popular methods is the cumulative sum (CUSUM) chart, a graphical method based on sequential monitoring of cumulative performance over time. This method is based on sequential procedures and allows timely identification of a deterioration in performance. A number of CUSUM-based quality control systems are being implemented in various clinical disciplines, with the earliest application being in cardiothoracic surgery [2]. Currently they are used in surveillance of the healthcare quality by QCC [3], and by Dr Foster unit at Imperial College [4]. In this paper we expand the CUSUM methodology and adapt it for monitoring the performance of hip prostheses using the NJR data.

A hip replacement is a surgical operation where the damaged hip joint is replaced by a prosthesis. This operation is recommended to reduce pain and improve mobility of a patient after other therapies have failed. There are currently hundreds of types and brands of prosthesis components for use in the hip replacement surgery, and new brands of implant continue to be introduced through technological innovations. An important aspect of an implant brand’s performance is its expected time-to-revision. The current exception is that all prostheses used as treatment for end stage arthritis should have a failure rate of less than 5% at 10 years. Because of a relatively long time-to-failure of the hip prosthesis, long-term premarketing clinical trials are unfeasible. Therefore, continuous monitoring methods are needed for early detection of poor performance and timely withdrawal of the inferior components from clinical practice.

The first CUSUM-based methods for healthcare were based on binomial or Poisson distributions, monitoring failure rates within a fixed time interval, e.g. 30-days mortality [5], or one year hip replacement failure rates [6]. CUSUM methods for survival data are a natural extension of the methods for binary data. Censoring, truncation, adjustment for observed covariates and unobserved factors (frailties) can be easily included in survival models. By monitoring the individual-specific outcomes, the CUSUM score can be evaluated sequentially, changing at each individual failure. However, this method seems not to be appropriate in the case of hip replacement, where the expected time-to-revision is longer than 10 years. Hardoon et al. [7] proposed to compare the number of revisions within a certain time interval to that expected given a target revision rate and the total number of hip years in the interval. That is, patients contribute to a CUSUM score until revision or censoring (death or end of follow-up). They analysed the data from Swedish Arthroplasty register using Weibull distribution to model time to revision of hip replacement.

However, time-to-revision of hip prostheses varies depending on the patient characteristics, and on the type of fixation used [8]. This necessitates the use of case mix adjusted monitoring methods. The first risk adjusted CUSUM methods for time-to-failure (survival) data were introduced by Biswas and Kalbfleisch [9]. This method was picked up by the Scottish Arthroplasty Project, where CUSUMs are used to monitor complication rates of joint replacements by surgeon and unit from 2010. This is achieved by likelihood-based scoring method with risk adjustment for age, sex, osteoarthritis (OA) and rheumatoid arthritis (RA) [10]. A Bayesian-based CUSUM method for Weibull survival time is described in Assareh et al. [11].

Although the event of interest in our study is a revision, a priori death should not be treated as noninformative censoring. We develop a general competing risk version of the survival model for NJR data, where death is a competing risk. To safeguard the properties of the CUSUM charts, the control limits for risk-adjusted CUSUMs need to be revised to accommodate the estimation error.

We propose and implement a parametric version of the approach by Gandy and Kvaløy [12], of using bootstrap to provide the control limits conditional on the estimated in-control distribution, resulting in less conservative, i.e. more powerful, procedures.

We are using the Weibull distribution for fitting the baseline revision-specific hazard function, because this distribution has a good fit to the empirical distribution of time-to-revision [7]. The Gompertz distribution is used for fitting the baseline mortality-specific hazard function. The observed covariates and the correlated frailty components at the unit level are included in the model, assuming that all patients from a unit share the same unobservable gamma distributed risks of prosthesis revision and of death after hip replacement surgery.

We develop a bootstrap-based boundary adjustment for the risk-adjusted CUSUM chart to guarantee a given conditional probability of the false alarm rates. We also propose a score characterizing the quality of the hip replacement surgery in a unit. This score is based on the estimate of the posterior conditional frailties for units given the observed data. Mathematical development of the CUSUM scores for a Weibull/Gompertz survival model with shared frailty is provided in the Appendix.

The developed methods are applied to the 2003-2012 hip replacement data from the NJR. We illustrate the use of risk-adjusted CUSUM methodology to monitor the performance of two specific hip prostheses brands: the DePuy ASR Resurfacing Cup and the Biomet M2A-38 cup, which were flagged as outliers by NJR [13].


Motivating example

An artificial hip includes three major components: a stem that is inserted into the femur, a head (a ball) attached to the top of the femur and a cup, also called the acetabular component, that is implanted into the pelvis. A hip resurfacing procedure is typically used in younger patients where it can delay the need for a total hip replacement, it replaces the socket with an artificial cup and resurfaces the head of the femur instead of removing it. In 2010, NJR recorded 123 brands of acetabular cups, 13 brands of resurfacing cups and 146 brands of femoral stems used in primary and revision procedures [14].

Given a vast variety of available types and brands of prosthesis components for use in the hip replacement surgery, monitoring implant quality is the main objective of the NJR implant scrutiny group that was established in 2009. According to the current NJR methodology [15], an implant is considered to be a Level 1 outlier when its Patient Time Incident Rate (PTIR) is twice the PTIR of the implant group, where the group rate is weighted by the relevant implant types. From 2009 to 2014, three hip stems, three hip acetabular components and 17 hip stem/cup combinations were reported as Level 1 outliers [13].

To test our analytical approach on real world data, our analysis will focus on two of these outlier compoents: (i) the DePuy ASR Resurfacing Cup (first identified as a part of an outlier head/cup combination in April 2010 and last implanted in July 2010) and (ii) the Biomet M2A-38 acetabular cup (first identified by the NJR as an outlier in 2014, and last implanted in June 2011).

A standard CUSUM chart usually has a learning period where the parameters of the relevant null distribution are estimated, and the deviation from the null of clinical concern is decided upon to calibrate the control limits. The chart is then run with these control limits. An example of this approach is by Hardoon et al. [7], 2007 who monitored a constant target revision rate in a time interval. However, the failure rates differ by implant types, the age of the patients, and other case mix characteristics. They also may vary by the site at which operations take place (the operating unit). Therefore we consider a risk-adjusted CUSUM where the target rates are estimated for the popular implants (top 80%), and experienced units (more than 1 surgery per week, on average), which requires an introduction of shared frailty terms, describing similarities within and heterogeneity between units, to our survival models, and an appropriate adjustment of the control limits.

Description of the NJR data

The NJR data were made available after a formal request to the NJR Research Committee. The dataset is related to the data cut used in the 10th NJR Annual Report [16]. The data were anonymised in respect to patient, to surgeon and to operating unit identifying details. Approval was obtained from Computing Subcommittee of the University of East Anglia Ethics Committee, reference number CMP/1718/F/10A. The NJR dataset provides the following four groups of variables used in the time-to-failure analysis of the hip replacements to risk-adjust the CUSUM boundaries.

  • Information on procedures, such as date of operation or revision, and side;

  • Institution and staff involved, such as unit and consultant IDs (anonymised), and surgeon grade;

  • Hip prosthesis characteristics, such as fixation type (cemented, uncemented, hybrid, resurfacing), its components (head, cup, stem, and liner brands), head size, bearing surfaces (metal, polyethylene, ceramic);

  • Patient characteristics, such as age, sex, ASA physical status classification [17] at 5 levels from healthy (1) to near death (5), Body Mass Index (BMI), index of multiple deprivation (IMD)[18] (a higher IMD means higher proportion of people in the area classed as deprived), and death date.

Since about a half of records had missing BMI values, this factor was excluded from further consideration. ASA scores were grouped into two categories in further analysis: ASA 1-2 - normal healthy patients and patients with mild systemic disease, ASA 3-5 - patients with serious, non-incapacitating systemic disease, patients with life-threatening incapacitating systemic disease and patients that are near death.

Data selection in SQL (elimination of duplicates, second and subsequent revisions) resulted in 504,024 records with the fields listed above. By further cleaning the following records have been additionally excluded:

  • Patients with bilateral operations;

  • Records with missing or misreported side;

  • Records with time to revision equal to 0;

  • Records with date of operation after 31 December 2012;

  • Patients younger than 50 years at operation day;

  • Records with missing values of IMD.

This process resulted in 281,265 records. Finally, all records for the patients operated in units with less than 52 operations per year (i.e. less than once per week, on average), and all records with implanted cup/head brands in the bottom 20% in popularity that year, as well as cup/head brands “DePuy” and “Biomet” were excluded in the in-control dataset, resulting in 113,772 records in total. To test the efficiency of our CUSUM procedure, we have also selected two test datasets including only the records with cup brands “DePuy ASR Resurfacing Cup” (1734 records) and “Biomet M2A 38” (764 records), respectively. The cases for prostheses revised within three months of implantation were censored at the time of revision to exclude failures that might be directly attributive to surgical technique or postoperative complications. Description of the three datasets is given in Table 1. We provide analysis of these data performed in R [19] in the “Results” section.

Table 1 Description of the datasets

Basics of CUSUM method for time-to-event data

The CUSUM method is a sequential analysis technique based on the calculation of the series Wi, i=0,1,2,..., defined by a simple recurrent equation

$$\begin{array}{@{}rcl@{}} \begin{aligned} W_{0} & = 0, \\ W_{i+1} & = \max \{0,\; W_{i}+X_{i}\}, \end{aligned} \end{array} $$

where index i stands for a single observation or for a group of observations and Xi is the weight or score assigned to index i. The CUSUM alerts when Wi crosses a control limit, usually chosen to guarantee a long average run length (ARL) when the process is in control, or to provide a low false alarm probability [20]. In applications to survival data, and assuming independent competing risks of revision and death, the score Xi for an individual i with time-to-revision ti and vector of covariates ui can be defined as the logarithm of the revision-specific factor of the likelihood ratio

$$\begin{array}{@{}rcl@{}} \begin{aligned} X_{i} = \log \left (\frac {f^{1}_{i}(t_{i}|\mathbf{u}_{i})^{\delta_{i}}S^{1}_{i}(t_{i}|\mathbf{u}_{i})^{1-\delta_{i}}}{f^{0}_{i}(t_{i}|\mathbf{u}_{i})^{\delta_{i}}S^{0}_{i}(t_{i}|\mathbf{u}_{i})^{1-\delta_{i}}}\right), \end{aligned} \end{array} $$

where δi is a censoring indicator, \(S^{j}_{i}(.)\) and \(f^{j}_{i}(.)\) are survival and density functions, respectively, and index j=0,1, stands for null hypothesis H0 (process is under control) and alternative hypothesis H1 (failure rate is higher than expected by a certain margin). Under the assumption of independent competing risks, the revision-specific factor of the likelihood coincides with the likelihood function that would be obtained be treating failures from any other causes as censored observations.

For a set I of independent individuals, the score XI can be calculated as a sum of individual scores Xi, iI:

$$\begin{array}{@{}rcl@{}} \begin{aligned} X_{I} = \sum_{i\in I} X_{i}. \end{aligned} \end{array} $$

Assuming proportional hazards model with the Weibull baseline distribution under hypotheses Hj, j=0, 1, the hazard functions hj(t|u)=μj(t)χ(u) are proportional to the Weibull baseline hazards μj(t) and a regressor function χ(u). The regressor function is usually specified as χ(u)= exp(βu) (the Cox’s regression term) for a transposed column vector of unknown parameters β. The baseline hazard function under H0 corresponds to the hazard function μ0(t)=(k/λ)(t/λ)k−1 for the Weibull distribution with the shape parameter k and the scale parameter λ, and the baseline hazard function μ1(t) under the alternative hypothesis H1 is proportional to μ0, μ1(t)=HRμ0(t). The hazard ratio HR represents the departure from the target survival that we want to detect.

For consecutive time intervals T, consider a subset I=IT of NI individuals observed (prostheses in use) over the time interval T. In this case, the scores XI can be calculated as [7]

$$\begin{array}{@{}rcl@{}} \begin{aligned} X_{I}=O_{I}\log(\text{HR})-(\text{HR}-1)E_{I}, \end{aligned} \end{array} $$

where OI is the observed number of failures (revisions) occurring during the interval T and EI is the number of failures that would be expected in the same interval under hypothesis H0.

Denote by (t1i,t2i) an intersection of the interval T with the lifetime of the prosthesis i implanted at t0i. Then t1i is the maximum of the lower bound of interval T and t0i, and t2i is the minimum of the upper bound of interval T, the time of revision of prosthesis i and the time of censoring of the patient with prosthesis i. From this, the value of (t2it1i) is equal to the length of time when prosthesis i is in use in the time interval T. The values of EI can be computed as

$$\begin{array}{@{}rcl@{}} \begin{aligned} E_{I}=\sum_{i=1}^{N_{I}} \lambda^{-k}\left ((t_{2i}-t_{0i})^{k}-(t_{1i}-t_{0i})^{k}\right). \end{aligned} \end{array} $$

CUSUM scores for shared frailty competing risks model

Under the proportional hazards model with frailty, the hazard functions h(t|u,Z) for an observed vector of covariates u and unobserved non-negative random frailty component Z, is proportional to the baseline hazard μ(t), frailty term Z, and a regressor function χ(u)= exp(βu). The conditional survival function is given by

$$ {\begin{aligned} S(t|\mathbf u, Z)\,=\,\exp(-\int_{0}^{t}h (x|\mathbf{u},Z)dx)=\exp(-Z\chi (\mathbf{u})\int_{0}^{t} \mu(x)dx). \end{aligned}} $$

The marginal survival function is defined by

$$\begin{array}{@{}rcl@{}} \begin{aligned} S(t|\mathbf u)=\mathbb {E}S(t|\mathbf u, Z). \end{aligned} \end{array} $$

We will use the index f, f=r,d, to denote the types of failure (revision of implant or death of a patient without implant failure, respectively), considered as competing risks. For mathematical convenience, it is frequently assumed that frailty Zf is gamma-distributed with mean 1 and unknown variance \(\sigma _{f}^{2}\). The assumption of gamma distributed frailty is not too restrictive, as a number of authors demonstrated that gamma-based shared frailty models are robust for a wide class of frailty distributions [21, 22]. The frailty variance \(\sigma _{f}^{2}\) characterizes heterogeneity in the population.

We also assume that the baseline hazard functions are \(\phantom {\dot {i}\!}\mu _{0,r}(t)=(k_{r}/\lambda _{r})(t/\lambda _{r})^{k_{r}-1}\) and \(\mu _{0,d}(t)=\lambda _{d}\exp (k_{d}t)\phantom {\dot {i}\!}\) with the shape parameter kf and the scale parameter λf, f=r,d, for the Weibull and Gompertz distributions, respectively. In this case, the type-of-failure specific marginal survival function is given by

$$\begin{array}{@{}rcl@{}} \begin{aligned} S_{f}(t|\mathbf u_{f})=(1+\sigma_{f}^{2}e^{\beta^{*}\mathbf u_{f}}H_{f}(t))^{-1/\sigma_{f}^{2}} \end{aligned} \end{array} $$

with the type-of-failure specific baseline cumulative hazards \(H_{r}(t)=(t/\lambda _{r})^{k_{r}}\phantom {\dot {i}\!}\) and \(\phantom {\dot {i}\!}H_{d}(t)=(\lambda _{d}/k_{d})(\exp (k_{d}t)-1)\).

Correlated frailty terms for revision and death can be constructed as

$$\begin{array}{@{}rcl@{}} \begin{aligned} Z_{r}= &Y_{0}+Y_{r}, \\ Z_{d}= &\frac {m_{r}}{m_{d}}Y_{0}+Y_{d} \end{aligned} \end{array} $$

for independent gamma distributed random variables Y0G(l0,mr) and YfG(lf,mf) with \(l_{f}=1/\sigma _{f}^{2}-l_{0}\), \(m_{f}=1/\sigma _{f}^{2}\), f=r,d; 0≤ρ≤ min(σr/σd,σd/σr). The result of this construction is that the frailties are gamma-distributed with \(\mathbb {E}Z_{f}=1\), \(\text {Var}Z_{f}=\sigma _{f}^{2}\), and Corr(Zr,Zd)=ρ. Given the frailties (Zr,Zd) and the covariates (ur, ud), type-of-failure specific instantaneous risks are assumed to be conditionally independent at any time t.

The bivariate marginal survival function for the type-of-failure specific latent time moments (tr, td) is given by the formula

$$\begin{array}{@{}rcl@{}} {\begin{aligned} S(t_{r},t_{d}|\mathbf u_{r},\mathbf u_{d})= &\mathbb {E}S(t_{r},t_{d}|\mathbf u_{r},\mathbf u_{d},Z_{r},Z_{d}) \\ =&\mathbb {E}\exp (-Z_{r}\chi (\mathbf u_{r})H_{r}(t_{r})-Z_{d}\chi(\mathbf u_{d})H_{d} (t_{d})) \\ = & \frac {\left(1+\sigma_{r}^{2}\chi (\mathbf u_{r})H_{r}(t_{r})\right)^{-l_{r}}\left(1+\sigma_{d}^{2}\chi (\mathbf u_{d})H_{d} t_{d}\right)^{-l_{d}}}{\left (1+\sigma_{r}^{2}\chi (\mathbf u_{r})H_{r}(t_{r})+\sigma_{d}^{2}\chi (\mathbf u_{d})H_{d}(t_{d})\right)^{l_{0}}}& \end{aligned}} \end{array} $$

[23]. If left truncation is present at ages (t0r, t0d), we calculate the conditional survival function by dividing the bivariate survival function by S(t0r,t0d|ur,ud).

In the context of hip replacement, the shared frailty terms arise from the assumption that the nj patients who have undergone surgery in the same unit j, j=1,,J, have the same, possibly correlated, unobserved risks of revision and death. This means that the full likelihood function for our model has a form of \({\mathcal L}=\prod _{j=1}^{J}{\mathcal L}_{j}(\bar t_{jr},\bar t_{jd}|\bar {\mathbf {u}}_{jr},\bar {\mathbf {u}}_{jd})\) for

$$\begin{array}{@{}rcl@{}} {\begin{aligned} &{\mathcal L}_{j}(\bar t_{jr},\bar t_{jd}| \bar{\mathbf{u}}_{jr}, \bar{\mathbf{u}}_{jd})=\prod_{i=1}^{n_{j}}\left (-\frac{\partial }{\partial t_{jir}}\right)^{\delta_{jir}}\left (-\frac{\partial }{\partial t_{jid}}\right)^{\delta_{jid}}S_{j}(\bar{t}_{jr},\bar t_{jd}| \bar{\mathbf{u}}_{jr},\bar{\mathbf{u}}_{jd}), \end{aligned}} \end{array} $$

where δf=0,1 is the censoring indicator with δf=0 indicating right censoring, and \(\bar t_{jf}\) and \( \bar {\mathbf {u}}_{jf}\) are the vectors of cause-specific latent times and of covariates for the patients from unit j, respectively, f=r, d, and

$$\begin{array}{@{}rcl@{}} {\begin{aligned} &S_{j}(\bar t_{jr},\bar t_{jd}| \bar{\mathbf{u}}_{jr}, \bar{\mathbf{u}}_{jd}) \\ &=\frac {\left (1+\sigma_{r}^{2}\sum_{i=1}^{n_{j}}\chi (\mathbf u_{jir})H_{r}(t_{jir})\right)^{-l_{r}}\left (1+\sigma_{d}^{2}\sum_{i=1}^{n_{j}}\chi (\mathbf u_{jid})H_{d}(t_{jid})\right)^{-l_{d}}}{\left (1+\sigma_{r}^{2}\sum_{i=1}^{n_{j}}\chi (\mathbf u_{jir})H_{r}(t_{jir})+\sigma_{d}^{2}\sum_{i=1}^{n_{j}}\chi (\mathbf u_{jid})H_{d}(t_{jid})\right)^{l_{0}}}, \end{aligned}} \end{array} $$

where a subscript i, i=1,...,nj, corresponds to a current patient i from unit j. This likelihood can be used for parameter estimation.

Proposed CUSUM scores for a competing risks model with shared frailty are based on the likelihood ratio \({\mathcal L}\). For a time interval T, let Ij(T) be a set of individuals from unit j whose implants are in use during the period T, and \(I=I(T)=\bigcup I_{j}(T)\). The scores XI(T) for the time interval T are defined as

$$\begin{array}{@{}rcl@{}} {\begin{aligned} X_{I}(T) = \sum_{j =1}^{J}\log \left (\frac {\mathbb {E}\prod_{i \in I_{j}(T)}{\mathcal L}^{1}(t_{jir},t_{jid}|\mathbf{u}_{jir},\mathbf{u}_{jid},Z_{jr},Z_{jd})}{\mathbb {E}\prod_{i \in I_{j}(T)}{\mathcal L}^{0}(t_{jir},t_{jid}|\mathbf{u}_{jir},\mathbf{u}_{jid},Z_{jr},Z_{jd})}\right), \end{aligned}} \end{array} $$

where Zjr, Zjd are the shared frailty terms for unit j, the superscript h, h=0,1, stands for hypothesis, and

$$\begin{array}{@{}rcl@{}} \begin{aligned} &{\mathcal L}^{h}(t_{jir},t_{jid}|\mathbf{u}_{jir},\mathbf{u}_{jid},Z_{jr},Z_{jd}) \\ &=\left (-\frac{\partial }{\partial t_{jir}}\right)^{\delta_{jir}}\left (-\frac{\partial }{\partial t_{jid}}\right)^{\delta_{jid}}S^{h}(t_{jir},t_{jid}|\mathbf u_{jir},\mathbf u_{jid},Z_{jr},Z_{jd}). \end{aligned} \end{array} $$

In general case, expression for XI(T) does not have a simple closed form. In the special case of ρ=0, the competing risks of revision and death are independent, and the score XI(T) is the sum of the respective component scores for revision and death (see Appendix). If the interest lies in the risk of revision only, death can be treated as a non-informative censoring, and we concentrate on the CUSUM analysis of revision scores to the end of this Section.

For the baseline Weibull hazard function, under the proportionate alternatives μ1(t)=HRμ0(t), we can rewrite the revision component of the score (3) as

$$\begin{array}{@{}rcl@{}} {\begin{aligned} &X_{I}^{r}(T) = O_{I}\log (\text{HR})-\sum_{j =1}^{J}({\sigma_{r}^{-2}}+O_{j})\\ &\times\log \left(\frac {1+\sigma_{r}^{2}\text{HR}\sum_{i \in I_{j}(T)}e^{\beta^{*}\mathbf u_{i}}\lambda^{-k}((t_{2i}-t_{0i})^{k}-(t_{1i}-t_{0i})^{k})}{1+\sigma_{r}^{2}\sum_{i \in I_{j}(T)}e^{\beta^{*}\mathbf u_{i}}\lambda^{-k}((t_{2i}-t_{0i})^{k}-(t_{1i}-t_{0i})^{k})}\right), \end{aligned}} \end{array} $$

where Oj is a number of revisions in the unit j during period T so that \(O_{I}=\sum _{j}O_{j}\) (see Additional file 1 for proof).

Often, the proportional hazards assumption is too strong; different groups of patients and prostheses do not necessarily have proportional hazard functions for the hip revision times and/or for death. We weaken this assumption by allowing different shape parameters kf(u) in the baseline Weibull and Gompertz hazard functions which depend on covariates through additional Cox-regression multipliers, \(k_{f}({\mathbf u})=\exp (\beta ^{*}_{k} \mathbf u)k_{f}\). Then the CUSUM scores for revision are calculated as

$$\begin{array}{@{}rcl@{}} {\begin{aligned} &X_{I}^{r}(T) = O_{I} \log(\text{HR})-\sum_{j=1}^{J}(\sigma_{r}^{-2}+O_{j}) \\ &\times\log \left (\frac {1\,+\,\sigma_{r}^{2}\text{HR}\sum_{i \in I_{j}(T)}e^{\beta^{*}\mathbf u_{ji}}\lambda^{\,-\,k_{r}({\mathbf u_{ji}})}((t_{j2i}\,-\,t_{j0i})^{k_{r}({\mathbf u_{ji}})}\,-\,(t_{j1i}\,-\,t_{j0i})^{k_{r}({\mathbf u_{ji}})})}{1\,+\,\sigma_{r}^{2}\sum_{i \in I_{j}(T)}e^{\beta^{*}\mathbf u_{ji}}\lambda^{\,-\,k_{r}({\mathbf u_{ji}})}((t_{j2i}\,-\,t_{j0i})^{k_{r}({\mathbf u_{ji}})}\,-\,(t_{j1i}-t_{j0i})^{k_{r}({\mathbf u_{ji}})})}\right). \\ \end{aligned}} \end{array} $$

CUSUM chart control limits for the shared frailty model for revision

The unknown parameters of the time-to-revision model under the null hypothesis H0 are estimated from the in-control (learning) dataset. These are the Cox-regression parameters β and βk, parameters of the Weibull baseline distributions k and λ, and the variance of the frailty term σ2. The vector of unknown parameters ξ=(lnk, lnλ, lnσ2,β,βk) is estimated using the maximum likelihood method to obtain the estimates \(\hat \xi \). The time-to-failure distribution with these estimated parameters is then used to compute the CUSUM scores for the two test datasets and to estimate the control limits for the CUSUM chart: See Additional file 1 for details of calculation of the CUSUM score. Let P=P(ξ) be the true distribution function for revision times, and τ=τc(P;ξ) is the time at which the chart alerts when it exceeds a threshold c. The false alarm probability in T time units is \(hit(P;\xi) = \mathbb P(\tau _{c}(P;\xi) \leq T)\) for some finite T>0. The threshold chit(P;ξ)= inf{c>0:hit(P;ξ)≤α} for some 0<α<1 is needed to restrict the false alarm probability to α. However, only \(\hat P\) and \(\hat \xi =\xi (\hat P)\) are known.

A parametric version of the bootstrap algorithm proposed by Gandy and Kvaløy [12] is used to estimate the control limits to guarantee, that the false alarm rate of a CUSUM chart with the in-control distribution P, conditional on \(\hat \xi \), is below nominal level α with high probability 1−γ.

Define the first time \(\tau _{c}(P|\hat \xi)\) at which the CUSUM chart conditional on \(\hat \xi \) exceeds the given value c. We are interested in the boundary \(c_{hit}(P|\hat \xi)\) defined by equation \(c_{hit}(P|\hat \xi)=\inf \{c>0:\; \mathbb P(\tau _{c} (P|\hat \xi)\leq T)\leq \alpha \}\) for some 0<α<1. Since P is unknown, \(c_{hit}(P|\hat \xi)\) is unknown too and the estimate \(c_{hit}(\hat P|\hat \xi)\) is usually used instead. However, such estimate does not guarantee the false alarm rate of the chart. Following [12], we estimate the 1−γ quantile for the threshold \(c_{hit}(P|\hat \xi)\) for some 0<γ<1 using the following algorithm.


Let N be the number of records (patients) in the control dataset, NSim be the number of simulations needed to estimate \(c_{hit}(\hat P|\hat \xi)\), NBoot be the number of bootstrap replicates, and T=[Tmin,Tmax] be the observation period.

  1. 1.

    Calculate the maximum likelihood estimate (MLE) \(\hat \xi \) of the vector of unknown parameters ξ as well as the estimate \(\widehat {\text {Cov}}\) of the covariance matrix cov (inverse Hessian) for \(\hat \xi \) using the control dataset and the survival model with Weibull hazard described above;

  2. 2.

    Generate from the multivariate normal distribution with mean \(\hat \xi \) and the covariance matrix \(\widehat {\text {Cov}}\), a random vector ξcur;

  3. 3.

    Keeping the covariates in all three test datasets fixed, generate for all patients new times-to-revision trev on the basis of the survival model with Weibull hazard described above and vector ξcur. Update the censoring using the rule δ=1 if trev<= min{tdeath,Tmax} and δ=0, otherwise. Replace trev for δ=0 by trev= min{tdeath,Tmax}. Repeat NSim times and calculate for the test dataset j, j=1,2, the values of \(c_{\text {hit}}^{j}(\hat P_{cur}|\hat \xi _{cur})\) and \(c_{\text {hit}}^{j}(\hat P|\hat \xi _{cur})\);

  4. 4.

    To take into account multiple testing, we set \(c_{\text {hit}}(\hat P_{cur}|\hat \xi _{cur})=\underset {j=1,2}\max \{c_{\text {hit}}^{j}(\hat P_{cur}|\hat \xi _{cur})\}\) and \(c_{\text {hit}}(\hat P|\hat \xi _{cur})=\underset {j=1,2}\max \{c_{\text {hit}}^{j}(\hat P|\hat \xi _{cur})\}\). Calculate \(p_{cur}=c_{\text {hit}}(\hat P_{cur}|\hat \xi _{cur})-c_{\text {hit}}(\hat P|\hat \xi _{cur})\);

  5. 5.

    Repeat steps 2-4 NBoot times and calculate the 1−γ empirical quantile pγ of pcur.

The estimate of the adjusted threshold is equal to \(c_{hit}(\hat P|\hat \xi)-p_{\gamma }\). This threshold guarantees that in approximately 100(1−γ)% of the applications the probability of false alarm will not exceed the value of α.In the “Results” section, we use the values of NSim=100, NBoot=100, α=0.1, and γ=0.1, Tmin=01.01.2005, and Tmax=31.12.2012 for the analysis of the NJR data.

Estimating operating unit performance

Estimating performance across surgical units is also of potential importance in the quality control setting. The posterior frailty distribution obtained from the fitted shared frailty survival model described in the “Methods” section, can be used for this purpose. Given the prior gamma distribution with (shape, scale) parameters (a,b)=(σ−2,σ2), mean ab=1 and variance ab2, and the observed data Dj, the posterior frailty distribution for unit j, is the gamma distribution with (shape, scale) parameters (aj,bj) equal to

$$\begin{array}{@{}rcl@{}} \begin{aligned} a_{j}&=a+O_{j}, \\ b_{j}&=\frac {b}{1+b\sum_{i \in I_{j}}H(t_{i},\mathbf{u}_{i})}, \end{aligned} \end{array} $$

where Oj is the number of observed revisions in unit j, Ij is set of all patients from unit j, and H(ti,ui) is the cumulative hazard for individual i from unit j with time to revision (or censoring) ti and the vector of covariates ui [24].

The effects of the units (shared frailties) are given by the conditional expectation \(\mathbb {E}(Z_{j}|D_{j})=a_{j}b_{j}\), and parameters aj and bj can be estimated by substituting the MLE estimates \(\hat \xi \) of the unknown parameters ξ [21]. Given the proportional hazards formulation, the shared frailty term can be interpreted as an excess hazard of a unit relative to the baseline hazard. Because of this interpretation, we refer to these estimated frailties as unit-level hazard ratios and denote them by HRj.

Additionally, we propose a new score characterizing the quality of the hip replacement surgery in a unit as

$$ Q_{j}= P\{Z_{j}|D_{j}\} <1, $$

where Dj is the data from the control dataset relating to unit j. Large value of Q indicates a decreased hazard of revision in a unit, whereas small value of Q indicates poor performance of a unit. Since the values of Q and HR depend on the vector of unknown parameters ξ and only the MLE estimate \(\hat \xi \) of this vector is available, we generate a set of Naverage estimates \(\hat \xi _{l}\) from \(\mathrm {N}(\hat \xi,\widehat {cov})\) distribution, and take the average of the obtained estimates of \(Q(\hat \xi _{l})\) and of \(\text {HR}_{j}(\hat \xi _{l})\) over this set of parameters.


For the control dataset described in the “Methods” section, we estimated unknown parameters of the competing risks model with and without shared frailty terms maximizing the likelihood function (2). These include the parameters for the baseline hazard distributions and the coefficients of the Cox’s regressions for time-to-revision and time-to-death, allowing for the possible covariate-dependent shape parameters, as described at the end of the “Methods” section. Significant predictors had been chosen using the backward elimination in stepwise regression. The estimated coefficients and their confidence intervals for the models with and without frailty components are given in Table 2. The notation \(\phantom {\dot {i}\!}^{\prime \prime }{k}_{f}^{\prime \prime }\) before the name of a variable means that its coefficient relates to the shape parameter kf. The baseline values for the categorical and binary regressors were: males for sex, cemented for fixation, ceramic/ceramic for cup/head bearing surfaces, and operation date before 01.01.2007.

Table 2 Description, parameter estimates and confidence intervals for the competing risks models with/without frailty

Comparing likelihood, AIC and BIC values in Table 2 we see that the correlation between cause-specific frailties Zr and Zd does not differ significantly from zero, and the best (in terms of AIC and BIC) model includes a frailty term only for revision. That is, the risks of revision and death can be modelled as independent, and formula (5) can be used to calculate CUSUM scores for revision.

Females had a decreased hazard of revision of hip prostheses compared to males on the time-to-revision interval [ 0,λ]. Hazard of revision decreased with age and head size. Uncemented hip prostheses had an increased hazard of revision compared to cemented or hybrid fixation. The cup/head combinations with resurfacing/metal and resurfacing/resurfacing bearing surfaces also had increased hazards compared to other types of bearings, whereas the polyethylene/ceramic bearing surfaces provided a decreased risk of revision compared to the ceramic/ceramic ones. These results agree with the findings by [8]. Those patients who underwent the surgery after 01.01.2007 had an increased hazard of revision. This may reflect the fact that early revisions were missed by the NJR due to poor data quality in the early years. We also have found a significant random effect of units, with the estimated frailty variance \(\sigma ^{2}_{r}\) equal to 0.18 with confidence interval of (0.12−0.28). i.e. the hazard of revision differed by units.

Patients with serious disease (ASA P3-P5) and patients from areas with high deprivation (IMD 4-5) had increased hazards of death. The cup/head combination with polyethylene/metal bearing surfaces had a significantly increased hazard of death compared to ceramic/ceramic bearing. The shape parameters for baseline hazards of death also differed by these factors and by the date of surgery before/after 01.01.2007.

Based on the fitted revision submodel with frailty under independent competing risks, and targeting the hazard ratios of 1.25, 1.50 and 1.75 under alternative hypotheses, the CUSUM scores were calculated quarterly for the period 2005-12. The bootstrap-based boundaries were calculated at the false alarm rate α=0.1 and the tolerance level 1−γ=0.9 and adjusted for multiple comparisons for two tested hip implants. The CUSUM scores did not differ much between the models with and without frailty component. Figure 1 presents the CUSUM charts for the two test datasets as well as the in-control dataset for the models without/with frailty component at all three target hazard ratios. The CUSUM charts without frailty for DePuy ASR Resurfacing Cup produced alarm in the 4th quarter of 2009 for HRs of 1.25 and 1.75, and in the 3rd quarter of 2009 for HR of 1.50. The charts with frailty produced alarm somewhat later, in the 4th quarter of 2009 for all three values of the hazard ratio. This is comparable with the alarm based on PTIR by NJR in April 2010. For the Biomet M2A 38, the CUSUM charts without frailty hit the boundary in the second quarter of 2011 for HR=1.25, in the first quarter of 2011 for HR of 1.50, and in the second quarter of 2010 for HR= 1.75. The CUSUM charts with frailty alarm in the 2rd, the 1nd and the 2nd quarter of 2011, respectively. This is 3 to 4 years prior to the NJR alarm issued in 2014 [8].

Fig. 1
figure 1

CUSUM charts calculated for quarterly revision rates in the three NJR datasets: DePuy ASR Resurfacing Cup (black), Biomet M2A 38 (blue) and in-control dataset (magenta), over the period 2005-12. The control bounds (solid red lines) are estimated by the parametric bootstrap

The estimates of the quality scores Qj and the hazard ratios HRj have been calculated for 269 units included in the control dataset using Naverage=100. Our results demonstrate high heterogeneity in performance. 17 units out of the total of 269 had the quality scores greater than 0.9. HRs for these units were between 0.38 and 0.67. 15 units had the quality score values less than 0.1. Their HRs varied from 1.52 to 2.28.

To check the goodness-of-fit of chosen parametric distributions in our models for revision and mortality, we compared semiparametric estimates of baseline cumulative hazard functions to baseline cumulative hazards obtained from our parametric models, separately within each strata of a moderate to large size with a particular shape value. The results are shown in Fig. 2 for the Weibull baseline hazards in the revision model, and in Fig. 3 for the Gompertz baseline hazards in the mortality model. Additionally, these figures include plots of the residuals between the parametric and semiparametric estimates of the baseline hazards pooled across the strata. In Fig. 2, the larger deviations are still very small in absolute value, and mostly correspond to the small number of operations performed before 2007. Figure 3 is the confirmation of a well-known fact [25] that the Gompertz distribution describes human mortality well only up to 95 years, and the oldest patients in Fig. 3 are the outliers. Overall, the Weibull and the Gompertz models fit the revision and the mortality data, respectively, very well.

Fig. 2
figure 2

Comparison of the baseline cumulative hazard functions estimated using semiparametric (magenta) and parametric (Weibull model, grey) methods. Age&sex groups for revision data

Fig. 3
figure 3

Comparison of the baseline cumulative hazard functions estimated using semiparametric (magenta) and parametric (Gompertz model, grey) methods. Date of operation & cup/head bearing groups for mortality data

To assess the predictive value of our models, we also calculated the Harrell’s concordance index [26, 27] between the predicted and the observed survival. In the models without frailty, the estimates of the concordance were equal to 0.818 (SE=0.009) and 0.732 (SE=0.003) for revision and mortality data, respectively. For the models with frailty, the concordance values were equal to 0.819 (SE=0.009) and 0.732 (SE=0.003), respectively.


In hip replacement surgery, the continuous monitoring of the revision experience of hip prostheses is necessary due to delayed outcomes after the introduction of new brands into practice. CUSUM charts are a useful tool for early detection of changes in the revision rates after hip replacement. In the standard applications of the CUSUM-based monitoring, the learning data set required for the model identification is usually chosen from a preceding period. This assumes the stationarity of the process and leads to loss of information and the reduction of the period under study. Instead, we chose the in-control and the test data from the same period. This novel approach is especially beneficial for the future development of the adaptive version of the algorithm.

In the absence of the gold standard, the choices of the learning dataset and the model describing the data play an important role in the analysis using a self-starting CUSUM. After the routine cleaning of the original dataset, we excluded the records from units with less than 52 hip replacements per year to guarantee to some degree the sufficient experience of the implant within surgical teams. Similarly, only the top 80% of cup/head brands in each year were included to exclude rarely used brands, where the measure of failure rate was unlikely to be stable or robust.

Naive analysis treating competing risk events as noninformative censoring can lead to bias in estimates if competing risks are not independent. The competing risks model with dependent unobserved risk factors (frailties) is a convenient analytical tool for such data.

Two types of failure - revision and death without revision - are considered in this study. Other events during the follow-up period (e.g. loss to follow-up due to migration) are treated as noninformative censoring. In addition to observed factors, we included in the competing risks model correlating type-of-failure-specific random effects and all patients from a unit shared their values [28]. Sex, age, fixation, bearing surfaces, head size, and the date of operation were significantly associated with the life-time of the hip prosthesis. Bad health (ASA 3-5), high deprivation (IMD 4-5), polyethylene/metal bearing surfaces, and the date of operation were significantly associated with the higher hazards of death. These effects were robust against the frailty settings.

Identifiability of the competing risks model with random effects was studied in [29]. The main assumption for the identifiability of this model is the finite mean of the frailty. Identifiability of the bivariate survival models with time-dependent frailties given by the correlated Lévy-processes was studied in the recent publication [30]. Our methodology can be easily adapted to this scenario.

There is no consensus on whether the risks of revision and death are independent in hip replacement. Shwarzer et al. [31] showed these risks to be dependent in their data. However, a recent publication by Sayers et al. [32] argued for independence. Comparing the results from four competing risk models with and without shared frailty terms, we found that the best model included the shared frailty for revision but not for death. This means that the competing risks of revision and death are independent in the NJR data. The variance of the frailty term for revision differed significantly from zero, in other words, there were significant differences between units.

We used the classical AIC and BIC for the model selection. However, the conditional AIC (cAIC) [3335] is more appropriate for use in frailty models, since the marginal AIC favors smaller models excluding random effects. We believe that the use of cAIC would not have changed our models because of the negligibly small values of the estimates for the variance of the frailty for mortality, the very small correlation between frailties, and the practically unchanged value of the log-likelihood compared to the model without a random effect for mortality. The cAIC methods are also very computationally intensive. However, our final model includes the random effect for revision. We intend to incorporate cAIC for model selection in our further work.

We proceeded with CUSUM monitoring of revision rates. The two cup brands, DePuy ASR Resurfacing Cup and Biomet M2A 38, were not included in the learning dataset and their performances were monitored using CUSUM charts. We calculated the adjusted boundaries for three target values of the hazard ratios to guarantee approximately 10% of false alarm rate with probability of 0.9 during the observation period 2005-12. The estimates of the boundary calculated using the models with the frailty component were higher, i.e. more conservative, than the one calculated using the model without the frailty component. This delayed two of the alarms, by three and by 12 months. The charts were comparatively robust to the changes in the target HR levels. The estimated CUSUM scores of the DePuy ASR Resurfacing Cup consistently increased from mid-2009. The increase of the CUSUM scores for the Biomet cup also started in 2009 and produced alarms in 2010-11, four years before the increased failure rate came to the attention of the UK regulatory authorities [15].

Estimating the posterior frailty distribution allows to compare the quality of the hip replacement surgery across units. From the 269 units included in the control dataset, 17 (6.3%) had a decreased hazard of revision with a quality score higher than 0.90 and 15 (5.6%) had an increased hazard of revision with a quality score less than 0.10. The associated hazard ratios of revision across the units varied from 0.38 to 2.28.

Due to low revision rates, the data set under study has about 90% censoring. The properties of the statistical methods in highly censored data sets are not well known. A further simulation study is required to assess the performance of our methods under varying amounts of censoring. Another limitation of this study is the choice of the gamma distribution for the correlated frailties. The advantage of the gamma frailty is a closed form expression for its Laplace transform. It allows for simple expressions for CUSUM scores. However, this choice results in necessarily positive correlations between revision and mortality frailties. Other forms of the frailty distributions (e.g. log-normal) to allow possible negative correlations will be pursued in our future work.


This study developed and implemented, for the NJR data, continuous monitoring methods for surgical outcomes. We used the Weibull and the Gompertz hazard functions to describe the baseline hazards of revision and death, respectively. These functions appear to provide a good approximation to the respective type-of-failure life-time. However, adjustment for observed covariates is necessary to improve this approximation and to better understand the influence of the different factors on the life-times of the hip prosthesis and the patient.

Flexible parametrization taking into account possible influence of observed covariates on the shape and the slope parameters of the revision and mortality hazard functions as well as inclusion of the random effects (frailties) accommodate non-proportional hazards and improve the fit of our models to observed data.

Our results demonstrate that the competing risks of revision and death are independent in the NJR data. This finding will facilitate further development of continuous monitoring methods for these data.

We developed a novel method of CUSUM-based monitoring of revision rates. This method includes the choice of the in-control and the test data from the same period, and can be expanded for the subsequent development of an adaptive algorithm. Implementation of the special bootstrap algorithm to estimate the control limits in the CUSUM method guarantees with high probability that the false alarm rate is below a prespecified level. An earlier detection of failure signal by our method in comparison to the PTIR method may be explained by proper risk-adjustment and the ability to accommodate time-dependent hazards.

We found considerable variation in the hazard ratios of revision across the units. Therefore, the continuous monitoring of hip replacement outcomes should include risk adjustment at both the individual and unit level.

Our approach can be easily adapted to other practice areas requiring the continuous monitoring of the failure rates. Further development of the dynamic CUSUM-based methodology similar to that of [36] is needed to adapt our approach to real-time applications, where the new data are regularly updated. Additionally, more sophisticated methods are required to adjust for multiplicity if testing hundreds of various implant brands. We intend to address these further challenges elsewhere.

Availability of data and materials

The NJR data are available to interested researchers subject to approval of the data access request by the Healthcare Quality Improvement Partnership (HQIP) and governance controls. The R programs used to analyse the data are available from the authors on request.



American Society of Anesthesiologists


Body mass index


Cumulative sum


Hazard Ratio


Index of Multiple Deprivation


National Joint Register


Patient time incident rate


Care Quality Commission


  1. Page E. Continuous inspection schemes. Biometrika. 1954; 14:100–15.

    Article  Google Scholar 

  2. de Leval M, Franćois K, Bull C, Brawn W, Spiegelhalter D. Analysis of a cluster of surgical failures: Application to a series of neonatal arterial switch operations. J Thorac Cardiovasc Surg. 1994; 107(3):914–924.

    Article  CAS  PubMed  Google Scholar 

  3. Spiegelhalter D, Sherlaw-Johnson C, Bardsley M, Blunt I, Wood C, Grigg O. Statistical methods for healthcare regulation: rating, screening and surveillance. J R Stat Soc Ser A Stat Soc. 2012; 175(1):1–47.

    Article  Google Scholar 

  4. Bottle A, Aylin P. Intelligent information: A national system for monitoring clinical performance. Health Serv Res. 2008; 43:10–31.

    Article  PubMed  PubMed Central  Google Scholar 

  5. Grigg O, Farewell V, Spiegelhalter D. Use of risk-adjusted CUSUM and RSPRT charts for monitoring in medical contexts. Stat Methods Med Res. 2003; 12(2):147–170.

    Article  CAS  PubMed  Google Scholar 

  6. Biau D, Meziane M, Bhumbra R, Dumaine V, Babinet A, Anract P. Monitoring the quality of total hip replacement in a tertiary care department using a cumulative summation statistical method (CUSUM). J Bone Joint Surg Br. 2011; 93:1183–1188.

    Article  CAS  PubMed  Google Scholar 

  7. Hardoon S., Lewsey J., van der Meulen J. Continuous monitoring of long-term outcomes with application to hip prostheses. Stat Med. 2007; 26(28):5081–5099.

    Article  PubMed  Google Scholar 

  8. National Joint Register. 14th Annual report 2017. surgical data to 31 December 2016. 2017.

  9. Biswas P, Kalbfleisch J. A risk-adjusted CUSUM in continuous time based on the Cox model. Stat Med. 2008; 27(17):3382–3406.

    Article  PubMed  Google Scholar 

  10. Macpherson G, Brenkel I, Smith R, Howie C. Outlier analysis in orthopaedics: Use of CUSUM: The Scottish Arthroplasty Project: Shouldering the burden of improvement. J Bone Joint Surg Am. 2011; 93:81–88.

    Article  PubMed  Google Scholar 

  11. Assareh H, Smith I, Mengersen K. Bayesian estimation of the time of a linear trend in risk-adjusted control charts. Int J Comput Sci. 2011; 38(4):409–417.

    Google Scholar 

  12. Gandy A, Kvaløy J. Guaranteed conditional performance of control charts via bootstrap methods. Scand Stat Theory Appl. 2013; 40:647–668.

    Article  Google Scholar 

  13. National Joint Register. 12th Annual Report 2015. Surgical data to 31 December 2014. 2015.

  14. National Joint Register. 8th Annual Report 2011. Surgical data to 31 December 2010. 2011.

  15. National Joint Register. NJR implant performance analysis methodology. 2017.

  16. National Joint Register. 10th Annual Report 2013. Surgical data to 31 December 2012. 2013.

  17. Owens W, Felts J, Spitznagel JE. ASA physical status classifications: a study of consistency of ratings. Anesthesiol. 1978; 49:239–43.

    Article  CAS  Google Scholar 

  18. English Indices of Deprivation. Guidance Document.\\attachment_data/file/6222/1871538.pdf.

  19. R Core Team. R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing; 2016.

  20. Gandy A, Lau F-H. Non-restarting CUSUM charts and control of the false discovery rate. Biometrika. 2013; 100(1):261–8.

    Article  Google Scholar 

  21. Glidden D, Vittinghoff E. Modelling clustered survival data from multicentre clinical trials. Stat Med. 2004; 23(3):369–88.

    Article  PubMed  Google Scholar 

  22. Gleiss A, Gnant M, Schemper M. Explained variation in shared frailty models. Stat Med. 2017; 37(9):1472–90.

    Google Scholar 

  23. Wienke A. Frailty Models in Survival Analysis. New York: Chapman & Hall; 2010.

    Book  Google Scholar 

  24. Nielsen G, Gill R, Andersen P, Sørensen T. A counting process approach to maximum likelihood estimation in frailty models. Scand J Stat Theory Appl. 1992; 19:25–43.

    Google Scholar 

  25. Vaupel JW. Biodemography of human ageing. Nature. 2010; 464(7288):536–542.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  26. Harrell Jr FE, Lee KL, Mark DB. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med. 1996; 15(4):361–87.

    Article  CAS  Google Scholar 

  27. Steyerberg EW, Vickers AJ, Cook NR, Gerds T, Gonen M, Obuchowski N, Pencina MJ, Kattan MW. Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology. 2010; 21(1):128–138.

    Article  PubMed  PubMed Central  Google Scholar 

  28. Gorfine M, Hsu L. Frailty-based competing risks model for multivariate survival data. Biometrics. 2011; 67(2):415–26.

    Article  PubMed  Google Scholar 

  29. Abbring JH. The identifiability of the mixed proportional hazards competing risks model. J R Statist Soc B. 2003; 65(3):701–10.

    Article  Google Scholar 

  30. Begun A, Yashin A. Study of the bivariate survival data using frailty models based on lévy processes. AStA Adv Stat Anal. 2018; 103(1):37–67.

    Article  Google Scholar 

  31. Shwarzer G, Schumacher M, Maurer T, PE O. Statistical analysis of failure times in total joint replacement. J Clin Epidemiol. 2001; 54:997–1003.

    Article  Google Scholar 

  32. Sayers A, Evans J, Whitehouse M, Blom A. Are competing risks models appropriate to describe implant failure?. Acta Orthopaedica. 2018; 89(3):256–8.

    Article  PubMed  PubMed Central  Google Scholar 

  33. Vaida F, Blanchard S. Conditional Akaike information for mixed-effects models. Biometrika. 2005; 92(2):351–70.

    Article  Google Scholar 

  34. Greven S, Kneib T. On the behaviour of marginal and conditional AIC in linear mixed models. Biometrika. 2010; 97(4):773–89.

    Article  Google Scholar 

  35. Ha ID, Jeong JH, Lee Y. Statistical Modelling of Survival Data with Random Effects. Singapore: Springer; 2017.

    Book  Google Scholar 

  36. Zhang X, Woodall W. Dynamic probability control limits for risk-adjusted Bernoulli CUSUM charts. Stat Med. 2015; 34(25):3336–3348.

    Article  PubMed  Google Scholar 

Download references


The authors thank Sophie E. Garrett and Dr Wenjia Wang for the extraction of the preliminary NJR dataset in an analysis friendly format. The authors also thank the referees, Ha Il Do and Vera Tomazalla for their useful suggestions for improving the presentation of the material of this article.

We thank the patients and staff of all the hospitals in England, Wales and Northern Ireland who have contributed data to the National Joint Registry. We are grateful to the Healthcare Quality Improvement Partnership (HQIP), the NJR Research Sub-committee and staff at the NJR Centre for facilitating this work. The authors have conformed to the NJR’s standard protocol for data access and publication. The views expressed represent those of the authors and do not necessarily reflect those of the National Joint Registry Steering Committee or the Health Quality Improvement Partnership (HQIP) who do not vouch for how the information is presented.

The Healthcare Quality Improvement Partnership ("HQIP") and/or the National Joint Registry ("NJR") take no responsibility for the accuracy, currency, reliability and correctness of any data used or referred to in this report, nor for the accuracy, currency, reliability and correctness of links or references to other information sources and disclaims all warranties in relation to such data, links and references to the maximum extent permitted by legislation. HQIP and NJR shall have no liability (including but not limited to liability by reason of negligence) for any loss, damage, cost or expense incurred or arising by reason of any person using or relying on the data within this report and whether caused by reason of any error, omission or misrepresentation in the report or otherwise. This report is not to be taken as advice. Third parties using or relying on the data in this report do so at their own risk and will be responsible for making their own assessment and should verify all relevant representations, statements and information with their own professional advisers.


The work by A. Begun and E. Kulinskaya was supported by the Economic and Social Research Council [grant number ES/L011859/1]. The work by A. Begun was also supported by the Orthopaedics Trust.

Author information

Authors and Affiliations



All authors have made contributions to conception, design and methodology of this study. AJM formulated the problem and obtained the data, AB and EK contributed to methods development, AB carried out the analysis, and EK drafted the first version of the manuscript. All authors have been involved in revisions, read and approved the final manuscript.

Corresponding author

Correspondence to Elena Kulinskaya.

Ethics declarations

Ethics approval and consent to participate

The NJR data were made available after a formal request to the NJR Research Committee. The data were anonymised in respect to patient, to surgeon and to operating unit identifying details. Approval was obtained from Computing Subcommittee of the University of East Anglia Ethics Committee, reference number CMP/1718/F/10A.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1

calculation of the CUSUM score

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver( applies to the data made available in this article, unless otherwise stated.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Begun, A., Kulinskaya, E. & MacGregor, A.J. Risk-adjusted CUSUM control charts for shared frailty survival models with application to hip replacement outcomes: a study using the NJR dataset. BMC Med Res Methodol 19, 217 (2019).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: