All analyses are carried out separately for male and female cohorts, as they have separate CVD risk prediction models in practice.

### Data source

A ‘CVD primary prevention cohort’ was defined from a Clinical Practice Research Datalink (CPRD) [7] dataset linked with Hospital Episode Statistics [8] (HES) and Office for National Statistics [9] (ONS) using the same criteria as QRISK3 [1]. The study period was 1st Jan 1998 to 31st Dec 2015 and the cohort entry date defined as the latest of: date turned 25; one year follow up as a permanently registered patient in CPRD; or 1st Jan 1998. Patients were excluded if they had a CVD event (identified through CPRD, HES or ONS) or statin prescription prior to their cohort entry date. The end of follow up was: the earliest date of patient’s transfer out of the practice or death; last data collection for practice; 31st Dec 2015 or five years follow up. Patients were censored after five years as five year risk predictions are used throughout this study. All predictor variables included in the QRISK3 [1] risk prediction model were extracted at cohort entry date. Code lists and detailed information on how variables were defined is provided in Additional file 1.

### Quantifying the miscalibration in risk predictions of patients in the present day

The first step was to quantify the miscalibration induced by developing a model over a time period in which a secular trend in CVD was present, and using it to calculate risk predictions for patients after this time period. Missing data for body mass index (BMI), systolic blood pressure (SBP), SBP variability, cholesterol, high density lipoprotein (HDL), smoking status and ethnicity in the CVD primary prevention cohort was imputed using multiple imputation by chained equations. The imputation model included all predictor variables from QRISK3, the Nelson Aalen estimation of the cumulative baseline hazard at the point of censoring or an event, and the outcome indicator. The package used to do this was mice [10]. Only one imputed dataset was produced, as running the analysis across multiple datasets and combining estimates was not essential to answering our hypotheses, and the computational time to do so was significant. Also the bespoke imputation procedure carried out on the data for developing the MSM (described later) resulted in a single dataset, so the decision was made across all analyses for consistency.

Patients were then split into two cohorts defined by their cohort entry date. Those with a cohort entry date prior to 1st Jan 2010 were put into the development cohort, with the remaining patients making up the validation cohort. Patients in the development cohort were then censored at 1st Jan 2010 if their follow up extended beyond this point. The data was split like this because if QRISK3 was replicated exactly using data from 1998 to 2015 for model development, it would not have been possible to assess the calibration of risk scores for patients after 2015, as they would have no follow up.

A Cox proportional hazards model using the same predictor variables as QRISK3 was then fit to the development cohort. Fractional polynomials of age, BMI and SBP were tested for using the mfp package [11]. Five year risk predictions were then generated for both the development and validation cohort using this model, and the calibration of these risks was assessed. Calibration was assessed by splitting individuals from the cohort into 10 groups by their predicted risk (deciles). The Kaplan Meier estimate of risk (observed risk) was then plot against the average predicted risk (predicted risk) within each decile. Eq. (1) corresponds to this model, where *h*(*t*) denotes the hazard function, *h*_{0}(*t*) the baseline hazard at time *t*, *X*_{0} the vector of predictors at cohort entry date and *β*_{X} a vector of the associated coefficients .

$$ h(t)={h}_0(t)\ast \exp \left({\beta}_X.{X}_0\right) $$

(1)

### Attempt to model the secular trend to remove miscalibration in validation cohort

Given the miscalibration that was found in the validation dataset (see results), this indicated that the secular trend could not be explained by changes in predictor variables between the development and calibration dataset. This provided support for modelling the secular trend in the development cohort, to try and remove the miscalibration in the validation cohort. The same Cox model defined by eq. (1) was fitted to the development cohort, but with cohort entry date included as a variable, referred to as calendar time. This is denoted by *T*_{0} in Fig. 1 (DAG-1) and eq. (2). Unmeasured confounding is left off the DAGs to reduce the number of arrows and maintain clarity (particularly for DAG-2), however it may be present. The implications of unmeasured confounding are discussed in the limitations section (see exchangeability assumption).

All DAGs were generated using the dagitty software [12]. Fractional polynomials for this variable were tested using the mfp package [11]. Five year risks were generated for validation cohort and the calibration of the models was assessed.

$$ h(t)={h}_0(t)\ast \exp \left({\beta}_T.{T}_0+{\beta}_X.{X}_0\right) $$

(2)

### Developing an MSM to assess secular trend after adjusting for statin use during follow up

#### MSM – overview

A major concern was that an increase in statin use over time may have caused some of the reduction in CVD incidence. If the secular trend was driven by statin use, then modelling it (which would result in lower predicted risks) would make lots of patients whose risk if they remained untreated was > 10%, ineligible for treatment. Statin use at baseline could not have been driving this secular trend as the development cohort only considered patients who were statin free at baseline, however patients could initiate statins during follow up. The aim of this section was therefore to assess the presence of the secular trend when adjusting for statin use during follow up.

It is possible to adjust for changes in predictor variables and statin use post baseline using standard regression techniques (such as an interval censored Cox model). This would result in an estimate of the direct effect of calendar time on CVD incidence, the portion of which is not explained through changes in the predictor variables and statin use during follow up. Such a model would be sufficient for assessing whether the secular trend remained after adjusting for statin use during follow up in the development cohort. However the model could not be used in a risk prediction setting, as future values of predictor variables would be required to generate risk scores. When generating a risk score for a new individual, you would not know the future values of their predictor variables. Furthermore, the coefficient of statin use during follow up would not be causal, and the risk of a patient if they did/did not initiate statins during follow up could therefore not be estimated [13]. Therefore the proposed method to answer our question was an MSM.

MSMs were developed to calculate the causal effect of a time dependent exposure on an outcome in an observational setting, where the treatment and outcome are confounded by time varying covariates [14, 15]. Sperrin et al. [13] have shown how MSMs can be used to adjust for ‘treatment drop in’, the issue of patients starting treatment during follow up in a dataset being used for risk prediction. Consider DAG-2 (Fig. 2), where *k* = 0 denotes baseline, and *k* = 1, 2 two time points during follow up (this could be extended to any number of time points). *A*_{k} denotes the statin treatment status at time *k*, *X*_{k} covariate information prior to time *k*, and *T*_{k} calendar time at time *k*. Note *A*_{0} is not included in DAG-2 as *A*_{0} = 0 by definition of the CVD primary prevention cohort. In the absence of unmeasured confounding, MSM’s allow for the estimation of \( E\left[Y\left(\underset{\_}{A}=\underset{\_}{0}\right)|{X}_0\right] \), where __A__ denotes the entire treatment course during follow up, as opposed to *E*[*Y*(*A*_{0} = 0)| *X*_{0}]. The strategy involves adjusting for variables at baseline as normal and then re-weighting the population by variables that may be on the treatment causal pathway, breaking the links from *X*_{k} to *A*_{k}. In the resulting pseudo population the allocation of treatment during follow up happens at random (within the levels of the variables defined at baseline). This allows the generation of risk scores using data at baseline only, but also accounting for statin use during follow up (the risk scores developed in a counterfactual scenario that no-one receives statin treatment). Importantly for this study, if calendar time only effected the outcome Y through increasing statin use in follow up, when using an MSM the direct effect of *T*_{0} on Y would be zero, and adjusting for calendar time at baseline would not result in a drop in the average risk score of patients in the validation cohort.

The estimator of \( E\left[Y\left(\underset{\_}{A}=\underset{\_}{0}\right)|{X}_0\right] \) is only valid under the three identifiability assumptions of causal inference (exchangeability, consistency and positivity) and correct specification of the marginal structural model, and the model used to calculate the weights. The viability of these assumptions in this study is discussed in the limitations.

#### MSM - data derivation

The CVD primary prevention cohort was used as a starting point. However in order to derive the MSM, patient information was extracted at 10 time points, at 6 month intervals from the cohort entry date, denoted as *X*_{k} and *A*_{k} for *k* = 0, 1, 2,…, 9. The variable *X*_{k} contained all the QRISK3 predictors evaluated at time *k* (for test data this was the most recent value prior to time *k*). *A*_{k} = 1 if a patient had initiated statin treatment prior to *k*, and *A*_{k} = 0 otherwise. As patients were excluded from the cohort if they have had a statin prescription prior to their cohort entry date, A_{0} = 0 for all patients. If a CVD event happened within 6 months of a statin initiation, the statin initiation was ignored. This was to stop any effects of poorly recorded data (start of statins may have been triggered by the CVD event).

A key issue in deriving the dataset was missing data. A combination of imputation techniques were implemented to maintain consistency in variable information within each patient across the 10 time points. First, where possible, last observation carried forward imputation was implemented within each patient. Then, where possible, next observation carried backwards imputation was used to impute the remaining missing data. However, there was still missing data for patients who had no entries across all 10 time points for a given variable. The data at baseline was then extracted and missing values were imputed using one stochastic imputation. All predictor variables, Nelson Aalen estimate of baseline hazard and the outcome indicator were included in the imputation model (same process that was used to impute the data for the standard Cox model). These imputed baseline values were then used at each following time point (last observation carried forward imputation).

#### MSM - calculation of weights and specification of model

The MSM was fitted as a weighted interval censored Cox model using the coxph function from the survival package [16]. The weights themselves were calculated using the IPW package [17]. Stabilised weights were calculated as is common practice to provide more precise estimation of the weights. For individual *i*, the formula for the weight of interval/time period K was defined as:

$$ {sw}_i=\prod \limits_{k=0}^K{\left({\hat{p}}_{ki}^{\ast}\right)}^{A_{ki}}{\left(1-{\hat{p}}_{ki}^{\ast}\right)}^{1-{A}_{ki}}/\prod \limits_{k=0}^K{\left({\hat{p}}_{ki}\right)}^{A_{ki}}{\left(1-{\hat{p}}_{ki}\right)}^{1-{A}_{ki}} $$

(3)

where \( {\hat{p}}_{ki}^{\ast }=P\left[{A}_k=1|{\underset{\_}{A}}_{k-1},{X}_0\right] \) and \( {\hat{p}}_{ki}=P\left[{A}_k=1|{\underset{\_}{A}}_{k-1},{\underset{\_}{X}}_k,{X}_0\right] \), and \( {\underset{\_}{A}}_k \) and \( {\underset{\_}{X}}_k \) denote treatment history and covariate history respectively up time point *k* for individual *i*. More simply put, the denominator is the probability that the individual received the treatment they did, based on time varying predictors and predictors at baseline. The numerator is the probability that the individual received the treatment they did, based on predictors at baseline only. The models used to estimate the probability of treatment when deriving the weights were interval censored Cox models. If calendar time at baseline, *T*_{0}, was being included in the MSM, it was also included as a stabilising factor in the calculation of the weights as part of *X*_{0}. Detailed information on how to calculate weights is also given in the literature [15, 17, 18] and the formula for calculating weights (and notation for variables) matches that from the work by Sperrin et al. [13]

Two MSM’s were created, one that adjusted for calendar time at baseline and one that did not:

$$ h(t)={h}_0(t)\ast \exp \left({\beta}_A.{A}_t+{\beta}_X.{X}_0\right) $$

(4)

$$ h(t)={h}_0(t)\ast \exp \left({\beta}_A.{A}_t+{\beta}_X.{X}_0+{\beta}_T{T}_0\right) $$

(5)

The same fractional polynomials of age, BMI, SBP and calendar time that were found to be optimal in the standard Cox models were used in the MSM, and in the models used to calculate the weights. Ideally we would have re-calculated the optimal fractional polynomials for the weighted model fitted to the interval censored data, however software was not available to do this. Using the same fractional polynomials from the standard Cox analysis was preferred to having no fractional polynomials, as removing them led to poorly calibrated models. The coefficient *β*_{A} is the average causal effect of initiating statin treatment after adjusting for all other variables. It is quite common to allow the effect of statin treatment to be modified by baseline variables, which could be achieved by including interaction terms *A*_{t}*X*_{0}. However the primary aim was to account for statin use in follow up, rather than calculate the effect of statin treatment in different subgroups, so we did not feel this was necessary.

As a comparison, unweighted interval censored Cox models using only data at baseline (i.e. equation (1) and eq. (2) were fitted to the same data as the MSM. The effect of modelling the secular trend could then be assessed when using (interval censored) Cox regression, as well as under the MSM framework. This was preferred to re-using the standard Cox models directly, which were fitted to a different dataset.

#### MSM – analysis of interest

The MSM was used to generate risk predictions assuming no statin treatment at baseline or during follow up, \( E\left[Y|{X}_0,\underset{\_}{A}=\underset{\_}{0}\right] \), the estimator of \( E\left[Y\left(\underset{\_}{A}=\underset{\_}{0}\right)|{X}_0\right] \). The interval censored Cox model only produced risk predictions based on no statin treatment at baseline, *E*[*Y*| *X*_{0}, *A*_{0} = 0], the estimator of *E*[*Y*(*A*_{0} = 0)| *X*_{0}, ]. The outcome of interest was the risk ratio of the average predicted risk of patients in the validation cohort, before and after adjusting for calendar time at baseline in the MSM framework, \( E\left[Y\left(\underset{\_}{A}=\underset{\_}{0}\right)|{X}_0,{T}_0\right]/E\left[Y\left(\underset{\_}{A}=\underset{\_}{0}\right)|{X}_0\right] \). This was compared to the risk ratio after adjusting for calendar time at baseline in the unweighted interval censored Cox models, (*E*[*Y*(*A*_{0} = 0)| *X*_{0}, *T*_{0}]/*E*[*Y*(*A*_{0} = 0)| *X*_{0}]).