Methodology and reporting characteristics of studies using interrupted time series design in healthcare

Background Randomised controlled trials (RCTs) are considered the gold standard when evaluating the causal effects of healthcare interventions. When RCTs cannot be used (e.g. ethically difficult), the interrupted time series (ITS) design is a possible alternative. ITS is one of the strongest quasi-experimental designs. The aim of this methodological study was to describe how ITS designs were being used, the design characteristics, and reporting in the healthcare setting. Methods We searched MEDLINE for reports of ITS designs published in 2015 which had a minimum of two data points collected pre-intervention and one post-intervention. There was no restriction on participants, language of study, or type of outcome. Data were summarised using appropriate summary statistics. Results One hundred and sixteen studies were included in the study. Interventions evaluated were mainly programs 41 (35%) and policies 32 (28%). Data were usually collected at monthly intervals, 74 (64%). Of the 115 studies that reported an analysis, the most common method was segmented regression (78%), 55% considered autocorrelation, and only seven reported a sample size calculation. Estimation of intervention effects were reported as change in slope (84%) and change in level (70%) and 21% reported long-term change in levels. Conclusions This methodological study identified problems in the reporting of design features and results of ITS studies, and highlights the need for future work in the development of formal reporting guidelines and methodological work. Electronic supplementary material The online version of this article (10.1186/s12874-019-0777-x) contains supplementary material, which is available to authorized users.


Background
Rigorous evaluations are the cornerstone of evidencebased healthcare. The gold standard for evaluating the causal effect of an intervention is a randomised controlled trial (RCT). However, RCTs do have limitations, they can be costly, unethical or impractical to conduct [1][2][3]. Therefore, researchers must consider alternative designs to evaluate interventions and quasi-experimental studies are one possible solution [4]. Quasi-experimental studies do not use randomisation and may use both preand post-intervention data. Interrupted time series (ITS) is considered one of the strongest quasi-experimental designs [4].
In an ITS design, data are collected at multiple and equally spaced time points (e.g. weekly, monthly, or yearly) before and after an intervention. Knowing the exact time when an intervention occurs is an important feature. The main objective of an ITS is to examine whether the data pattern observed post-intervention is different to that observed pre-intervention. There are a range of effect estimates to describe the impact of the intervention. For example, a change in level corresponds to the difference in the time point of interest to the predicted pre-intervention trend or a change in slope which is the difference between the post-and pre-intervention slopes [5].
When designing an ITS design and analysing the data there are important characteristics that need to be considered, these include (1) autocorrelation, whereby data collected closely together are correlated with each other, (2) nonstationary or secular trend, which is where the data are increasing or decreasing over time irrespective of any intervention, (3) seasonality or cyclic patterns, (4) outliers, (5) other interventions (interruptions) occurring in the data series, and (6) sample size.
Three systematic reviews [5][6][7] have looked at ITS designs but all focussed on specific, narrow areas. Polus et al [6] focused on pre-specified intervention types (for example behavioural/educational, clinical, environmental, health policy, and health systems) and only included 16 studies from Cochrane Effective Practice and Organisation of Care (EPOC) reviews. The key findings were ITS terminology is used for a variety of study designs and when an ITS design is used the intervention is at the organisational level. Jandoc et al [7] included 220 studies but only focused on drug utilisation. They found ITS designs increasingly being used but reporting standards varied. Ramsay et al [5] concentrated on mass media interventions, included 58 studies, concentrated on methodology quality and concluded that in most cases there was poor reporting of study design. To our knowledge, no study has assessed the breadth of use of ITS designs across healthcare settings.
The aim of this methodological study was to identify a cohort of ITS designs across all healthcare settings and to describe how ITS are being used, what design characteristics are considered, and how they are reported.

Methods
This methodological study was conducted according to a prespecified study protocol [8].

Inclusion and exclusion criteria
We included ITS with a minimum of two data points pre and one post-intervention that assessed a health or healthcare intervention (e.g. programs, policies, or educational interventions). Systematic reviews, metaanalysis, RCTs, ITS designs with a control group or studies that did not use an ITS analysis were excluded. There were no restrictions on participants, language of study, or the type of outcome.

Search strategy
We searched MEDLINE(R) and Epub Ahead of Print, In-Process & Other Non-Indexed Citations and Daily) in October 2016 for ITS designs published in 2015. The search strategy is in Additional file 1.
JH screened titles and abstracts identified by the search for inclusion. CR and SF double assessed 10% of the titles and abstracts and if there were no disagreements then JH would proceed to single screening. Full-text copies for all the potential studies were obtained and assessed for inclusion by JH with CR and SF double assessing 10%.

Data extraction and analysis
A data extraction form was developed and piloted on three randomly selected papers by all authors. JH recorded the relevant details of all included studies onto a data extraction form. Two authors (CR and SF) independently assessed 10% of randomly selected studies and if there were no disagreements then JH would proceed to data extract the remaining studies. Data extracted from the studies consisted of: definition of study design (e.g. ITS, before-and-after), country of study, study objectives (population, intervention and the outcomes of interest), type and level of intervention, participants, data source, type of outcome, the number of data points collected pre-and post-intervention and the frequency, study methodology characteristics, estimations of intervention effects and the reporting details of the abstract and discussion. We based data extraction on the primary outcome and if no defined primary outcome was reported, we used the first reported outcome. We summarised data using descriptive statistics (numbers and percentages or median, 25th, and 75th centile).
As a methodological study, risk of bias assessment was not performed on individual studies.

Results
The search strategy identified 3111 title and abstracts ( Fig. 1). After removing duplicates (187) and excluding 2552 titles and abstracts that did not meet the inclusion criteria, 372 full-text studies were assessed for eligibility. Of these, 256 articles were excluded with the majority (170) having too few time points. A total of 116 articles were included in the study and the list of included studies is provided in Additional file 2. As there were no disagreements in the screening and data extraction, CR and SF only double assessed 10%.
In the abstract of the included studies, the intervention was clearly defined in 110 (95%) of studies ( Table 1). The method of analysis was reported in 57% of the studies and the number of pre-and post-intervention data points were stated in 29% and 28% of studies respectively. The main results were reported in three quarters of the studies.
Characteristics of the included studies are presented in Table 2. Seventy four (64%) studies provided a study definition in at least the title, abstract, or the main paper. Of these 9 (12%) provided a definition in the title, abstract, and main paper with only two consistently giving the same definition. Examples of study definitions used include ITS, quasi-experimental, time series, observational, cohort, and cross-sectional study. The majority of studies were from the US (34%), the UK (16%), Asia (16%), and Europe (12%). The type of intervention varied, the most common being programs (35%) and policies (28%). Interventions were mainly at the hospital level (63%), aimed at health professionals (79%), and over half of the data came from hospitals (55%). Data were mainly collected at monthly intervals (64%), and the number of data points collected pre-and post-intervention was mostly the same within any given study with a median ratio of one (25th, 75th centile 1-2). Additionally, Study objectives were clearly defined in 85 (73%) of studies and 17 (15%) gave a rationale for using an ITS design. The majority of the studies investigated one intervention (78%), 15% studied two and 8% studied three or more. Only 30 (26%) specified a primary and/or secondary outcome and 107 (92%) clearly defined when the intervention occurred. Eightythree (72%) gave a funding source statement, 7 (6%) studies referred to a study protocol, and only one study was non-English (Spanish).
Characteristics of the methodology used in each study showed all but one study provided a description of the analysis, with segmented regression analysis being used in 90 (78%) studies (Table 3). Autocorrelation was considered in 63 (55%) studies, of these only 40 (63%) reported any formal testing. Only 9 (8%) studies considered nonstationary and seasonality was considered in 28 (24%).
Seven (6%) studies provided a sample size description, three gave a discussion only with no formal calculation and four gave a calculation of which one could be reproduced. For these four, the sample size was based on detecting an "importance difference" for two studies, while the remainder determined what effect size and power was detectable based on data they had. However all four  based the sample size on comparing differences in proportions pre-and post-intervention. The unit of analysis was the same as the unit of intervention in 74 (64%) studies and a description of how missing data was handled was reported in 5% of studies. A transition period, a period which allows the intervention to take affect was considered in 17% of studies. Sensitivity analysis was carried out in 20 (17%) of studies. Of these, 16 studies did another form of analysis (e.g. adjusted for autocorrelation, seasonality, and covariates), and two considered a transition period.
Data were presented graphically in 109 (94%) of studies. Three (3%) studies reported outliers in their data but only one reanalysed the data with the data point removed. Results of the analysis were reported by 115 (99%) with both relative and absolute figures presented (Table 4). Change in slope (84%) and change in level (70%) were the most common intervention effects reported with 33 (28%) reporting both. Of the studies that reported a change in level, 75 (93%) reported an immediate change while 17 (21%) reported other change in levels, for example, 12 months after the intervention. Only 8% reported confidence intervals (CIs) or standard errors (SEs) along with the relative slope effects, 80% for an immediate change in level effect and 65% for other change in level effects. Table 5 describes the characteristics of the discussion section of the studies. All studies gave an overall interpretation of results, 113 (97%) studies summarised key  Other data points that were accounted for (n = 3)

(2-8)
Values are n (%) unless otherwise stated. a 90 day periods; five times over two weeks before and after; daily Abbreviations: US United States, UK United Kingdom results and one study discussed the impact of outliers in their data. Forty five (39%) gave a discussion or statement of whether other co-interventions might have taken place in the study. Weakness and limitations were discussed in 98 (84%) of studies, and 39 (34%) indicated strengths.

Discussion
To our knowledge this methodological study is the first to show inconsistencies and gaps in the reporting of design features and results of ITS design across a variety of healthcare studies and the first to describe the different ways of reporting effect estimates (e.g. relative slope and level effects and absolute change in slope and level). Reporting effect estimates is vital to interpret studies, but due to the variety of possible estimates that could be reported, this can cause challenges across multiple studies. For example, a change in level corresponds to the difference in the time point of interest to the predicted pre-intervention trend where a change in slope is the difference between the post-and pre-intervention slopes [5]. The different effect estimates cannot be combined together. Therefore, although most individual studies in this study reported an effect size, the many different  Values are n (%). Abbreviations: ARIMA, Autoregressive integrated moving average; GEE generalized estimating equation; ACF, autocorrelation function; PACF Partial autocorrelation function ways of reporting make meta-analyses difficult. In addition, none of the included studies based the sample size justification on the effect size of interest, making interpretation of a meaningful difference difficult. For example, the sample size calculation compared differences in proportions pre-and post-intervention, but the effect size was a difference in slope [9]. Only 74 (64%) of studies provided a study definition in at least the title, abstract, or the main paper with definitions varying between them and only 29 studies referring to ITS. As ITS studies can be included in systematic reviews a clear and consistent definition is needed. This is so researchers can identify ITS studies to be included in their review and be sure no studies are missed because of the study definition.
This study identified five statistical methods used to analysis ITS data (Table 5), however there has been no research comparing ITS methods to determine which one to use and when. Also, the varying statistical methods can have implications on what effect estimates are reported, which can impact on the interpretation of results and effect the pooling of results for a metaanalysis.
It was not possible to determine whether the statistical analysis performed was appropriate in some cases. Our study found that one study did not provide any description of the analysis and of the 115 that did only 55% considered autocorrelation, 10% non-stationary and 28% seasonality. These figures are slightly lower than the previous reviews of Jandoc et al [7]. (autocorrelation 66%; non-stationary 15%; seasonality 31%), while Polus et al [6]. reported autocorrelation and non-stationary in 31% of studies. For ITS studies these considerations are important as it can cause the results either to be underestimated or overestimated which could affect the overall conclusions of studies.
This study had no restrictions on the type of intervention included, language of report and included 116 studies, therefore the findings are representative of ITS studies that are published. However there were some limitations. Only one year was searched, 2015, but there are no reasons to believe that other years would give a markedly different perspective. We potentially could have missed papers due to the inconsistency in the reporting of the definition of an ITS study as well as not searching grey literature and other databases but we do not see these limitations biasing the representativeness of ITS studies.

Conclusions
Currently reporting is poor. Jandoc et al [7] provided reporting recommendations of ITS studies, however our study has identified gaps in reporting which are not included. In the statistical methods section of the study, a  description of a sample size calculation or a justification should be provided and if present, how missing data was handled. For reporting of effect sizes, there needs to be consistency in what should definitely be reported along with confidence intervals. Also, the study design should be indicated in the main paper in addition to the title and abstract. Therefore, there is a need to provide clearer guidelines for reporting standards through consensus approaches such as a consensus meeting or a Delphi study. In additional to poor reporting of ITS studies, this study highlighted that there are numerous ways of analysing ITS studies. This can make interpretation of results difficult, for example presenting effect sizes as either relative or absolute. This illustrates the need to assess methodological strengths and weaknesses of current ITS analysis methods.

Additional files
Additional