
A scoping review of studies using observational data to optimise dynamic treatment regimens



Dynamic treatment regimens (DTRs) formalise the multi-stage and dynamic decision problems that clinicians often face when treating chronic or progressive medical conditions. Compared to randomised controlled trials, using observational data to optimise DTRs may allow a wider range of treatments to be evaluated at a lower cost. This review aimed to provide an overview of how DTRs are optimised with observational data in practice.


Using the PubMed database, a scoping review of studies in which DTRs were optimised using observational data was performed in October 2020. Data extracted from eligible articles included target medical condition, source and type of data, statistical methods, and translational relevance of the included studies.


From 209 PubMed abstracts, 37 full-text articles were identified, and a further 26 were screened from the reference lists, totalling 63 articles for inclusion in a narrative data synthesis. Observational DTR models are a recent development and their application has been concentrated in a few medical areas, primarily HIV/AIDS (27, 43%), followed by cancer (8, 13%), and diabetes (6, 10%). There was substantial variation in the scope, intent, complexity, and quality between the included studies. Statistical methods that were used included inverse-probability weighting (26, 41%), the parametric G-formula (16, 25%), Q-learning (10, 16%), G-estimation (4, 6%), targeted maximum likelihood/minimum loss-based estimation (4, 6%), regret regression (3, 5%), and other less common approaches (10, 16%). Notably, studies that were primarily intended to address real-world clinical questions (18, 29%) tended to use inverse-probability weighting and the parametric G-formula, relatively well-established methods, along with a large amount of data. Studies focused on methodological developments (45, 71%) tended to be more complicated and included a demonstrative real-world application only.


As chronic and progressive conditions become more common, the need will grow for personalised treatments and methods to estimate the effects of DTRs. Observational DTR studies will be necessary, but so far their use to inform clinical practice has been limited. Focusing on simple DTRs, collecting large and rich clinical datasets, and fostering tight partnerships between content experts and data analysts may result in more clinically relevant observational DTR studies.



The medical needs of patients with chronic or progressive conditions often evolve over time and the treatments administered to these patients need to be regularly reviewed. Treatment decisions may depend on the dynamics of a number of factors or require continual switching between different treatments. Therefore, making optimal treatment decisions requires information across many time intervals. Dynamic treatment regimens (or regimes) (DTRs) formalise the multi-stage and dynamic decision problems clinicians often face when treating chronic or progressive conditions [1,2,3,4,5]. A DTR can be thought of as a set of rules describing how treatment could be assigned in response to some dynamically changing factor, for example, treatment response.

A DTR can be defined using decision rules, functions that map each patient’s accumulated clinical and treatment history to the subsequent treatment at each treatment decision point. These rules are typically derived from parametric models. An optimal decision rule is one that optimises the long-term value of the decision, for example, expected overall survival. The values of the decision rules are estimated using statistical methods that can account for time-varying treatment effect mediation and confounding. In order for the estimated treatment effects that inform the decision rules to have a causal interpretation, a number of conditions must be met, which are summarised in the next section.

One real-world example of a decision problem that has been framed and optimised as a DTR is ‘when to begin’ antiretroviral treatment in patients with human immunodeficiency virus (HIV), which is often based on their CD4 count history [6, 7]. The decision to start a patient’s treatment may not be appropriate if it is based only on their most recent clinical history, ignoring whether their CD4 count has been stable or not. Another real-world example of a DTR is ‘how to modify’ prophylaxis for graft-versus-host disease following stem-cell transplantation for blood cancer, when a patient may receive either the standard or an experimental prophylaxis [8, 9]. If the patient subsequently develops acute graft-versus-host disease (i.e., the allocated prophylactic treatment has not been effective) they may then receive either a standard or an experimental salvage treatment. The selection of treatment at each stage is based on a suite of time-varying disease characteristics.

Optimising DTRs relies on estimating the value of the decision rules using data from either sequential multiple assignment randomised trials (SMARTs) [1, 10,11,12], which are designed to randomise and re-randomise participants to different treatments over time conditional on their observed outcomes, or from observational sources such as cohort studies, electronic health records (EHRs), and clinical registries. Estimating optimal DTRs using SMART data provides the highest-quality evidence of regimen efficacy by reducing confounding bias through randomisation. However, SMARTs are more complex to design and implement than standard trial designs and therefore are resource intensive.

A potentially less costly and more operationally feasible alternative is to emulate a ‘target trial’ using existing observational data [6, 13, 14]. However, without treatment randomisation, the causal relationships between the covariates, treatments, and outcomes must be carefully considered, and in particular, it is necessary that all relevant confounders are measured to obtain unbiased estimates of the causal effects of interest [1, 14]. Nevertheless, observational data have several potential advantages over trial data. For example, without rigid inclusion criteria and control protocols, observational data may better reflect the heterogeneity of both patient populations and treatment implementation, which may allow a broader range of treatment regimens to be evaluated and therefore represent actual treatment practice better than trial data. Some authors suggest that optimal DTR-based treatment decisions should be estimated using observational data, where possible, before proceeding to the relevant SMART design stage [1, 15]. Indeed, for some dynamic treatment regimens, particularly for ‘when to treat’ regimens that involve delayed treatment, it may be neither feasible nor ethical to conduct a randomised trial.

The effective use of observational data to evaluate dynamic treatment decisions has the potential to provide insight into the management of chronic or progressive conditions, yet it is unclear to what extent it is done in practice. This study provides a scoping review [16] to systematically map how observational data have been used to estimate the value of DTRs in practice with the following specific aims:

  • ▪ To summarise what medical areas, participant numbers, types of outcomes, and statistical methods have been used in real-world practice.

  • ▪ To describe whether key methodological aspects of the real-world applications were considered.

  • ▪ To ascertain whether the real-world application was designed more to inform statistical or clinical practice.

The overarching aim was to identify whether any particular domains dominate the literature and why this may be so, in order to understand the potential for evidence regarding DTRs to be developed using observational data, and to identify existing gaps in the methodological quality of published studies.

The remainder of this article proceeds as follows. We first provide terminology and describe a DTR using a simple two-stage example, selected modelling and estimation approaches for DTR-based decision rules, and the necessary conditions for causal inference. We then describe the methods and results of the scoping review, which explores the context, methods, and reporting of studies that have modelled DTRs using observational data. We close with a summary of the results, a general discussion, and a concluding summary of the key concepts.

Dynamic treatment regimens

Concept and notation

A simple two-stage, two-treatment scenario that can be formalised using DTRs can be described by the following notation:

$$ {O}_1\to {A}_1\to {O}_2\to {A}_2\to Y $$

where Ok denotes the set of prognostic factors available at treatment decision Ak, Y is the terminal outcome, and k ∈ K = {1, 2} indexes the first and second treatment stages. The accumulated history, Hk, includes all covariates and treatments preceding Ak. Therefore, in our simple example, H1 = O1 and H2 = {O1, A1, O2}. We follow standard convention and denote random variables and their observed values using upper- and lower-case letters, respectively. DTR models define decision rules dk as functions that map a patient’s history (Hk) to a certain course of action (Ak): dk(Hk) → Ak. Note that a DTR can be generalised to more than two stages and treatments, multiple covariates with different data types, and different outcome types [1]. In Fig. 1, we present a decision tree depicting many possible realised DTRs, where each Ok and Ak is a binary variable.
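As an illustration of the notation above, the following minimal Python sketch encodes the accumulated history Hk and two decision rules dk(Hk) → Ak for binary covariates and treatments. The specific rules (treat at stage 1 if O1 = 1; treat at stage 2 only if stage-1 treatment was given and O2 = 1) and the names `History` and `apply_regimen` are hypothetical and are not taken from any of the reviewed studies.

```python
from typing import NamedTuple, Optional


class History(NamedTuple):
    """Accumulated history H_k; fields not yet observed are None."""
    o1: int
    a1: Optional[int] = None
    o2: Optional[int] = None


def d1(h: History) -> int:
    # Hypothetical stage-1 rule: treat (1) only if the baseline marker is raised.
    return 1 if h.o1 == 1 else 0


def d2(h: History) -> int:
    # Hypothetical stage-2 rule: treat only if stage-1 treatment was given
    # and the marker is still raised afterwards.
    return 1 if (h.a1 == 1 and h.o2 == 1) else 0


def apply_regimen(o1: int, o2_under: dict) -> tuple:
    """Apply (d1, d2) in sequence; o2_under maps each a1 to the O2 then observed."""
    a1 = d1(History(o1))           # H1 = O1
    o2 = o2_under[a1]
    a2 = d2(History(o1, a1, o2))   # H2 = {O1, A1, O2}
    return a1, a2


responder = apply_regimen(1, {0: 1, 1: 0})      # marker resolves after treatment
non_responder = apply_regimen(1, {0: 1, 1: 1})  # marker persists after treatment
```

Two patients with the same baseline covariate can therefore receive different stage-2 treatments under the same regimen, which is what makes the regimen dynamic.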

Fig. 1

A decision tree containing several possible dynamic treatment regimens (DTRs). Shown are binary covariates (O1, O2), binary treatments (A1, A2), and a terminal outcome (Y) that is a function of patient history. The decisions that map the accumulated patient history to a treatment are represented as the functions d1(.) and d2(.)

The same two-stage scenario presented in Fig. 1 may also be described using a causal diagram or directed acyclic graph (DAG) (see Fig. 2). Causal diagrams are a graphical and intuitive way of encoding the causal assumptions that are made when considering how to analyse a problem [14, 17, 18].

Fig. 2

A dynamic treatment regimen (DTR) causal diagram. Covariates (O1, O2), treatments (A1, A2), and the outcome (Y) are each represented by a node with causal relationships shown as directed edges (arrows). Note that the edges are directional and it is not possible for a node to cycle back to itself along the graph’s edges, hence it is a directed acyclic graph

Modelling dynamic treatment regimens

A suboptimal approach to estimating the value of the dynamic treatment decisions in the example two-stage scenario might be to specify an ‘all-at-once’ regression model for the outcome Y as a function of all the covariates, treatments, and various interactions among them, and to find the treatments a1 and a2 that optimise the expected value of Y (perhaps conditional on values of o1 and o2) [8]. As appealingly simple as the ‘all-at-once’ approach may seem, it may result in poor treatment decisions because the causal effects of treatment are improperly estimated for the following reasons:

  • ▪ The effect of A1 on Y can be decomposed into direct and indirect effects. If O2 is a ‘child’ of A1 (i.e., the value of O2 is influenced by A1), including O2 (a treatment-outcome confounder) as a model covariate blocks the indirect effect of A1 on Y, as seen in Fig. 3, attenuating the estimated treatment effect of A1. In the language of causal inference, we say that O2 mediates the effect of A1 on Y.

  • ▪ Even if O2 were neither a mediator of A1 nor a treatment (A2)-outcome (Y) confounder, including O2 as a model covariate could induce collider stratification bias in the presence of unmeasured covariate (O2)-outcome (Y) confounders, as seen in Fig. 4.

Fig. 3

Causal diagram demonstrating effect mediation. Note: a Indirect effects of A1 → Y (dashed) mediated by conditioning on O2 (boxed). b Direct effect of A1 → Y (dotted) is not affected

Fig. 4

Causal diagram demonstrating collider stratification bias. Note: a O2 does not mediate O1, and unmeasured confounders (U) and O1 are unrelated. b conditioning on O2 (boxed) may induce collider stratification bias (dashed) between A1 and U
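The mediation problem depicted in Fig. 3 can be illustrated with a small simulation. The sketch below is a toy example under assumed values, not an analysis from any included study: A1 is randomised, O2 is more likely after treatment, and Y depends on both, so the assumed total effect of A1 on Y is 1.0 + 2.0 × 0.6 = 2.2 while the direct effect is 1.0. All coefficients and function names are hypothetical.

```python
import random


def simulate(n=20000, seed=1):
    """Toy data: A1 -> O2 -> Y and A1 -> Y (all coefficients are assumed)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        a1 = int(rng.random() < 0.5)             # randomised first treatment
        o2 = int(rng.random() < 0.2 + 0.6 * a1)  # O2 is a 'child' of A1
        y = 1.0 * a1 + 2.0 * o2 + rng.gauss(0.0, 1.0)
        rows.append((a1, o2, y))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def difference_in_means(rows):
    """Mean of Y among A1 = 1 minus mean of Y among A1 = 0."""
    treated = mean(y for a1, _, y in rows if a1 == 1)
    untreated = mean(y for a1, _, y in rows if a1 == 0)
    return treated - untreated


def difference_within_o2(rows):
    """Effect of A1 after conditioning on O2 (weighted over O2 strata)."""
    estimate = 0.0
    for level in (0, 1):
        stratum = [r for r in rows if r[1] == level]
        estimate += len(stratum) / len(rows) * difference_in_means(stratum)
    return estimate


rows = simulate()
unadjusted = difference_in_means(rows)   # targets the total effect of A1
adjusted = difference_within_o2(rows)    # indirect path through O2 is blocked
```

Under these assumptions, conditioning on O2 recovers only the direct effect of A1, so an ‘all-at-once’ model that includes O2 would understate the benefit of the first treatment.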

Because standard regression methods fail to account for the complexities inherent in DTRs, more sophisticated statistical methods are required. The exact methodology employed often depends on, and is tailored to, the clinical question of interest. The typical approach is to specify and estimate either a dynamic conditional model or a dynamic marginal structural model (MSM).

A dynamic conditional model defines the average effects of treatments conditional on patient history as target parameters for estimation. The estimated effects can therefore be considered to be personalised in that they are defined only for patients who have the same histories. To account for the effect mediation and biases depicted in Figs. 3 and 4, dynamic conditional models typically specify the treatment effects on a stage-by-stage basis. Estimating the treatment effects in dynamic conditional models often proceeds using Q-learning [19, 20], the parametric G-formula [1, 14, 21], or G-estimation [3, 22].
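The stage-by-stage logic of Q-learning can be sketched for the two-stage binary example. This is a simplified illustration under stated assumptions: the data are simulated from a hypothetical mechanism with no unmeasured confounding, and the Q-functions are estimated by saturated cell means rather than the regression models typically used in practice. The function names and coefficients are assumptions made for this sketch.

```python
import random
from collections import defaultdict


def simulate(n, seed=0):
    """Hypothetical two-stage data (O1, A1, O2, A2, Y); all values assumed."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        o1 = int(rng.random() < 0.5)
        a1 = int(rng.random() < 0.3 + 0.4 * o1)   # treatment depends on history
        o2 = int(rng.random() < 0.3 + 0.4 * a1)
        a2 = int(rng.random() < 0.5)
        y = o2 + 1.5 * a1 * (2 * o1 - 1) + a2 * (2 * o2 - 1) + rng.gauss(0, 1)
        rows.append((o1, a1, o2, a2, y))
    return rows


def cell_means(pairs):
    """Mean of the values observed for each key."""
    sums, counts = defaultdict(float), defaultdict(int)
    for key, value in pairs:
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def q_learning(rows):
    missing = float("-inf")
    # Stage 2: Q2(h2, a2) = E[Y | o1, a1, o2, a2]; choose the best a2 per history.
    q2 = cell_means(((o1, a1, o2, a2), y) for o1, a1, o2, a2, y in rows)
    d2, v2 = {}, {}
    for h2 in {(o1, a1, o2) for o1, a1, o2, _, _ in rows}:
        best = max((0, 1), key=lambda a: q2.get(h2 + (a,), missing))
        d2[h2] = best
        v2[h2] = q2[h2 + (best,)]   # value of acting optimally at stage 2
    # Stage 1: regress the stage-2 optimal value on (o1, a1), then maximise.
    q1 = cell_means(((o1, a1), v2[(o1, a1, o2)]) for o1, a1, o2, _, _ in rows)
    d1 = {}
    for o1 in {row[0] for row in rows}:
        d1[o1] = max((0, 1), key=lambda a: q1.get((o1, a), missing))
    return d1, d2


rows = simulate(20000)
d1, d2 = q_learning(rows)   # estimated rules as lookup tables
```

In this assumed mechanism the best rule at each stage is to treat exactly when the most recent covariate equals 1, and the estimated lookup tables `d1` and `d2` can be compared against that.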

A dynamic MSM defines the average treatment effects of following different regimens as the target parameters for estimation. Key to this approach is identifying that many individuals will have histories that are, at least in part, compatible with several regimens. Approaches that use dynamic MSMs rely on creating, for each candidate regimen, replicates of the original data where individuals are artificially censored if they no longer follow the candidate regimen and aim to estimate the treatment effect of the candidate regimen while balancing prognostic factors among the treatment groups using inverse probability weighting (IPW) [1, 4, 14, 23, 24].
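The censor-and-weight idea behind this approach can be sketched as follows. This is a deliberately reduced illustration, not a full dynamic MSM: it evaluates one candidate regimen at a time on simulated two-stage data, estimates the treatment probabilities by saturated cell proportions, and uses a normalised weighted mean of a single end-of-study outcome. The data-generating values and all function names are assumptions.

```python
import random
from collections import defaultdict


def simulate(n, seed=0):
    """Hypothetical two-stage data (O1, A1, O2, A2, Y); all values assumed."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        o1 = int(rng.random() < 0.5)
        a1 = int(rng.random() < 0.3 + 0.4 * o1)   # confounded by O1
        o2 = int(rng.random() < 0.3 + 0.4 * a1)
        a2 = int(rng.random() < 0.2 + 0.6 * o2)   # confounded by O2
        y = o2 + 1.5 * a1 * (2 * o1 - 1) + a2 * (2 * o2 - 1) + rng.gauss(0, 1)
        rows.append((o1, a1, o2, a2, y))
    return rows


def treated_proportion(pairs):
    """Estimated P(treatment = 1 | history) as a cell proportion."""
    treated, total = defaultdict(int), defaultdict(int)
    for key, a in pairs:
        treated[key] += a
        total[key] += 1
    return {key: treated[key] / total[key] for key in total}


def ipw_value(rows, d1, d2):
    """Weighted mean outcome among individuals who followed regimen (d1, d2)."""
    p1 = treated_proportion((o1, a1) for o1, a1, _, _, _ in rows)
    p2 = treated_proportion(((o1, a1, o2), a2) for o1, a1, o2, a2, _ in rows)
    numerator = denominator = 0.0
    for o1, a1, o2, a2, y in rows:
        if a1 != d1(o1) or a2 != d2(o1, a1, o2):
            continue  # artificially censored: not compatible with the regimen
        pr1 = p1[o1] if a1 == 1 else 1 - p1[o1]
        pr2 = p2[(o1, a1, o2)] if a2 == 1 else 1 - p2[(o1, a1, o2)]
        weight = 1.0 / (pr1 * pr2)   # inverse probability of observed treatment
        numerator += weight * y
        denominator += weight
    return numerator / denominator


rows = simulate(50000)
value_dynamic = ipw_value(rows, lambda o1: o1, lambda o1, a1, o2: o2)
value_always = ipw_value(rows, lambda o1: 1, lambda o1, a1, o2: 1)
```

Under the assumed mechanism, the expected outcome is 1.75 for the dynamic regimen (treat when the latest covariate is 1) and 1.1 for always treating, so the two estimates can be checked against known targets.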

Although estimation methods such as Q-learning or IPW typically use relatively simple generalised linear models (for example linear and logistic regression), other estimation methods using the parametric G-formula or G-estimation may require complex estimating equations and/or large sets of models. In all cases, estimation performance can be sensitive to model misspecification, particularly when using the parametric G-formula which tends to use many interrelated models [1]. Although bias can be minimised through the use of ‘doubly-robust’ estimators—which produce unbiased estimates if at least one of the treatment or outcome submodels is correctly specified—there are efficiency gains to be made when both submodels are correctly specified [1]. Therefore, principled model selection, evaluation, and sensitivity analysis methods are highly recommended to mitigate the risk of model misspecification. Furthermore, as with most longitudinal data, missingness is often an additional source of bias, and principled approaches to handle missing data should also be used.
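To make concrete why the G-formula relies on several interrelated models, the sketch below computes the value of a regimen in the two-stage binary example by combining three components: the distribution of O1, the distribution of O2 given (O1, A1), and the mean of Y given the full history. It is a non-parametric simplification that uses cell proportions and cell means in place of the fitted parametric models used in practice, and the simulated data and names are assumptions.

```python
import random
from collections import defaultdict


def simulate(n, seed=0):
    """Hypothetical two-stage data (O1, A1, O2, A2, Y); all values assumed."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        o1 = int(rng.random() < 0.5)
        a1 = int(rng.random() < 0.3 + 0.4 * o1)
        o2 = int(rng.random() < 0.3 + 0.4 * a1)
        a2 = int(rng.random() < 0.2 + 0.6 * o2)
        y = o2 + 1.5 * a1 * (2 * o1 - 1) + a2 * (2 * o2 - 1) + rng.gauss(0, 1)
        rows.append((o1, a1, o2, a2, y))
    return rows


def g_formula_value(rows, d1, d2):
    """Standardise the outcome model over the covariate models under (d1, d2)."""
    n_o1 = defaultdict(int)      # counts for P(O1)
    n_h1 = defaultdict(int)      # counts of (o1, a1)
    n_o2 = defaultdict(int)      # counts of (o1, a1, o2) for P(O2 | o1, a1)
    sum_y = defaultdict(float)   # sums for E[Y | o1, a1, o2, a2]
    n_y = defaultdict(int)
    for o1, a1, o2, a2, y in rows:
        n_o1[o1] += 1
        n_h1[(o1, a1)] += 1
        n_o2[(o1, a1, o2)] += 1
        sum_y[(o1, a1, o2, a2)] += y
        n_y[(o1, a1, o2, a2)] += 1
    value = 0.0
    for o1 in (0, 1):
        a1 = d1(o1)
        for o2 in (0, 1):
            a2 = d2(o1, a1, o2)
            cell = (o1, a1, o2, a2)
            # An empty cell (a positivity violation) would divide by zero here.
            p_o1 = n_o1[o1] / len(rows)
            p_o2 = n_o2[(o1, a1, o2)] / n_h1[(o1, a1)]
            value += p_o1 * p_o2 * sum_y[cell] / n_y[cell]
    return value


rows = simulate(50000)
value_dynamic = g_formula_value(rows, lambda o1: o1, lambda o1, a1, o2: o2)
```

Because each component model feeds into the final sum, misspecifying any one of them would bias the estimated value, which is the sensitivity noted above. Under the assumed mechanism the target value of this regimen is 1.75.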

Causal assumptions

Several conditions must be met for the estimated DTR effects to have a causal interpretation [1, 14]. This is true whether using data from studies with a SMART design or observational data. Broadly, the key necessary conditions for causal inference can be summarised as exchangeability, consistency, and positivity. These conditions require that there are no unmeasured confounders (exchangeability), well-defined treatments (consistency), and that the probability of receiving each treatment regimen of interest is greater than zero for each patient included in the analysis (positivity). A complete and rigorous description of these assumptions is beyond the scope of this review, however Hernán and Robins [14] provide an accessible explanation of these conditions, and Chakraborty and Moodie [1] formalise each condition in the context of DTRs.
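Of the three conditions, only positivity can be partly examined in the data. A minimal sketch of such a check is given below, assuming discrete histories and treatments; the function name `positivity_gaps` and the toy records are hypothetical. It lists every combination of observed history and treatment that never occurs, and it cannot detect near-violations (very small but non-zero counts) or any problem with exchangeability or consistency.

```python
from collections import Counter
from itertools import product


def positivity_gaps(records, treatments=(0, 1)):
    """Return (history, treatment) pairs that are never observed.

    records: iterable of (history, treatment), where history is a tuple
    of discrete covariate and prior-treatment values.
    """
    counts = Counter(records)
    histories = {history for history, _ in counts}
    return sorted(
        (history, a)
        for history, a in product(histories, treatments)
        if counts[(history, a)] == 0
    )


# Toy records: patients with history (1,) were always treated.
records = [((0,), 0), ((0,), 1), ((1,), 1), ((1,), 1)]
gaps = positivity_gaps(records)
```

Here the untreated option is never observed for history (1,), so the effect of withholding treatment in such patients could not be estimated without extrapolation.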



The review protocol was developed by RKM and JAS in consultation with the co-authors. The original version of the protocol, along with the changes to the protocol, is available as an additional file (see Additional file 1).

Eligibility criteria

To be included in the review, studies must have used statistical methods to estimate the value of DTR decision rules from observational data, either as a demonstration of the methodology or to provide real-world evidence to support specific treatment policies. Statistical methods were defined in this context as any method that fits a parametric, semi-parametric, or non-parametric statistical model to data using methods such as maximum likelihood estimation or estimating equations. This definition was broad enough to encompass most conceivable data analytical methods, including methods that are traditionally less aligned with biostatistical and epidemiological fields (for example, methods using artificial intelligence). Observational data were defined as any non-simulated data where the treatments of interest were not randomly allocated. No restriction was placed on study time period, publication type, statistical method, outcome types, sample size, country of origin, or participant characteristics.

Studies were excluded from this review if they met any of the following criteria:

  • ▪ only analysed data from experimental studies where the treatment/s were randomised (including SMART designs and other randomised trials),

  • ▪ analysed simulated data or provided theoretical discussion only,

  • ▪ provided a commentary, review, opinion, protocol, or description only,

  • ▪ had no available abstract or full text,

  • ▪ analysed data from non-human subjects only,

  • ▪ were not available in the English language, or

  • ▪ did not use statistical methods to evaluate a DTR using observational data, for example provided only a graphical or textual description of the data.

Information sources

To identify potentially relevant studies the electronic bibliographic database PubMed was searched on 8 October 2020. The reference lists of the included articles identified from the PubMed database were manually screened to identify additional relevant studies. Grey literature, unindexed journals, and trial registries were not searched.

Search strategy

The search strategy was developed by RKM and JAS, with input from all co-authors, and in consultation with the University of Melbourne Library. The electronic PubMed search strategy is described in Table 1.

Table 1 PubMed search terms

Selection of sources of evidence

RKM performed the search of the PubMed database, screened the titles and abstracts returned by the search, and reviewed the full text of all potentially eligible studies that satisfied the selection criteria for eligibility. Excluded studies were categorised by primary reason for exclusion. Titles and abstracts from each bibliography item of the included PubMed articles were also screened (not including books/book chapters, clinical guidelines, in proceedings, manuals/technical reports, software, posters, in press/submitted, theses, trial registries, or working papers), and all studies that satisfied the selection criteria for eligibility were included in the data synthesis.

Data items

The data extracted from each article included reference details, study characteristics, data type, statistical methods, and whether the study was primarily intended to inform statistical or clinical practice (as defined below, see Table 2). The data extraction items were initially piloted by RKM and JAS for a subset of six articles and refined in consultation with the co-authors. Note that a methodological study typically aims to extend an existing method, present a novel method, or demonstrate the application of an existing method in a novel way. These studies typically involve a precise mathematical description of the method under investigation, demonstration of the statistical properties of the method either analytically or using computer simulation, and often include a highly stylised application of the method with real-world data. In contrast, a clinical study applying a statistical method to investigate a clinical research question typically involves collecting real-world data (either prospectively or retrospectively), applying a validated statistical method to the data to address the clinical research question, and interpreting the results in a way that they might be used to inform either clinical practice or future clinical research. Although the boundary between clinical and methodological studies is at times unclear, in general, the category a study belongs to can be clearly identified by its aims, journal, mathematical density, and tone of the discussion.

Table 2 Data extraction items

Data extraction

Data on the fields listed in Table 2 were extracted using a standardised form (in Microsoft Excel) for each article by RKM and confirmed by a second reviewer (JAS or MM) for approximately 10% of the included articles. Any differences in extracted data fields were resolved by consensus between RKM and the second reviewer.

Synthesis of results

The extracted data were explored using narrative synthesis and summarised using descriptive statistics. Studies were compared between subgroups defined by the primary focus of the study (clinical vs methodological). All data management and analysis was performed using the R programming language [25].


The initial search returned 209 studies. Of these, 156 (75%) were excluded following screening of titles and abstracts. Upon reviewing the full-texts for eligibility, 37 studies were included from the PubMed database and a further 26 studies were identified from the PubMed article reference lists. In total, 63 studies were included in the data synthesis [3, 4, 6,7,8,9, 24, 26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81]. The flow chart of study selection is presented in Fig. 5. A summary of the data synthesis is provided in Table 3, and the extracted data for each study are provided in an additional file (see Additional file 2). Of the seven included studies which were reviewed by a second author, there was disagreement on a single data item that was resolved by consensus.

Fig. 5

Search strategy flowchart. Note: DTR dynamic treatment regimen

Table 3 Descriptive summary of the characteristics of the included studies

The estimation of optimal DTRs using observational data is a recent development and has been most concentrated in the area of HIV/AIDS (27, 43%), followed by cancer (8, 13%), and diabetes (6, 10%). All but three of the included studies were published after 2005, with all but nine in the last decade and almost half (25, 45%) in the last 5 years.

Outcome types, participant numbers, and funding sources varied considerably between the included studies. Time-to-event outcomes were most commonly investigated (36, 57%). The median number of participants was 3882 with an interquartile range (IQR) between 1420 and 23,602, and the total range between 133 and 218,217. Studies were funded mostly through public sources (51, 81%), with some studies acknowledging non-profit sources (6, 10%). Ten (16%) studies did not report on funding sources.

All of the common statistical approaches that we have described were implemented, yet there was a lack of transparency regarding some of the specific methodological approaches used across many studies. IPW-related methods were the most commonly used (26, 41%), followed by parametric G-formula related methods (16, 25%), Q-learning related methods (10, 16%), G-estimation (4, 6%), targeted maximum likelihood/minimum loss-based estimation (4, 6%), regret regression (3, 5%), and other less common approaches (10, 16%). Many studies did not clearly and explicitly describe the methods that they employed for either missing data (32, 51%), model evaluation (30, 48%), model selection (34, 54%), or model sensitivity (38, 60%), and only eight studies described all four methodological approaches. The studies that published statistical software code relevant to their analyses (21, 33%) provided it for either R (18, 29%) or SAS (3, 5%) only.

Eighteen (29%) studies had a clear primary focus of informing clinical practice. The remaining 45 (71%) studies used observational data only to illustrate the application of statistical methodology [40, 79]. The median sample size of clinical studies was 9793 participants (IQR: 3084, 39,887), considerably higher than that of methodological studies (median: 2604, IQR: 710, 13,039). Compared to methodological studies, clinical studies were more likely to focus on HIV/AIDS (12, 67% vs 15, 33%), time-to-event outcomes (13, 72% vs 23, 51%), and statistical models that used IPW (9, 50% vs 17, 38%) or the parametric G-formula (8, 44% vs 8, 18%). Although methodological and clinical studies described missing data and model evaluation methods in approximately equal proportions, a much greater proportion of clinical studies described their methods for model selection (15, 83% vs 19, 42%) and sensitivity analysis (15, 83% vs 23, 51%). Only one clinical study included statistical computing code used for analysis.


This review provided a summary of how DTRs can be modelled and an overview of how observational data have been used to estimate optimal DTRs. There was substantial variation in the scope, intent, complexity, quality, and statistical methodology between the 63 included studies.

DTR models are often necessary when formalising decisions about how best to treat chronic or progressive conditions to properly account for time-varying treatment confounding and mediation. A number of different statistical approaches can be used—including IPW, Q-learning, the parametric G-formula, G-estimation, or targeted maximum likelihood/minimum-loss based estimation—depending on the DTR model used and the nature of the research question. Almost all clinical studies used either IPW or the parametric G-formula methods, possibly because these methods are relatively well-established, less complex, and suited to simpler decision problems such as those encountered in HIV/AIDS treatment. Unsurprisingly, the included methodological studies were more diverse in the methods that they used and tended to detail model selection and sensitivity analyses less often. Encouragingly, this review found that many included studies often dealt with clinically relevant but complicated time-to-event outcomes.

Evaluation of dynamic treatment regimens was first described in 1987 by Robins [21], but most of the studies included in this review were published in the last 10 to 15 years, perhaps both because the methodology has matured and because formal causal inference methods in epidemiology have become more established. Two-thirds of the clinical studies and one-third of methodological studies focused on HIV/AIDS, most likely because of the chronic and progressive nature of HIV infection and AIDS, for which treatments are often dynamically, if informally, adapted to patient history.

Compared to randomised controlled trials, using observational data to estimate DTRs may allow researchers to both take advantage of the economics of using existing data and also evaluate a wider range of treatments. Despite this, the majority of included studies did not have a clinical focus. Of the clinical studies, most focused on HIV/AIDS, and analysed large datasets using either IPW or parametric G-formula methods to answer relatively simple questions. This result provides insight into the type and scale of resources, and research questions, that may give rise to feasible observational DTR studies.

The majority of studies were methodological investigations and typically included a simplified real-world application only. Many of the included methodological articles involved methods and results that were based on complex estimating equations and/or Monte Carlo simulations which, although no doubt critical for the advancement of the DTR methodology, may be difficult for clinical readers to interpret. It is likely that user-friendly software would make implementing the complex methods easier for clinicians and methodologists alike. Although almost half of the methodological studies included some form of statistical software code related to their methods, which may encourage the application of complex DTR methods, in general this software is not readily usable by non-experts. Furthermore, many studies did not describe the real-world applications, the statistical methods, or the corresponding assumptions in detail, which may limit how the DTR methods and results are translated into practice.

We posit that the limited number of clinically relevant examples of optimised DTRs using observational data is because of the need to satisfy three conditions necessary for estimating causal treatment effects: exchangeability, positivity, and consistency. These conditions, required for valid causal inference, cannot be verified from the data alone and require judgement on biological plausibility.

To meet the exchangeability condition, explicit causal relationships must be considered by content experts to identify confounders and the confounder data must be available. Developing a causal model requires both clinical expertise and statistical knowledge to codify such expertise using the causal inference framework. Although the use of DAGs can streamline this process, it still requires substantial investment in learning and collaboration by both content experts and data analysts, particularly if multiple plausible causal models are developed to assess sensitivity of conclusions. Even when it is feasible to fully develop a plausible set of causal models, it is not guaranteed that confounder information will be available, particularly when working with retrospectively collected data or electronic health records, which are often designed around clinical practice rather than for research purposes. It is worth noting that more than half of the included studies (34, 54%) did not describe the model selection process in any way.

To ensure causal effects can be estimated, the positivity condition must be met. This requires that all regimens of interest are followed by at least some (and, in practical terms, many) patients for each potential combination of predictors and outcomes. Large clinical databases, and questions about non-rare medical conditions, are likely to be required for there to be sufficient numbers such that the positivity condition holds. We note that many of the clinical studies that we identified in this review used either very large EHR databases or data from large multinational collaborations, and focused on a relatively prevalent medical condition. Even with large clinical databases, structural factors such as clinical, regulatory, or reimbursement guidelines may completely prevent treatment sequences of interest (not to mention relevant patient histories) from being observed.

The consistency condition requires that treatments, and therefore potential outcomes under treatments, are sufficiently well-defined, which may be a difficult condition to meet for conditions where there are many different treatment modalities. A related point is that in clinical areas with rapid and continual treatment innovation the clinical paradigm may change so rapidly that DTRs modelled using data from observational cohort studies or EHRs, with patient treatment histories over a long time period, are less relevant to informing clinical practice. For example, management of many cancers often involves several consecutive lines of treatment following disease progression and determining the optimal sequence of treatments is an open area of research in modern oncology. But new cancer treatments and changing clinical paradigms often dramatically change the treatment landscape, which results in substantial variation in clinical practice. Over time, treatments become less well-defined, and it becomes difficult to satisfy the consistency condition.

Although we are satisfied that our scoping review provides a representative sample of the literature, there are some limitations worth noting. Our exclusive focus on the PubMed database excludes any studies not indexed therein. We made this choice early in the design process on the basis of our broad aims, the ‘scoping’ nature of our review, and also to simplify the review and make it as reproducible and transparent as possible. We note that searching the reference lists of the included PubMed articles served as a practical workaround for the limitation arising from using a single database. Further, the search strategy included only common phrases, and their variants, to capture both DTRs and observational data. There may be variants that we have missed, or there may be ad hoc implementations that use entirely different naming conventions or combinations thereof, although we note that the nomenclature concerning dynamic treatment regimens is relatively well-established in the literature.


Using observational data to model DTRs is a modern and methodologically principled approach to evaluating dynamic treatment decisions. There is great potential in using DTR models with existing observational data to support dynamic treatment decisions that improve patient outcomes, particularly where the relevant clinical trial is not feasible. Yet the use of observational DTR studies to inform clinical practice has been relatively limited, primarily because the underlying conditions that are necessary for causal inference are difficult to satisfy. Developing new methods that enable these conditions to be satisfied may more broadly enable additional and more diverse observational DTR studies. Our review suggests that the currently available methods are most likely to find feasible applications for relatively simple dynamic clinical decisions, either for simple treatment sequences or ‘when to treat’ type questions, where there are numerous and rich clinical data, where treatments can be well-defined, in clinical areas with slowly evolving treatment paradigms, and where content experts and data analysts work in tight partnership.

Availability of data and materials

The datasets used and analysed during the current study are available from the corresponding author on reasonable request.



Abbreviations

AIDS: Acquired immune deficiency syndrome
DAG: Directed acyclic graph
DTR: Dynamic treatment regimen
HIV: Human immunodeficiency virus
IPW: Inverse probability weighted
IQR: Interquartile range
MSM: Marginal structural model
SMART: Sequential multiple assignment randomised trial
TMLE: Targeted maximum-likelihood/minimum-loss based estimation



Acknowledgements

Not applicable.


Funding

JAS acknowledges support from the National Health and Medical Research Council through a Centre of Research Excellence grant (ID 1035261) awarded to the Victorian Centre of Biostatistics (ViCBiostat), and a Senior Research Fellowship (ID 1104975) awarded to JAS. BC acknowledges support from a start-up grant from Duke-NUS Medical School, Singapore. The design of the study, the collection, analysis, and interpretation of data, and the writing of the manuscript were carried out completely independently of any funding bodies.

Author information




Contributions

RKM, JAS, and MJI conceived and designed the review with assistance from MM, BC, and JBC. RKM performed the review, with JAS or MM confirming subsets of data extraction. RKM drafted the manuscript with input from all co-authors. All authors were responsible for critical revision of the manuscript and have read and approved the final version.

Corresponding author

Correspondence to Robert K. Mahar.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that there are no conflicts of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

Original scoping review protocol.

Additional file 2.

Extracted data for individual studies.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article


Cite this article

Mahar, R.K., McGuinness, M.B., Chakraborty, B. et al. A scoping review of studies using observational data to optimise dynamic treatment regimens. BMC Med Res Methodol 21, 39 (2021).



Keywords

  • Dynamic treatment regimens
  • Adaptive treatment policies
  • Sequential multiple assignment randomised trials
  • Observational data
  • Causal inference
  • Directed acyclic graphs