Implementation of the trial emulation approach in medical research: a scoping review

Background: When conducting randomised controlled trials is impractical, an alternative is to carry out an observational study. However, making valid causal inferences from observational data is challenging because of the risk of several statistical biases. In 2016 Hernán and Robins put forward the 'target trial framework' as a guide to best design and analyse observational studies whilst preventing the most common biases. This framework consists of (1) clearly defining a causal question about an intervention, (2) specifying the protocol of the hypothetical trial, and (3) explaining how the observational data will be used to emulate it.

Methods: The aim of this scoping review was to identify and review all explicit attempts of trial emulation studies across all medical fields. Embase, Medline and Web of Science were searched for trial emulation studies published in English from database inception to February 25, 2021. The following information was extracted from studies that were deemed eligible for review: the subject area, the type of observational data that they leveraged, and the statistical methods they used to address the following biases: (A) confounding bias, (B) immortal time bias, and (C) selection bias.

Results: The search resulted in 617 studies, 38 of which we deemed eligible for review. Of those 38 studies, most focused on cardiology, infectious diseases or oncology, and the majority used electronic health records/electronic medical records data and cohort studies data. Different statistical methods were used to address confounding at baseline and selection bias, predominantly conditioning on the confounders (N = 18/49, 37%) and inverse probability of censoring weighting (N = 7/20, 35%), respectively. Different approaches were used to address immortal time bias: assigning individuals to treatment strategies at the start of follow-up based on their data available at that specific time (N = 21, 55%), using the sequential trial emulations approach (N = 11, 29%) or using the cloning approach (N = 6, 16%).

Conclusion: Different methods can be leveraged to address (A) confounding bias, (B) immortal time bias, and (C) selection bias. When working with observational data, and if possible, the 'target trial' framework should be used as it provides a structured conceptual approach to observational research.

Supplementary Information: The online version contains supplementary material available at 10.1186/s12874-023-02000-9.


Background
In medical research, randomised controlled trials (RCTs) are considered the gold-standard study design to evaluate the effectiveness of a treatment [1]. However, RCTs are sometimes not feasible due to factors such as their high cost, and even when viable, they can still take too long to provide answers to inform pressing clinical and health policy decisions. In this scenario, the careful analysis of observational data might provide an alternative to generate evidence to guide those decisions [2][3][4].
Observational data is a broad term that includes any patient data and health and care information collected in non-experimental settings (i.e. outside RCTs) [5,6]. In this paper, we make the distinction between two types of observational data: research-generated data and non-research-generated data (Table 1).
Accurate estimation of treatment effects from observational data is challenging. The main reason for this is the possibility of confounding of the effect of treatment on the clinical outcome(s). Unlike in RCTs, in observational studies patients are not randomly assigned to treatment groups at baseline. Instead, each patient is prescribed a treatment by a clinician according to their demographic and clinical characteristics (e.g. gender, age, severity of illness, etc.), which is likely to result in an unequal distribution of these characteristics across treatment groups. If these characteristics are also prognostic factors for the outcome(s), and hence confounders, they must be accounted for; otherwise, this may result in confounding bias [13,14].
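As a minimal numeric sketch of the confounding problem just described (all counts are hypothetical and chosen for illustration), consider a single binary confounder, illness severity, that drives both treatment allocation and death. The crude comparison then points in the wrong direction, while standardising the stratum-specific risks over the severity distribution (the g-formula with one binary confounder) recovers a stratum-consistent effect:

```python
# Hypothetical counts illustrating confounding by illness severity:
# sicker patients are both more likely to be treated and more likely to die.
# Keys are (severity stratum, treated); values are (n patients, n deaths).
data = {
    ("low",  1): (100, 10),
    ("low",  0): (400, 40),
    ("high", 1): (400, 200),
    ("high", 0): (100, 60),
}

def crude_risk(treated):
    # Pools the strata together, ignoring severity.
    n = sum(v[0] for (s, t), v in data.items() if t == treated)
    d = sum(v[1] for (s, t), v in data.items() if t == treated)
    return d / n

crude_rd = crude_risk(1) - crude_risk(0)  # +0.22: treatment looks harmful

# Standardisation: average the stratum-specific risks over the overall
# severity distribution (the g-formula for a single binary confounder).
strata_sizes = {"low": 500, "high": 500}
total = sum(strata_sizes.values())

def standardised_risk(treated):
    return sum(
        (strata_sizes[s] / total) * (data[(s, treated)][1] / data[(s, treated)][0])
        for s in strata_sizes
    )

adjusted_rd = standardised_risk(1) - standardised_risk(0)  # -0.05: slight benefit
```

Within each severity stratum the treated fare no worse than the untreated, yet the pooled comparison suggests harm, because treatment is concentrated among the severely ill.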
Moreover, poorly designed or ill-thought-out observational studies can result in additional issues due to misalignments in treatment initiation, eligibility, and follow-up periods, as well as loss to follow-up [4,13,15]. Bias can result from a misalignment of the start of follow-up, eligibility, and treatment initiation. In a well-designed prospective trial, baseline assessment is carried out just before random allocation to treatment, and participant follow-up starts with randomisation. In contrast, in an observational study of treatment initiation vs. no initiation, there can be a delay between the start of follow-up (i.e. when the eligibility criteria are met and the study outcome(s) begin to be considered) and treatment initiation. This will result in a period of follow-up time, commonly referred to as 'immortal time', when participants in the treated group specifically cannot have died or experienced the outcome(s) and are essentially 'immortal'. Participants in the treated group are not truly 'immortal' during this period; however, they must have survived it (i.e. be alive and event-free) to be initiating treatment [13,14,16-19]. Inadequate consideration of this unexposed period of time in the design or analysis of the observational study results in 'immortal time bias' [18]. Loss to follow-up in observational studies can lead to selection bias, since participants lost to follow-up may systematically differ from those who were not lost to follow-up in terms of their treatment status as well as prognostic variables. If this is not accounted for appropriately in the study's analysis, it may compromise its validity [3,20].
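A minimal numeric sketch of immortal time bias, with entirely hypothetical numbers: counting an initiator's person-time from cohort entry rather than from treatment initiation credits the guaranteed-survival period to the treated group and dilutes its event rate.

```python
# Each initiator's record: (initiation_time, end_of_follow_up, event).
# All numbers are hypothetical; time runs from cohort entry (t = 0).
initiators = [
    (2.0,  6.0, 1),   # initiated at t=2, event at t=6
    (2.0, 10.0, 0),   # initiated at t=2, censored at t=10
]

events = sum(e for _, _, e in initiators)

# Naive analysis: person-time counted from cohort entry, so the
# pre-initiation period (during which initiators must, by definition,
# be alive and event-free) is wrongly credited to the treated group.
naive_rate = events / sum(end for _, end, _ in initiators)          # 1/16

# Correct analysis: treated person-time starts at treatment initiation.
correct_rate = events / sum(end - t0 for t0, end, _ in initiators)  # 1/12

# The immortal person-time dilutes the treated event rate:
assert naive_rate < correct_rate
```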
Additional complexity arises in observational studies which aim to evaluate the causal effect of a sustained treatment strategy or treatment regimen rather than that of a 'point treatment'. Treatment regimens often consist of a number of treatments that might be sustained over time, such as repeat prescriptions for human immunodeficiency virus (HIV) medication [21]. When evaluating the causal effect of a particular treatment regimen, e.g. the causal contrast between continuously being prescribed HIV medication versus no prescription at all, the observed treatment histories may depart from these regimens, as clinical decisions to re-prescribe drugs may depend on previous drug responses or side effects. Therefore, in such studies there may be (observable) variables, such as intermediate treatment response or side effects, that are (i) affected by past treatments, and (ii) drive both future treatment allocations as well as the long-term outcome. Such variables are known as 'time-varying confounders', to distinguish them from 'baseline/pre-treatment confounders'. This statistical issue is often overlooked, as more complex analysis methods are needed to avoid bias arising from these confounders [21,22].

Table 1 Sources of two different types of observational data

Research-generated data:
Epidemiological studies: data from cohort, cross-sectional, and case-control studies.
Patient registries: a patient registry is an organised collection of uniform data to evaluate a pre-specified outcome(s) for a population with a specific disease, condition, or exposure [8].
Biobanks: a biobank collects biological samples and in-depth health information on a specific group of people [10].

Non-research-generated data:
EHRs/EMRs: EHRs are digital records of patients' medical data. Data stored in EHRs are structured (i.e. tabular data) and unstructured (e.g. free-text in clinical notes or image reports) [7].
National registries: a national registry collects uniform demographics and/or health-related data on all its respective country's nationals [9].
Health insurance claims databases: a health insurance claims database collects data entered on bills (claims) by hospitals, nursing homes, etc. [11].

Note: In this paper no distinction is made between electronic health records and electronic medical records. This table was adapted from a lecture given by Miguel Hernán [12]. Abbreviations: EHRs, electronic health records; EMRs, electronic medical records.
In 2016 Hernán and Robins put forward a solution to avert most of those biases: the 'target trial' framework. This framework consists of three steps. First, clearly defining a causal question about a treatment. Second, specifying the protocol of the 'target trial' (i.e. the eligibility criteria, the treatment strategies being compared (including their start and end times), the assignment procedures, the follow-up period, the outcome(s) of interest, the causal contrast(s) of interest and a plan to estimate them without bias); in other words, the protocol of the RCT one would like to perform but cannot due to impracticality. Last, explaining how the observational data will be used to explicitly emulate it. Meticulously following this structured process step by step when planning observational studies can help prevent biases such as immortal time bias and selection bias. Avoiding confounding bias tends to be more difficult in practice. To emulate randomisation, all baseline (and, where relevant, time-varying) confounders must be measured. However, there is no guarantee that the observational database contains sufficient information on the confounders. Furthermore, there might be confounders that the study investigator is not aware of and therefore does not attempt to measure or control for (i.e. unobserved confounders). Hence, successful emulation of randomisation is never guaranteed, and there is no certainty that residual confounding is absent [3]. Nonetheless, the 'target trial' framework is a rigorous approach for evaluating treatment effects from observational data.
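The second step, writing down the target trial protocol before touching the data, can be sketched as a simple structured record. All field values below are illustrative placeholders, not taken from any real study:

```python
from dataclasses import dataclass

# Minimal sketch of a target trial protocol specification; every value
# here is a hypothetical example, not a recommendation.
@dataclass
class TargetTrialProtocol:
    eligibility_criteria: list
    treatment_strategies: list   # including their start and end times
    assignment_procedure: str    # randomisation in the target trial itself
    follow_up: str               # start, end and duration of follow-up
    outcomes: list
    causal_contrasts: list       # e.g. intention-to-treat, per-protocol
    analysis_plan: str

protocol = TargetTrialProtocol(
    eligibility_criteria=["age >= 18", "no prior use of the study drug"],
    treatment_strategies=["initiate drug A at baseline", "do not initiate"],
    assignment_procedure="randomisation (emulated via confounder adjustment)",
    follow_up="from baseline until event, loss to follow-up, or 5 years",
    outcomes=["all-cause mortality"],
    causal_contrasts=["intention-to-treat effect", "per-protocol effect"],
    analysis_plan="pooled logistic regression with IPTW",
)
```

The emulation step then documents, component by component, how each field is realised with the observational data at hand.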
The aim of this scoping review is to identify and review all explicit attempts of trial emulations across all medical fields. This work will provide an overview of the medical fields that have been covered, the types of observational data that have been most frequently used, and the statistical methods that have been employed to address the following biases: (A) confounding bias, (B) immortal time bias, and (C) potential selection bias due to loss to follow-up, henceforth simply referred to as selection bias.

Search strategy and selection criteria
Three bibliographic databases (Embase (Ovid), Medline (Ovid) and Web of Science) were searched for studies published in English from database inception (Embase (Ovid): 1974, Medline (Ovid): 1946 and Web of Science: 1900) to February 25, 2021, using predefined search terms. These were related to concepts such as trial emulation and observational data (see Additional file 1).
The study selection process consisted of two key steps. First, identifying and removing all duplicates. This was done automatically in EndNote X9 [23] and was manually checked and completed by one reviewer (GS). Next, identifying eligible studies based on their titles, abstracts and/or keywords. For a study to be considered eligible, it had to explicitly mention in its title, abstract or keywords that it emulated a trial using observational data. One reviewer (GS) systematically checked each study's title, abstract and keywords.

Data extraction
One reviewer (GS) extracted the data from the studies. The studies' supplementary materials were checked only when further methodological details were necessary. A custom Excel spreadsheet was used to record specific information, such as the studies' subject area, the type of observational data used, the causal contrast(s) of interest, and the statistical methods used for analysing the primary outcome(s) and for addressing the following biases: (A) confounding bias, (B) immortal time bias and (C) selection bias (see Table 2).

Quality check
A second reviewer (AC) re-screened 100 articles (16%) and extracted data from eight out of the 38 eligible articles (21%) to assess the reliability of study selection and data extraction.There were no disagreements between the first and the second reviewer (GS and AC).

Observational data sources
Out of the 38 studies we reviewed, most used electronic health records (EHRs)/electronic medical records (EMRs) data (N = 12, 29%) and cohort studies data (N = 12, 29%) (see Table 3). Among those that used EHRs/EMRs data, only Keyhani and colleagues mentioned using a natural language processing (NLP) algorithm to retrieve and extract unstructured data, i.e. 'carotid imaging results showing stenosis of less than 50% or hemodynamically insignificant stenosis' [57]. Three studies (2, 3 and 36 in Table 3) used different observational data sources, and therefore the percentages were calculated out of 41 datasets rather than 38.

Data extraction form (Table 2):

Subject area
What is the study's subject area? Cardiology, Oncology, Psychiatry, Neurology, etc.

Data type
Were EHRs or EMRs data used? Yes or no.
If not, what type of data were used? Cohort study data, patient registry data, etc.
Specify the name of the observational database. Free text.

Data structure
Were structured data used? Yes or no.
Were unstructured data used? Yes or no.
If unstructured data were used, were these manually or automatically processed? Manually or automatically.

Eligibility criteria
What is the target population? Free text.

Treatments
How many treatments were compared? Number of treatments.
What treatments were compared? Free text.

Outcomes
What was(were) the primary outcome(s)? Free text.

Follow-up
Was the follow-up duration pre-specified? Yes or no.

Statistical objectives
What is the estimand of interest? Causal effect of point treatment offer ('intention-to-treat effect'), causal effect of point treatment receipt ('per-protocol effect'), causal effect of treatment regimen initiation ('intention-to-treat effect') or causal effect of sustained treatment regimen ('per-protocol effect').
What was the measurement scale of the outcome(s)? Continuous, ordinal, binary, time-to-event, other.
Which effect size measure was used to quantify the causal contrast of interest? Mean difference, odds ratio, hazard ratio, other.
Which statistical method was used for analysing the primary outcome(s)? Pooled logistic regression, Cox proportional hazards model, etc.
Were sample size or statistical power calculations provided? Yes or no.
If yes, what was determined? Power or the effect size.

Treatment assignment procedures
Were treatments administered at one point in time or sustained over time? Point treatment or treatment regimen.
In either case, have pre-initiation confounders been adjusted for? Yes or no.
If yes, what statistical method has been used for this purpose? Inclusion of covariates in model, stratification, inverse probability of treatment weighting, propensity score methods, parametric g-formula, other, method not specified.
If treatment regimen, are the investigators interested in the effect of initiating a treatment or the effect of sustaining a treatment? Initiation or sustained treatment.
If interested in the effect of a sustained treatment, did they account for time-varying confounders? Yes or no.
If yes, what statistical method has been used for this purpose? Inverse probability of treatment weighting, parametric g-formula, other, method not specified.

Other bias handling
Was immortal time bias addressed? Yes or no.
If yes, how was immortal time bias handled? Avoided at the study design stage or using the cloning technique.
Was selection bias due to loss to follow-up addressed explicitly? Yes or no.
If so, how were missing outcome data handled? Inverse probability of censoring weighting, multiple imputation, etc.

Causal contrast of interest
Most of the trial emulation studies we reviewed aimed to assess the causal effect of treatment initiation, the observational analogue of the intention-to-treat (ITT) effect in trials (25 out of 38 studies reviewed, with 21 out of those 25 considering the initiation of a treatment regimen rather than point treatments). Seven studies assessed the causal effect of receiving a point treatment and 15 studies compared the effect of two or more alternative sustained treatment regimens, including no treatment, the observational analogue of a per-protocol (PP) effect. Nine studies (1, 4, 6, 13, 17, 18, 26, 28 and 31 in Table 4) assessed both types of causal contrasts. Most of the primary outcomes of the reviewed studies were measured on a time-to-event scale (N = 34/38, 89%). As a result, the most common effect size measure used was the hazard ratio (N = 22, 65%), which was estimated by fitting a Cox proportional hazards model (N = 14, 61%), a pooled logistic regression (N = 8, 35%) or a time-to-event Fine and Gray regression model (N = 1, 4%). One study used both a Cox proportional hazards model and a pooled logistic regression, and therefore the percentages were calculated out of 23 datasets rather than 22 (17 in Table 4).

Handling of confounding
When estimating the observational analogue of an ITT effect, trial emulation studies used different statistical methods to adjust for baseline confounders, such as conditioning on the confounders (N = 18, 37%), propensity score methods (propensity score matching, stratification on the propensity score, adjustment based on the propensity score, etc.; N = 10, 20%), and g-methods: inverse probability of treatment weighting (IPTW, N = 10, 20%), the parametric g-formula (N = 3, 6%) and doubly robust methods, i.e. targeted maximum likelihood estimation (TMLE, N = 1, 2%). Six studies (12%) used the cloning approach in combination with inverse probability of censoring weighting (IPCW), as suggested by Hernán within the context of the 'target trial' framework (3, 8, 10, 19, 29 and 38 in Table 4). Out of these six studies, four additionally conditioned on confounders in their analyses (3, 8, 19 and 29 in Table 4). Despite trying to adjust for confounders at the design stage, one study (2%) still relied on conditioning on those confounders in their analyses (20 in Table 4). Ten studies used more than one method, and therefore the percentages were calculated out of 49 datasets rather than 38 (3, 8, 17, 19, 20, 22, 26, 29, 30, and 33 in Table 4).
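As a hedged sketch of how IPTW works in the simplest possible setting (one binary confounder, all counts hypothetical): each individual is weighted by the inverse of the probability of the treatment they actually received given the confounder, which creates a pseudo-population in which treatment is independent of that confounder.

```python
# Hypothetical records: (confounder level, treated, outcome, count).
records = [
    ("low",  1, 1,  10), ("low",  1, 0,  90),
    ("low",  0, 1,  40), ("low",  0, 0, 360),
    ("high", 1, 1, 200), ("high", 1, 0, 200),
    ("high", 0, 1,  60), ("high", 0, 0,  40),
]

# Propensity score: P(treated = 1 | confounder), here estimated by simple
# stratum proportions (a logistic model would be used in practice).
n_by = {}
for c, t, y, n in records:
    n_by[(c, t)] = n_by.get((c, t), 0) + n
ps = {c: n_by[(c, 1)] / (n_by[(c, 1)] + n_by[(c, 0)]) for c in ("low", "high")}

def weighted_risk(arm):
    # Weight each record by 1 / P(received own treatment | confounder),
    # then compute the outcome risk in the resulting pseudo-population.
    num = den = 0.0
    for c, t, y, n in records:
        if t == arm:
            w = n / (ps[c] if arm == 1 else 1 - ps[c])
            num += w * y
            den += w
    return num / den

iptw_rd = weighted_risk(1) - weighted_risk(0)  # -0.05
```

With these numbers the weighted risk difference agrees with what direct standardisation over the confounder would give, as expected when both methods use the same correctly specified confounder.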
Out of the 15 studies that reported the observational analogue of the PP effect for sustained treatment strategies, most used g-methods to adjust for time-varying confounding. More specifically, nine studies (60%) used IPTW, two studies (13%) used the cloning approach combined with IPCW, and an additional three studies (20%) used the parametric g-formula. For one study (7%) it was unclear which statistical method had been used (13 in Table 4).

Immortal time bias
All studies reviewed attempted to address immortal time bias. This was achieved in one of three ways: (1) by designing studies so that participants are assigned to treatment strategies at the start of follow-up based on their data available at that specific time (N = 21, 55%), (2) by using the cloning approach (N = 6, 16%) or (3) by using the sequential trial emulations approach (N = 11, 29%) (Table 4). In Table 4, the symbol 'a' indicates that the information was not explicitly stated and was assumed given the methodological details provided.

Discussion
Out of the 38 trial emulation studies we reviewed, most concerned cardiology, infectious diseases, and oncology. Furthermore, those studies leveraged different types of observational data, predominantly EHRs/EMRs data and cohort study data. It is worth noting that, among the studies that used EHRs/EMRs data, only one mentioned using unstructured EHRs/EMRs data. However, we do not exclude the possibility that some EHRs/EMRs databases had already pre-processed and converted unstructured EHRs/EMRs data to a structured tabular format.
The reviewed trial emulation studies used conventional or more advanced statistical methods to adjust for baseline confounders when estimating the observational analogue of an ITT effect. Conventional statistical methods include conditioning on the putative confounders (i.e. including the confounding variables in the statistical model), whereas more advanced statistical methods include propensity score methods and g-methods (IPTW, the parametric g-formula and TMLE).
Conversely, when estimating the observational analogue of the PP effect of sustained treatment strategies, the reviewed studies used g-methods, specifically IPTW and the parametric g-formula, to account for time-varying confounders. These more advanced statistical methods were needed because time-varying confounders can themselves be affected by prior treatment, and adjusting for them using conventional statistical or propensity score methods would prevent the identification of the total causal effect of treatment.
In summary, both conventional and more advanced statistical methods can be used to adjust for confounding at baseline. However, to properly account for time-varying confounding, specific statistical methods, such as the parametric g-formula and IPTW, must be used.
To address immortal time bias different approaches can be used.One common approach is to assign individuals to treatment strategies at the start of follow-up based on their data available at that specific time.Additionally, alternative approaches, such as the sequential trial emulation approach or the cloning approach, can be used.
Start of follow-up is the time when an individual meets the eligibility criteria and is assigned a treatment strategy. In some instances, however, an individual might meet the eligibility criteria at multiple times. For example, when comparing initiators and non-initiators of treatment, a non-initiator at one specific point in time might be an initiator at a subsequent point in time and meet the eligibility criteria at both time points. When that is the case, there are two unbiased options for choosing the start of follow-up. One option is to consider a single eligible time point. The other is to consider both time points and use the sequential trial emulation approach. This consists of emulating a sequence of trials, with different starts of follow-up, thereby making it possible for a non-initiator to enter a subsequent trial as an initiator if they meet all the eligibility criteria at the start of that subsequent trial. It should be noted, however, that since the same individuals might contribute to multiple emulated trials, the variance estimators must be adjusted appropriately. Furthermore, emulating a sequence of trials is expected to yield more precise results compared to emulating a single trial, given the additional data available for analysis [3,60].
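The data expansion behind the sequential approach can be sketched with a hypothetical toy cohort: one emulated trial starts at each time point, and an individual enters a given trial only if eligible at that time, classified by their treatment status at that time.

```python
# Hypothetical monthly treatment status per person (0 = untreated, 1 = treated).
history = {
    "p1": [0, 0, 1, 1],  # non-initiator in months 0-1, initiates in month 2
    "p2": [1, 1, 1, 1],  # initiator from month 0
}

def expand_to_trials(history):
    """Emulate one trial starting at each month k; a person enters trial k
    only if still untreated before k (the eligibility criterion assumed
    here), and is classified by their status at month k."""
    rows = []
    for pid, tx in history.items():
        for k in range(len(tx)):
            if all(t == 0 for t in tx[:k]):  # untreated before month k
                arm = "initiator" if tx[k] == 1 else "non-initiator"
                rows.append((pid, k, arm))
    return rows

rows = expand_to_trials(history)
# p1 contributes to trials 0 and 1 as a non-initiator and to trial 2 as an
# initiator; p2 contributes only to trial 0, as an initiator.
```

Because p1 appears in three emulated trials, the same individual contributes multiple (correlated) rows to the pooled analysis, which is exactly why the variance estimators mentioned above must be adjusted (e.g. with robust/sandwich standard errors or bootstrapping).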
As regards the cloning approach, it is used when the treatment strategies of the individuals are unknown at baseline. It consists of three key steps. First, in the case of a trial emulation study with two treatment groups, if individuals cannot yet be assigned to a specific treatment strategy at baseline, two exact copies (clones) of each individual are created. One clone is assigned to one treatment group, whilst the other is assigned to the other treatment group. Next, clones are followed over time and are censored when they deviate from their assigned treatment strategy. Last, IPCW is used to account for potential selection bias resulting from censoring [14,60]. Given that only clones who comply with their assigned treatment strategy are kept under study, the cloning approach only allows for the estimation of the observational analogue of the PP effect in trial emulations with point treatments or sustained treatment strategies. Furthermore, the cloning approach can be used in combination with a grace period. This is a predefined period of follow-up time during which treatment initiation can happen, and its length is chosen based on real-world clinical scenarios (e.g. hospital delays before surgery). Using a grace period makes it possible to better reflect real-world clinical scenarios and can increase the number of eligible individuals from the observational database [3,14,61]. In relation to confounding bias when using the cloning approach, cloning patients removes confounding at baseline. However, artificially censoring clones introduces selection bias, which is accounted for using IPCW [14,60]. Nonetheless, most of the studies using the cloning approach still adjusted for confounders at baseline.
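The cloning-and-censoring step can be sketched as follows, under the assumption of two strategies ("initiate within a grace period" versus "never initiate") and with entirely hypothetical records; the subsequent IPCW step, which reweights the uncensored clones, is omitted here.

```python
# Hypothetical records: (person, month treatment actually started, or None).
people = [("p1", 1), ("p2", None)]
GRACE = 2  # months allowed for initiation under the "initiate" strategy

def clone_and_censor(people):
    """Create one clone per strategy for each person, flagging a clone as
    (artificially) censored once the observed data deviate from its
    assigned strategy."""
    rows = []
    for pid, start in people:
        initiated_in_grace = start is not None and start <= GRACE
        # "Initiate" clone: censored if no initiation within the grace period.
        rows.append((pid, "initiate", not initiated_in_grace))
        # "Never initiate" clone: censored as soon as initiation occurs.
        rows.append((pid, "never", start is not None))
    return rows

clones = clone_and_censor(people)
# p1 initiated in month 1: its "initiate" clone is kept and its "never"
# clone is censored; the reverse holds for p2, who never initiated.
```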
In summary, different strategies can be used to address immortal time bias: assigning individuals to treatment strategies at baseline based on their data available at that specific time, using the sequential trial emulations approach, or using the cloning approach.
Potential selection bias resulting from loss to follow-up was primarily accounted for using IPCW. Other methods include complete case analysis, the parametric g-formula, TMLE, multiple imputation, last observation carried forward, and non-responder imputation.
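A minimal IPCW sketch for loss to follow-up, with hypothetical numbers and a single binary predictor of dropout: each participant who remains under observation is upweighted to also represent the similar participants who were censored.

```python
# Hypothetical dropout mechanism: sicker participants are more often lost
# to follow-up. P(remain in study | group):
p_stay = {"sick": 0.5, "healthy": 0.9}

# Participants still observed at end of follow-up: (group, outcome, count).
observed = [("sick", 1, 10), ("sick", 0, 15), ("healthy", 1, 9), ("healthy", 0, 81)]

# Unweighted (complete case) risk ignores the selective dropout.
naive_risk = sum(n * y for g, y, n in observed) / sum(n for g, y, n in observed)

# IPCW: weight each remaining participant by 1 / P(remain | group).
num = sum(n * y / p_stay[g] for g, y, n in observed)
den = sum(n / p_stay[g] for g, y, n in observed)
ipcw_risk = num / den  # 0.2, vs a naive estimate of 19/115 (about 0.165)
```

Because the sicker (higher-risk) participants are preferentially censored, the complete case estimate understates the risk, and the weighting corrects this under the assumption that dropout depends only on the measured predictor.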
As a general remark, it should be noted that not all the trial emulation studies we reviewed explicitly mentioned using the 'target trial' framework, and some that did use it did not report its use clearly. Those that did use the 'target trial' framework tended to follow its reporting guidelines: they usually provided a table in their papers outlining the protocol of the 'target trial' and explicitly specifying how each component of its protocol was emulated using observational data. Reporting these details is crucial, and is advised going forward, as it allows readers to readily understand the aim of the study and the statistical methods used to address confounding bias, immortal time bias and selection bias.

Limitations
This scoping review has one main limitation: our search strategy has almost certainly not identified all trial emulation studies published by February 25, 2021. This is a result of varying nomenclature, as not every trial emulation study refers to itself as such. For instance, to our knowledge, the first ever published trial emulation study was described as an 'observational study analysed like a randomised experiment' [2]. We refrained from using search terms like 'randomised experiment' and/or 'randomised clinical trial' in our search strategy because, when combined with search terms such as 'observational study' and/or 'observational data', our search would yield thousands of studies, most of which would likely be irrelevant. Instead, we decided to use search terms such as 'trial emulation' and 'target trial', which were coined by Hernán and Robins in 2016, who were the first to formalise the idea of using observational data to emulate a randomised trial. This, however, could have resulted in omitting some trial emulation studies, as not every researcher or research group might refer to trial emulation as such. Future trial emulation studies should clearly label themselves as such, both in their abstracts and throughout their papers.

Future directions
Currently there is much interest in the suitability of EHRs/EMRs data for trial emulation purposes, given the increased availability of big electronic healthcare databases. The main concern is the quality of EHRs/EMRs data. These should be free from errors, inconsistencies and inaccuracies, and provide all the information required to answer the causal research question under study, including data on exposure, outcome, baseline confounders, time-varying confounders (if applicable), eligibility criteria and missingness predictors. Furthermore, the data should be available in a standardised format, trustworthy, and up-to-date [3,4,62].
Trial emulation studies that have used EHRs/EMRs data extracted data from multiple sources. For instance, The Health Improvement Network database, which was used in some studies, consists of EHRs/EMRs data from over 500 primary care practices in the United Kingdom (UK) [63]. This type of EHR/EMR database has proved useful for research purposes. It remains to be determined, however, whether EHRs/EMRs data from a single healthcare facility can be used successfully to emulate trials, inform clinical decisions, and ultimately contribute to improving patient care at the facility itself. In England specifically, large National Health Service (NHS) Trusts, such as King's College Hospital, the University College London Hospitals, and the University Hospitals Birmingham NHS Foundation Trusts, store plentiful amounts of EHRs/EMRs data. It would be worth evaluating the feasibility of emulating trials using specifically these EHRs/EMRs data, especially given the recent advances in health informatics (e.g. NLP) that enable quick access to and full use of these data. If these trial emulations prove feasible and do indeed provide valid findings, these approaches could then be applied on a wider scale in order to gain scientific insights quickly and at lower cost.

Conclusions
This study reviewed explicit attempts of trial emulation studies across all medical fields and provides a comprehensive overview of the types of observational data that were leveraged, and the statistical methods used to address the following biases: (A) confounding bias, (B) immortal time bias and (C) selection bias. Different methods can be used to address those biases. Future trial emulation studies should clearly define the causal question of interest.

Scola et al. BMC Medical Research Methodology (2023) 23:186

Fig. 2
Fig. 2 Medical fields most covered. Note: Studies were classified based on their outcomes, whenever possible.

Table 2
Data extraction form. Abbreviations: EHRs, electronic health records; EMRs, electronic medical records.

Table 3
(continued)
Psychiatry: VACS. The VACS (https://www.vacsp.research.va.gov/CSPEC/Studies/INVESTD-R/Veteran-Aging-Cohort-Study.asp) is an ongoing prospective cohort study of HIV-positive US veterans in care and an age/race/site-matched control group of HIV-negative US veterans, launched in 1997.
Oncology: NCDB. The NCDB (https://www.facs.org/quality-programs/cancer-programs/national-cancer-database/) is a clinical oncology database that collects cancer patients' hospital registry data from over 1,500 hospitals in the US.

Table 3
(continued)
HPFS, Health Professionals Follow-up Study; THIN, The Health Improvement Network; UK, United Kingdom; USRDS, United States Renal Data System; CKD, chronic kidney disease; ESRD, end-stage renal disease; COHERE, the Collaboration of Observational HIV Epidemiological Research in Europe; HIV, human immunodeficiency virus; NCRAS, National Cancer Registration and Analysis Service; VACS, the Veterans Aging Cohort Study; STOP-COVID, the Study of the Treatment and Outcomes in Critically Ill Patients with COVID-19; COVID-19, coronavirus disease; ICUs, intensive care units; NCDB, National Cancer Database; PDR, Prescribed Drug Register; NPR, National Patient Register; VA, Department of Veterans Affairs; HIV-CAUSAL, HIV Cohorts Analyzed Using Structural Approaches to Longitudinal data; REIN, the French Renal Epidemiology and Information Network; RRT, renal replacement therapies; BADBIR, British Association of Dermatologists Biologic and Immunomodulators Register; ROI, Republic of Ireland; SRTR, Scientific Registry of Transplant Recipients; OPTN, Organ Procurement and Transplant Network; NA-ACCORD, the North American AIDS Cohort Collaboration on Research and Design; BladderBaSe, the Bladder Cancer Data Base Sweden; SNRUBC, the Swedish National Register of Urinary Bladder Cancer; NDB, the National Database of Health Insurance Claims and Specific Health Check-ups of Japan; GHS, Geisinger Health System; SHCS, Swiss HIV Cohort Study; CER2, the Comparative Effectiveness Research through Collaborative Electronic Reporting Consortium. Abbreviations: CALIBER, ClinicAl research using Linked Bespoke studies and Electronic health Records; CPRD, Clinical Practice Research Datalink; HES, Hospital Episode Statistics; ONS, Office for National Statistics; EHRs, electronic health records; EMRs, electronic medical records; SEER, Surveillance, Epidemiology and End Results program; US, United States.

Table 4
Causal contrast of interest and methods used to address different biases