
Impact of methodological choices in comparative effectiveness studies: application in natalizumab versus fingolimod comparison among patients with multiple sclerosis



Natalizumab and fingolimod are used as high-efficacy treatments in relapsing–remitting multiple sclerosis. Several observational studies comparing these two drugs have shown variable results, using different methods to control treatment indication bias and manage censoring. The objective of this empirical study was to elucidate the impact of methods of causal inference on the results of comparative effectiveness studies.


Data from three observational multiple sclerosis registries (MSBase, the Danish MS Registry and French OFSEP registry) were combined. Four clinical outcomes were studied. Propensity scores were used to match or weight the compared groups, allowing for estimating the average treatment effect for the treated or the average treatment effect for the entire population. Analyses were conducted both in intention-to-treat and per-protocol frameworks. The impact of the positivity assumption was also assessed.


Overall, 5,148 relapsing–remitting multiple sclerosis patients were included. In this well-powered sample, the 95% confidence intervals of the estimates overlapped widely. Propensity score weighting and propensity score matching procedures led to consistent results. Some differences were observed between the estimates of the average treatment effect for the entire population and the average treatment effect for the treated. Intention-to-treat analyses were more conservative than per-protocol analyses. The most pronounced irregularities in outcomes and propensity scores were introduced by violation of the positivity assumption.


This applied study elucidates the influence of methodological decisions on the results of comparative effectiveness studies of treatments for multiple sclerosis. According to our results, there are no material differences between conclusions obtained with propensity score matching or propensity score weighting, provided that a study is sufficiently powered, models are correctly specified and the positivity assumption is fulfilled.



Natalizumab [1, 2] and fingolimod [3, 4] are two high-efficacy treatments used in patients with relapsing–remitting multiple sclerosis (RRMS). Comparative effectiveness studies of these therapies have shown somewhat inconsistent results [5,6,7,8,9]. In particular, we focus on three studies that used data from three multiple sclerosis (MS) registries, with differences in methods and conclusions [5,6,7]. We have already shown that some of this variability can be attributed to differences between the study populations [10, 11]. In the present work, we focus on the impact of methodological choices on the results, in particular the methods used to control treatment indication bias and to manage censoring in time-to-event analysis.

In the absence of randomized clinical trials, many decisions need to be made when conducting observational studies. Within the “target trial” framework developed by Hernán and Robins, we focus on two protocol components: first, the assignment procedure and, second, the causal contrast [12]. First, to emulate random assignment, we need to adjust for all known confounders [12]. The propensity score (PS), which can be used in several ways, is a popular tool for controlling the effect of indication bias on comparisons of interventions [13, 14]. The studies in the Danish MS Registry and MSBase used PS matching [6, 7], while the study in OFSEP used PS weighting [5]. Second, attrition bias and informative censoring result from systematic differences in follow-up duration between cohorts. Two causal contrasts, per-protocol and intention-to-treat, were considered to evaluate follow-up information. While the per-protocol framework includes only outcomes that were recorded while patients were exposed to the relevant intervention, the intention-to-treat framework mitigates the risk of informative censoring, which is of particular importance where clinical outcomes of the interventions are delayed [12, 15]. The per-protocol framework was originally used in the studies in the Danish MS Registry and MSBase [6, 7], while the intention-to-treat framework was used in the OFSEP study [5]. Moreover, the study in MSBase used pairwise censoring, which consists of censoring data within each PS-matched pair at the shorter of the two recorded follow-up times, in order to balance the analysed follow-up time between the groups [16].

The objective of this empirical study is to elucidate the influence of methodological decisions on the results of a comparison of two potent interventions, using the example of natalizumab and fingolimod among patients with MS and combined data from three large clinical registries [5,6,7].


Data source

This study is a result of a collaborative project [11, 17]. Longitudinal demographic and clinical data were extracted from MSBase on 15th of May 2018 [18, 19]. The Danish MS Registry cohort included all patients treated with natalizumab or fingolimod from 1st of July, 2011, when fingolimod became available in Denmark, until 1st of March, 2018 [20, 21]. The OFSEP cohort included data from 27 French university hospitals extracted from the European Database for Multiple Sclerosis (EDMUS) software in July 2014 [22]. No patient from OFSEP was recorded in MSBase. Some Danish patients who were recorded both in MSBase and in the Danish MS Registry (2% of the Danish MS Registry) were excluded from MSBase and only considered in the Danish MS Registry.

Eligibility criteria

All patients were diagnosed with RRMS. The required disability follow-up consisted of: a recorded visit with Expanded Disability Status Scale (EDSS) [23] score assessment within six months before treatment initiation (the baseline visit), two post-baseline visits with EDSS at least six months apart, and at least one on-treatment visit.


The treatment of interest was the first exposure to natalizumab or fingolimod, started on or after 1st January 2011 and continued for a minimum of three months. Patients who participated in randomized trials, patients treated with off-label treatment (cyclophosphamide), and patients treated with therapies known to have extended duration of effect [24,25,26] (mitoxantrone, alemtuzumab, cladribine, daclizumab, rituximab, ocrelizumab) before the study therapy were excluded. Each patient could contribute only once to the follow-up analysis. When multiple eligible treatment starts were recorded, the earliest treatment was considered.


Four outcomes were evaluated to compare the relative effectiveness of the two study therapies:

  • (1) Count of relapses.

  • (2) Time to first relapse.

  • (3) Time to first confirmed disability worsening event. Worsening was defined as an increase of ≥ 1.5 EDSS steps if baseline EDSS was 0, or 1.0 if baseline EDSS was 1.0–5.5, or 0.5 steps if baseline EDSS was > 5.5, and sustained at all consecutive visits over ≥ 6 months (confirmation cannot be preceded by a relapse within 30 days).

  • (4) Time to first confirmed disability improvement event. An improvement was defined as a decrease of 1.5 if baseline EDSS was 1.5, or 1.0 if baseline EDSS was 2.0–6.0, or 0.5 if baseline EDSS was > 6, sustained at all consecutive visits over ≥ 6 months.

The end of the analyzed follow-up, including the period over which relapses were counted, depended on the definition of right-censoring (see below).
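The EDSS step thresholds in outcomes (3) and (4) can be written as small lookup functions. The sketch below is illustrative only: the function names are hypothetical, it covers only the size of the EDSS change, and it does not implement the confirmation over ≥ 6 months or the relapse exclusion window.

```python
from typing import Optional


def worsening_threshold(baseline_edss: float) -> float:
    """Minimum EDSS increase counted as worsening, by baseline EDSS."""
    if baseline_edss == 0:
        return 1.5
    if baseline_edss <= 5.5:
        return 1.0
    return 0.5  # baseline EDSS > 5.5


def improvement_threshold(baseline_edss: float) -> Optional[float]:
    """Minimum EDSS decrease counted as improvement, by baseline EDSS.

    Returns None when the text defines no improvement threshold
    (baseline EDSS below 1.5).
    """
    if baseline_edss < 1.5:
        return None
    if baseline_edss == 1.5:
        return 1.5
    if baseline_edss <= 6.0:
        return 1.0
    return 0.5  # baseline EDSS > 6


def is_worsening(baseline_edss: float, follow_up_edss: float) -> bool:
    """True if the unconfirmed change meets the worsening threshold."""
    return follow_up_edss - baseline_edss >= worsening_threshold(baseline_edss)


def is_improvement(baseline_edss: float, follow_up_edss: float) -> bool:
    """True if the unconfirmed change meets the improvement threshold."""
    threshold = improvement_threshold(baseline_edss)
    if threshold is None:
        return False
    return baseline_edss - follow_up_edss >= threshold
```

For example, under these rules a patient with a baseline EDSS of 6.0 needs an increase of 0.5 to count as worsening but a decrease of 1.0 to count as improvement.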

Assignment procedure: propensity score matching and weighting

In the present work, baseline was defined as the date of the start of the index therapy. To emulate the random assignment of treatments at baseline, the PS [13, 27] was defined as the probability of being treated with natalizumab, conditional on the following baseline characteristics (selected based on expert opinion and prior analyses): sex, age, MS duration (from first MS symptoms to baseline), EDSS score, number of previous treatments and, evaluated over the past 12 months, the number of relapses and the nature of clinical activity recorded (disability worsening only, relapses only, both or no clinical activity). Country was added as a random effect. We estimated both the average treatment effect for the treated (ATT), which is the average treatment effect among those patients who were exposed to natalizumab, and the average treatment effect for the entire eligible population (ATE) [28]. One-to-one, greedy, nearest-neighbor, random matching on the PS was used, which allows approximation of the ATT only [29]. Matching caliper values of 0.1 (used in the original studies), 0.2 (as recommended in the literature [30]) and 0.02 standard deviations of the PS (to prioritize close matching) were used. Two weighting procedures were explored. First, using Inverse Probability of Treatment Weighting (IPTW), the weights for a treated patient and for a control are defined as \(w_{i}=\frac{1}{p_{i}}\) and \(w_{i}=\frac{1}{1-p_{i}}\), respectively, where \(p_{i}\) is the PS for patient \(i\). To reduce issues due to extreme weights, the weights were stabilized by multiplication by the marginal probability of receiving the treatment actually received [31], referred to as sIPTW. Second, using the odds [32], the weight for a treated patient is 1 and the weight for a control is defined as \(w_{i}=\frac{p_{i}}{1-p_{i}}\). Weighting with IPTW allows estimation of the ATE, while weighting by the odds allows estimation of the ATT.
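The three weighting schemes described above translate directly into code. The following is a minimal sketch, not the authors' implementation: the function names are hypothetical, `p` is the PS (probability of receiving natalizumab) and `marginal_treated` is assumed to be the overall proportion of treated patients in the sample.

```python
def _check_ps(p: float) -> None:
    # A PS of exactly 0 or 1 makes the weights undefined.
    if not 0.0 < p < 1.0:
        raise ValueError("propensity score must lie strictly between 0 and 1")


def iptw_weight(treated: bool, p: float) -> float:
    """IPTW weight (targets the ATE): 1/p if treated, 1/(1 - p) if control."""
    _check_ps(p)
    return 1.0 / p if treated else 1.0 / (1.0 - p)


def stabilized_iptw_weight(treated: bool, p: float, marginal_treated: float) -> float:
    """sIPTW: IPTW weight multiplied by the marginal probability of the
    treatment actually received."""
    marginal = marginal_treated if treated else 1.0 - marginal_treated
    return marginal * iptw_weight(treated, p)


def odds_weight(treated: bool, p: float) -> float:
    """Weighting by the odds (targets the ATT): 1 if treated, p/(1 - p) if control."""
    _check_ps(p)
    return 1.0 if treated else p / (1.0 - p)
```

As a worked example, a treated patient with a PS of 0.25 receives an IPTW weight of 4; if 40% of the sample is treated, the stabilized weight is 0.4 × 4 = 1.6. A control with the same PS receives an odds weight of 0.25 / 0.75 = 1/3.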

Causal contrast of interest

Intention-to-treat analysis retained all matched or weighted patients in the group of their initial treatment allocation, regardless of subsequent exposure, until either the last data entry or the study outcome. Per-protocol analysis retained all matched or weighted patients until the date of treatment discontinuation (or the date of last data entry if it occurred earlier). Pairwise-censoring was used as a technique of censoring after matching. In each pair, study follow-up of both patients was censored when the follow-up of one of the two patients was censored. This approach prevented imbalance due to differential duration of follow-up in the matched groups.
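The censoring rules above can be sketched as follows. This is one possible reading of the description, with hypothetical function names; times are assumed to be expressed in years from baseline, and an event time of `None` means that no outcome was recorded during the available follow-up.

```python
from typing import Optional, Tuple


def follow_up_end(contrast: str, last_entry: float,
                  discontinuation: Optional[float] = None) -> float:
    """Censoring time under each causal contrast.

    "itt": follow-up runs to the last data entry.
    "pp":  follow-up stops at treatment discontinuation, or at the last
           data entry if that comes first.
    """
    if contrast == "itt":
        return last_entry
    if contrast == "pp":
        if discontinuation is None:
            return last_entry
        return min(last_entry, discontinuation)
    raise ValueError("contrast must be 'itt' or 'pp'")


def pairwise_censor(
    follow_up_a: float, follow_up_b: float,
    event_time_a: Optional[float] = None,
    event_time_b: Optional[float] = None,
) -> Tuple[Tuple[float, bool], Tuple[float, bool]]:
    """Censor both members of a matched pair at the shorter follow-up.

    Returns one (time, event_observed) tuple per patient.
    """
    cutoff = min(follow_up_a, follow_up_b)

    def truncate(event_time: Optional[float]) -> Tuple[float, bool]:
        # An event counts only if it occurs within the shared follow-up.
        if event_time is not None and event_time <= cutoff:
            return (event_time, True)
        return (cutoff, False)

    return truncate(event_time_a), truncate(event_time_b)
```

In this sketch, an event recorded at 3.0 years in a patient whose matched partner was censored at 2.0 years is discarded and both patients are censored at 2.0 years, which is the source of the follow-up loss that comes with pairwise censoring.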

Sensitivity analysis without the positivity assumption

The primary analysis ensured that the positivity assumption was fulfilled by only including patients who commenced natalizumab or fingolimod after the more recent of the two therapies became available on 1st January 2011. In a sensitivity analysis, all patients who commenced a study therapy were included, irrespective of the commencement date. Therefore, patients who were considered ineligible in the primary analysis were included in this sensitivity analysis. Before 2011, MS patients had no chance of receiving fingolimod and could only start natalizumab; the positivity assumption was therefore violated.

Statistical analysis

Characteristics of the patients included in the analyses as well as those excluded by the matching procedure were described – overall and by treatment groups, before and after PS matching/weighting. Standardized mean differences (SMD) or Mahalanobis distances were computed, with 10% considered to be an acceptable difference [33]. Incidence of relapses was evaluated using a negative binomial model, with an offset term for follow-up duration. The cumulative hazards of first relapse, first EDSS improvement and first EDSS worsening were studied using Cox proportional hazards models with robust estimation of variance [34]. The models were either weighted by sIPTW or odds, or matched on PS. A cluster term (generalized estimating equations with negative binomial distribution) or a frailty term (Cox models) for pair identifier was used. As the probability of detecting disability worsening and improvement events is associated with the frequency of recorded EDSS scores [35], models with time to disability outcomes were adjusted for annualized visit density. All analyses were conducted for both the intention-to-treat and the per-protocol causal contrasts. Analyses using matching were completed with and without pairwise-censoring. Table 1 gives an overview of all the analytical approaches considered in the present work. The analyses were performed using R software (version 3.4.0).
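As an illustration of the balance diagnostic, the sketch below computes an SMD for a continuous covariate, with optional weights for the weighted cohorts. The pooled standard deviation and the weighted variance formula used here are one common convention and are an assumption, not necessarily the exact computation used in this study; all names are hypothetical.

```python
import math
from typing import Optional, Sequence


def weighted_mean(x: Sequence[float], w: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)


def weighted_var(x: Sequence[float], w: Sequence[float]) -> float:
    """Weighted variance; with unit weights this is the usual sample
    variance (divisor n - 1)."""
    sw = sum(w)
    sw2 = sum(wi * wi for wi in w)
    m = weighted_mean(x, w)
    return sw / (sw * sw - sw2) * sum(wi * (xi - m) ** 2 for wi, xi in zip(w, x))


def smd(x_treated: Sequence[float], x_control: Sequence[float],
        w_treated: Optional[Sequence[float]] = None,
        w_control: Optional[Sequence[float]] = None) -> float:
    """Standardized mean difference: difference in means divided by the
    square root of the average of the two group variances."""
    w_t = [1.0] * len(x_treated) if w_treated is None else w_treated
    w_c = [1.0] * len(x_control) if w_control is None else w_control
    diff = weighted_mean(x_treated, w_t) - weighted_mean(x_control, w_c)
    pooled_sd = math.sqrt((weighted_var(x_treated, w_t) + weighted_var(x_control, w_c)) / 2)
    return diff / pooled_sd


def is_balanced(smd_value: float, threshold: float = 0.10) -> bool:
    """Apply the 10% threshold to the absolute SMD."""
    return abs(smd_value) <= threshold
```

A covariate would be flagged as imbalanced when `is_balanced(smd(...))` returns `False`, that is, when the absolute SMD exceeds 0.10.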

Table 1 Overview of the analytical approaches used in the present work according to the outcomes


Patients’ characteristics

Overall, 5,148 patients were included in this study [10]; 1,989 (39%) were treated with natalizumab and 3,159 (61%) with fingolimod. Patients’ characteristics are described in Table 2 (overall median age at baseline: 37.7 years; median MS duration at baseline: 6.9 years). Most of the patients had a clinically active disease and 70% had a baseline EDSS score equal to or greater than 2. Table 3 presents the median durations of follow-up (overall: 3.1 years (interquartile range (IQR): 2.0–4.5)). The median durations of natalizumab and fingolimod treatments were 2.0 (IQR: 1.3–3.1) and 2.2 (IQR: 1.2–3.6) years, respectively.

Table 2 Baseline characteristics of the overall study population, as well as the subgroups of patients unmatched and matched within different calipers
Table 3 Follow-up duration according to the outcomes of interest (in years)

Patients’ characteristics after propensity score balancing procedures (matching and weighting)

The distributions of PS showed a good overlap between the treatment groups, except in the tails (Fig. 1). The use of three caliper values for PS-matching led to three similar matched datasets (Table 2). The characteristics of the matched groups were comparable to the characteristics of the overall sample. The excluded patients tended to experience less disease activity. Table 4 presents patients’ characteristics by treatment group. Overall, 35% of patients treated with fingolimod had an EDSS score < 2 at treatment start, compared with 22% of those treated with natalizumab. The matching procedure improved the balance between the compared groups, except for the data source and the number of previous MS treatments.
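How the caliper determines which patients enter the matched dataset can be illustrated with a simplified sketch of one-to-one, greedy, nearest-neighbor matching without replacement. It is not the authors' R code: the function name, the random processing order controlled by `seed`, and the choice to compute the caliper from the standard deviation of the pooled PS are assumptions made for illustration.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def greedy_match(ps_treated: Sequence[float], ps_control: Sequence[float],
                 caliper_sd: float = 0.2, seed: int = 0) -> List[Tuple[int, int]]:
    """Return (treated_index, control_index) pairs.

    Treated patients are processed in random order; each takes the nearest
    still-unmatched control, provided the PS difference is within the caliper.
    Treated patients without a control inside the caliper stay unmatched.
    """
    width = caliper_sd * statistics.stdev(list(ps_treated) + list(ps_control))
    order = list(range(len(ps_treated)))
    random.Random(seed).shuffle(order)
    available = set(range(len(ps_control)))
    pairs = []
    for i in order:
        if not available:
            break
        # Nearest available control; ties are broken by the lower index.
        j = min(available, key=lambda k: (abs(ps_control[k] - ps_treated[i]), k))
        if abs(ps_control[j] - ps_treated[i]) <= width:
            pairs.append((i, j))
            available.remove(j)
    return sorted(pairs)
```

With treated PS values of 0.30 and 0.80 and control values of 0.31, 0.50, 0.79 and 0.10, a caliper of 0.2 standard deviations keeps both pairs, whereas a caliper of 0.02 standard deviations rejects them because a PS difference of 0.01 is then too large. Narrower calipers therefore trade sample size for closer matches.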

Fig. 1 Distribution of propensity scores by treatment group (probability of being treated with natalizumab)

Table 4 Characteristics at baseline according to treatment group in the overall population and when three matching calipers were used

Table 5 presents patients’ characteristics by treatment group after weighting on sIPTW or odds. The treatment groups were well balanced, with SMD or Mahalanobis distances around 10% for all patient characteristics, except for the number of previous MS treatments, as natalizumab tended to be prescribed as first treatment more frequently than fingolimod. Exposure following the study therapy is shown in Table S1.

Table 5 Characteristics at baseline by treatment group in the overall study sample, and cohorts weighted on sIPTW and odds

Comparison of effectiveness between natalizumab and fingolimod

Figure 2 summarises the results of all comparative analyses. While the 95% confidence intervals of the estimated differences between natalizumab and fingolimod largely overlapped in all analyses, some variation in point estimates was observed.

Fig. 2 Estimated treatment effects for the 4 outcomes, 3 matching and 2 weighting strategies and 2 causal effects, with and without pairwise censoring in matched cohorts

With a few exceptions, the results of the analyses with matching and weighting led to the same conclusions, i.e., superiority of natalizumab (for relapse outcomes and EDSS improvement) or no evidence of difference (for EDSS worsening). Inconsistencies were observed mainly in the intention-to-treat frameworks, for relapse counts and first EDSS improvement. Weighting by the odds (ATT) tended to provide lower point estimates and similar margins of error of the relative effect compared to weighting by sIPTW (ATE). The value of the matching caliper did not influence the magnitude of the estimated differences.

Most of the variability in the estimates was linked to the causal contrast. The intention-to-treat paradigm led to less stable results, especially for the count of relapses and first EDSS improvement. For all outcomes except time to first EDSS worsening, the intention-to-treat analyses underestimated the differences between the therapies in comparison to per-protocol analyses with or without pairwise-censoring. Per-protocol analyses and pairwise-censored analyses returned similar point estimates, even though the margin of error varied. In the pairwise-censored analyses, confidence intervals were narrower for relapse counts but wider for the disability outcomes compared to the per-protocol analysis.

Sensitivity analysis: positivity assumption

To test the effect of violation of the positivity assumption, 7,118 patients were included irrespective of the date of their treatment start, of whom 3,726 were treated with natalizumab. The other baseline characteristics were similar to those of the main cohort (Table S3). The PS distribution was left-skewed in patients who commenced natalizumab before fingolimod became available (Figure S1). Using weighting, the comparison of the treatment effects on relapses was similar to the main analysis (Table 6). However, the point estimates for the difference in the treatment effects on EDSS worsening were substantially lower than in the primary analysis, although confidence intervals overlapped. When matching was used, the estimates for EDSS outcomes were less influenced by the violation of the positivity assumption. Nevertheless, the estimates of the differences between treatment effects on relapses were substantially inflated when the assumption was violated, especially for the intention-to-treat causal effect.

Table 6 Comparison of treatment effect on relapses and disability violating the positivity assumption


In this empirical study conducted on a complex chronic neurological condition, with long-term follow-up data, several non-linear outcomes and a well-powered dataset, most of the methodological choices (PS matching/weighting, caliper values, weighting on IPTW vs. odds, and pairwise censoring) resulted in consistent overall conclusions, in accordance with two of the three original studies [5, 6], the pooled analysis [11] and a recent French head-to-head prospective study [36]. In a longitudinal observational study conducted over the long term in the presence of frequent changes of therapy, an intention-to-treat causal contrast tends to be associated with more variability in the observed effects than a per-protocol contrast. Importantly, violation of the positivity assumption had the most pronounced negative effect on the consistency of reported results.

Propensity score to control indication bias

Among the four methods using PS, matching and weighting have shown a superior performance to adjustment and stratification in achieving balance on baseline characteristics [37], reduction of bias and estimation of variance [38,39,40]. Therefore, we restricted our present work to PS matching and weighting. The results of the weighting and matching procedures were consistent, confirming that both methods performed well in sufficiently powered data sets and correctly specified models. The width of the matching caliper did not have much influence on the consistency of the results, confirming that 0.2 is a sufficiently conservative caliper, as previously reported [30]. The only detectable systematic variability was noted for the type of estimated effect, with the magnitude of the ATE effect trending towards higher values for relapse incidence and time to first relapse.

The matched study sample corresponds to an overlap between the fingolimod- and the natalizumab-treated target populations, with inclusion of comparable cases and exclusion of cases outside the common distribution of the PS (ATT effect of interest). Such reductions in sample size may restrict the analysis to a very specific sub-population and thus affect the precision and the generalizability of the results [41]. An IPTW-weighted sample is closer to the entire study population, especially where ATE is the effect of interest. It is therefore not surprising, given that the use of natalizumab and fingolimod in MS differs in clinical settings, that we have observed differences in the point estimates obtained with the matched and weighted analyses. Weighting could potentially be subject to influential cases with extreme weights, which are excluded from matching, as they fall outside of the central portion of the PS distribution [42]. In this work, we used stabilized weights to mitigate the risk of influential cases, as an alternative to weight trimming or truncation [33].

Management of censoring

In the present study, most irregularities were related to the intention-to-treat causal contrast, which resulted in less stable and often smaller estimates than the per-protocol analysis. These fluctuations were more pronounced for the outcomes defined as counts of events and time to medium-term events (first disability worsening or improvement) than for time to short-term events (first relapse). The intention-to-treat analysis evaluates the association with the outcome, irrespective of treatment status over time, and addresses the question of the effect of the treatment decision, irrespective of further persistence on the assigned therapy. Therefore, such an approach leads to conservative estimates, which explains the observed overall deflation of effect sizes in comparison to the per-protocol approach and the minimal impact on short-term outcomes.

On the other hand, patients and neurologists may be more interested in a per-protocol effect, which estimates the effect of an intervention while it is adhered to. However, a per-protocol treatment effect can be inflated by attrition bias and informative censoring, especially when one of the compared interventions is a priori perceived as being more effective [43]. This would lead to the selection of “treatment responders”, because patients who respond well to treatment are more likely to remain treated than non-responders [44]. In addition, the per-protocol requirement of adherence to treatment may introduce additional selection bias, which may limit generalizability of conclusions [45], whereas the intention-to-treat approach preserves the balance established at baseline. A pairwise-censoring procedure can be combined with either causal contrast. Its purpose is to sustain the balance between the matched cohorts even when censoring or treatment cessation is systematically different between the compared groups. This sustained balance is achieved at the expense of losing part of the study follow-up due to right-censoring of the paired cases. However, in the present empirical analysis, per-protocol and pairwise-censored analyses led to similar conclusions and point estimates. The observed increase in the margin of error in pairwise-censored analysis suggests some loss of power. Marginal structural models with IPTWs accounting for the probability of censoring may provide a more efficient solution, as they do not lead to loss of follow-up information [46,47,48].

Positivity assumption

The positivity assumption can be objectively assessed in several steps. First, the definition of study timeline and area should be such that both treatments are available to all included patients. Second, the common support of PS distribution in the two groups needs to be established [31]. In our main analysis, these two steps confirmed that the positivity assumption was met. To examine the importance of the positivity assumption, in a different analysis, we allowed inclusion of patients before one of the studied therapies (fingolimod) became available. This included more natalizumab-treated patients from a time period when the probability of exposure to fingolimod was zero. The results of this analysis showed the most pronounced variability and the largest deviation from the primary analysis. Therefore, in a sufficiently powered longitudinal dataset, non-zero probability of exposure to both compared therapies at all baseline time-points is the most important aspect of methodological considerations explored in this study.
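The two assessment steps can be sketched as simple checks. The function names are hypothetical, and the min/max rule used for the common support is one simple convention chosen for illustration; other definitions of common support exist.

```python
from datetime import date
from typing import List, Sequence, Tuple


def eligible_by_availability(baseline: date, both_available_from: date) -> bool:
    """Step 1: the patient started treatment when both therapies were available."""
    return baseline >= both_available_from


def common_support(ps_treated: Sequence[float],
                   ps_control: Sequence[float]) -> Tuple[float, float]:
    """Step 2: range of PS values observed in both treatment groups."""
    low = max(min(ps_treated), min(ps_control))
    high = min(max(ps_treated), max(ps_control))
    return low, high


def outside_support(ps: Sequence[float], support: Tuple[float, float]) -> List[int]:
    """Indices of patients whose PS falls outside the common support."""
    low, high = support
    return [i for i, p in enumerate(ps) if p < low or p > high]
```

Using the cut-off of the primary analysis, `eligible_by_availability(date(2009, 3, 1), date(2011, 1, 1))` returns `False`: a patient starting natalizumab in 2009 had no possibility of receiving fingolimod and would be excluded.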


Through consistency and exchangeability assumptions, it is assumed that there were no unmeasured confounders. Nevertheless, our study was limited by incomplete MRI data, while MRI activity is a known prognostic factor in MS [49]. Reassuringly, two of our three previous studies that accounted for MRI at treatment start showed results consistent with our primary analysis [5, 6].

In addition, heterogeneity of data in multisite registries (with potential differences in therapeutic practices, health care systems and treatment access) may increase variance of the associations between treatments and outcomes [50]. On the other hand, heterogeneity that is representative of clinical use of the compared therapies extends generalizability of the results. We have mitigated the potential heterogeneity in the present dataset by including country as a random term in the PS modeling.

Finally, this study did not attempt to compare the efficiency and robustness of different analytical methods, as this can be done only with simulation studies. Instead, we have focused on the evaluation of practical methodological questions in the context of a specific clinical choice.


This empirical study provides practical insights into the effects of several methodological choices on the estimates of the difference between two therapies in the context of a chronic neurological disease, in a sufficiently powered analysis and correctly specified models. Our results lead us to conclude that methodological considerations such as PS matching/weighting and their specifications, causal contrast and management of censoring have a negligible effect on the overall analyses, given that the model assumptions are met. The choice between ATT or ATE as the preferred approach should be driven by the clinical question of interest. In our clinical example, when both treatments can be prescribed to patients with relapsing–remitting MS following similar rules, there is no apparent reason to restrict the analysis to the natalizumab- or the fingolimod-treated patients, and ATE may be the preferred estimator of interest.

A recent review highlighted the good practice in the use and reporting of PS in MS [41]. While methodological choices in observational studies remain challenging, our present work illustrates the priorities for methodological aspects of PS-based analyses of comparative treatment effectiveness in large registries.

Availability of data and materials

OFSEP: The individual data from the present study can be obtained upon request and after validation from the OFSEP scientific committee (see the OFSEP website). MSBase: MSBase is a data processor, and warehouses data from individual principal investigators who agree to share their datasets on a project-by-project basis. Each principal investigator will need to be approached individually for permission to access the datasets. DMSR: Anonymized data will be shared on request from any qualified researcher under approval from the Danish Data Protection Agency.



Abbreviations

ATT: Average Treatment effect for the Treated
ATE: Average Treatment effect for the Entire eligible population
EDMUS: European Database for Multiple Sclerosis
EDSS: Expanded Disability Status Scale
IPTW: Inverse Probability of Treatment Weighting
MS: Multiple Sclerosis
PS: Propensity Score
RRMS: Relapsing Remitting Multiple Sclerosis
sIPTW: Stabilized Inverse Probability of Treatment Weighting
SMD: Standardized Mean Differences


  1. Rudick RA, Stuart WH, Calabresi PA, Confavreux C, Galetta SL, Radue E-W, et al. Natalizumab plus Interferon Beta-1a for Relapsing Multiple Sclerosis. N Engl J Med. 2006;354(9):911–23.

  2. Polman CH, O’Connor PW, Havrdova E, Hutchinson M, Kappos L, Miller DH, et al. A Randomized, Placebo-Controlled Trial of Natalizumab for Relapsing Multiple Sclerosis. N Engl J Med. 2006;354(9):899–910.

  3. Kappos L, Radue EW, O’Connor P, Polman C, Hohlfeld R, Calabresi PA, et al. A Placebo-Controlled Trial of Oral Fingolimod in Relapsing Multiple Sclerosis. N Engl J Med. 2010;362(5):387–401.

  4. Cohen JA, Barkhof F, Comi G, Hartung H-P, Khatri BO, Montalban X, et al. Oral Fingolimod or Intramuscular Interferon for Relapsing Multiple Sclerosis. N Engl J Med. 2010;362(5):402–15.

  5. Barbin L, Rousseau C, Jousset N, Casey R, Debouverie M, Vukusic S, et al. Comparative efficacy of fingolimod vs natalizumab: A French multicenter observational study. Neurology. 2016;86(8):771–8.

  6. Kalincik T, Horakova D, Spelman T, Jokubaitis V, Trojano M, Lugaresi A, et al. Switch to natalizumab versus fingolimod in active relapsing-remitting multiple sclerosis. Ann Neurol. 2015;77(3):425–35.

  7. Koch-Henriksen N, Magyari M, Sellebjerg F, Sørensen PS. A comparison of multiple sclerosis clinical disease activity between patients treated with natalizumab and fingolimod. Mult Scler J. 2017;23(2):234–41.

  8. Lorscheider J, Benkert P, Lienert C, Hänni P, Derfuss T, Kuhle J, et al. Comparative analysis of natalizumab versus fingolimod as second-line treatment in relapsing–remitting multiple sclerosis. Mult Scler J. 2018;24(6):777–85.

  9. Prosperini L, Saccà F, Cordioli C, Cortese A, Buttari F, Pontecorvo S, et al. Real-world effectiveness of natalizumab and fingolimod compared with self-injectable drugs in non-responders and in treatment-naïve patients with multiple sclerosis. J Neurol. 2017;264(2):284–94.

  10. Sharmin S, Lefort M, Andersen JB, Leray E, Horakova D, Havrdova EK, et al. Natalizumab Versus Fingolimod in Patients with Relapsing-Remitting Multiple Sclerosis: A Subgroup Analysis From Three International Cohorts. CNS Drugs. 2021;35(11):1217–32.

  11. Andersen JB, Sharmin S, Lefort M, Koch-Henriksen N, Sellebjerg F, Sørensen PS, et al. The effectiveness of natalizumab vs fingolimod–A comparison of international registry studies. Mult Scler Relat Disord. 2021;53.

  12. Hernán MA, Robins JM. Using Big Data to Emulate a Target Trial When a Randomized Trial Is Not Available. Am J Epidemiol. 2016;183(8):758–64.

  13. Austin PC. A tutorial and case study in propensity score analysis: An application to estimating the effect of in-hospital smoking cessation counseling on mortality. Multivariate Behav Res. 2011;46(1):119–51.

  14. Austin PC. An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivariate Behav Res. 2011;46(3):399–424.

  15. Sedgwick P. Intention to treat analysis versus per protocol analysis of trial data. BMJ. 2015;350(February):1–2.

  16. Kalincik T, Jokubaitis V, Izquierdo G, Duquette P, Girard M, Grammond P, et al. Comparative effectiveness of glatiramer acetate and interferon beta formulations in relapsing–remitting multiple sclerosis. Mult Scler J. 2015;21(9):1159–71.

  17. Sharmin S, Lefort M, Andersen JB, Leray E, Horakova D, Havrdova EK, et al. Natalizumab Versus Fingolimod in Patients with Relapsing-Remitting Multiple Sclerosis: A Subgroup Analysis From Three International Cohorts. CNS Drugs. 2021;35(11):1217–32.

  18. Kalincik T, Butzkueven H. The MSBase registry: Informing clinical practice. Mult Scler J. 2019;25(14):1828–34.

  19. Butzkueven H, Chapman J, Cristiano E, Grand’Maison F, Hoffmann M, Izquierdo G, et al. MSBase: An international, online registry and platform for collaborative outcomes research in multiple sclerosis. Mult Scler. 2006;12(6):769–74.

  20. Magyari M, Joensen H, Laursen B, Koch-Henriksen N. The Danish Multiple Sclerosis Registry. Brain Behav. 2021;11(1):1–10.

  21. Magyari M, Koch-Henriksen N, Sørensen PS. The Danish multiple sclerosis treatment register. Clin Epidemiol. 2016;8:549–52.

  22. Vukusic S, Casey R, Rollot F, Brochet B, Pelletier J, Laplaud DA, et al. Observatoire Français de la Sclérose en Plaques (OFSEP): A unique multimodal nationwide MS registry in France. Mult Scler J. 2018;1–5.

  23. Kurtzke JF. Rating neurologic impairment in multiple sclerosis: an expanded disability status scale (EDSS). Neurology. 1983;33(11):1444–52.

  24. Foo EC, Russell M, Lily O, Ford HL. Mitoxantrone in relapsing-remitting and rapidly progressive multiple sclerosis: Ten-year clinical outcomes post-treatment with mitoxantrone. Mult Scler Relat Disord. 2020;44:102330.

  25. Chartier N, Epstein J, Soudant M, Dahan C, Michaud M, Pittion-Vouyovitch S, et al. Clinical follow-up of 411 patients with relapsing and progressive multiple sclerosis 10 years after discontinuing mitoxantrone treatment: a real-life cohort study. Eur J Neurol. 2018;25(12):1439–45.

  26. Le Page E, Leray E, Taurin G, Coustans M, Chaperon J, Morrissey SP, et al. Mitoxantrone as induction treatment in aggressive relapsing remitting multiple sclerosis: treatment response factors in a 5 year follow-up observational study of 100 consecutive patients. J Neurol Neurosurg Psychiatry. 2008;79(1):52–6.

  27. Rosenbaum PR, Rubin DB. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika. 1983;70(1):41–55.

  28. Hernán M, Robins J. Causal inference. Boca Raton. 2019.

  29. Pirracchio R, Carone M, Rigon MR, Caruana E, Mebazaa A, Chevret S. Propensity score estimators for the average treatment effect and the average treatment effect on the treated may yield very different estimates. Stat Methods Med Res. 2013;25(5):1938–54.

  30. Austin PC. Optimal caliper widths for propensity-score matching when estimating differences in means and differences in proportions in observational studies. Pharm Stat. 2011;10(2):150–61.

  31. Vaughan AS, Kelley CF, Luisi N, Del Rio C, Sullivan PS, Rosenberg ES. An application of propensity score weighting to quantify the causal effect of rectal sexually transmitted infections on incident HIV among men who have sex with men. BMC Med Res Methodol. 2015;15(1):1–9.

  32. Hirano K, Imbens GW, Ridder G. Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score. Econometrica. 2003;71(4):1161–89.

  33. Austin PC, Stuart EA. Moving towards best practice when using inverse probability of treatment weighting (IPTW) using the propensity score to estimate causal treatment effects in observational studies. Stat Med. 2015;34(28):3661–79.

  34. Cole SR, Hernán MA. Adjusted survival curves with inverse probability weights. Comput Methods Programs Biomed. 2004;75:45–9.

  35. Kalincik T, Cutter G, Spelman T, Jokubaitis V, Havrdova E, Horakova D. Defining reliable disability outcomes in multiple sclerosis. Brain. 2015;138(11):3287–98.

  36. Cohen M, Mondot L, Bucciarelli F, Pignolet B, Laplaud DA, Wiertlewski S, et al. BEST-MS: A prospective head-to-head comparative study of natalizumab and fingolimod in active relapsing MS. Mult Scler J. 2020;1–8.

  37. Austin PC. The relative ability of different propensity score methods to balance measured covariates between treated and untreated subjects in observational studies. Med Decis Mak. 2009;29(6):661–77.

  38. Austin PC. The performance of different propensity score methods for estimating marginal odds ratios. Stat Med. 2007;26:3078–94.

  39. Austin PC. The performance of different propensity-score methods for estimating differences in proportions (risk differences or absolute risk reductions) in observational studies. Stat Med. 2010;29(20):2137–48.

  40. Austin PC. The performance of different propensity score methods for estimating marginal hazard ratios. Stat Med. 2013;32(16):2837–49.

  41. Karim ME, Pellegrini F, Platt RW, Simoneau G, Rouette J, de Moor C. The use and quality of reporting of propensity score methods in multiple sclerosis literature: A review. Mult Scler J. 2020;1–7.

  42. Seeger JD, Bykov K, Bartels DB, Huybrechts K, Schneeweiss S. Propensity Score Weighting Compared to Matching in a Study of Dabigatran and Warfarin. Drug Saf. 2017;40(2):169–81.

  43. Hernán MA, Hernández-Díaz S. Beyond the intention-to-treat in comparative effectiveness research. Clin Trials. 2012;9(1):48–55.

  44. Sormani MP, Bruzzi P. Can we measure long-term treatment effects in multiple sclerosis? Nat Rev Neurol. 2014;11(3):176–82.

  45. Mansournia MA, Higgins JPT, Sterne JAC, Hernán MA. Biases in randomized trials: a conversation between trialists and epidemiologists. Epidemiology. 2017;28(1):54–9.

  46. Robins JM, Hernán MÁ, Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology. 2000;11(5):550–60.

  47. Karim ME, Gustafson P, Petkau J, Zhao Y, Shirani A, Kingwell E, et al. Marginal structural Cox models for estimating the association between β-interferon exposure and disease progression in a multiple sclerosis cohort. Am J Epidemiol. 2014;180(2):160–71.

  48. Kalincik T, Diouf I, Sharmin S, Malpas C, Spelman T, Horakova D, et al. Effect of Disease-Modifying Therapy on Disability in Relapsing-Remitting Multiple Sclerosis Over 15 Years. Neurology. 2021;96(5):e783–97.

  49. Tintore M, Rovira À, Río J, Otero-Romero S, Arrambide G, Tur C, et al. Defining high, medium and low impact prognostic factors for developing multiple sclerosis. Brain. 2015;138(7):1863–74.

  50. Bovis F, Signori A, Carmisciano L, Maietta I, Steinerman JR, Li T, et al. Expanded disability status scale progression assessment heterogeneity in multiple sclerosis according to geographical areas. Ann Neurol. 2018;84(4):621–5.



Not applicable


OFSEP was supported by a grant provided by the French State and handled by the "Agence Nationale de la Recherche", within the framework of the "Investments for the Future" program, under the reference ANR-10-COHO-002, by the Eugène Devic EDMUS Foundation against multiple sclerosis and by the ARSEP Foundation. ML has received a travel grant from the ARSEP Foundation for this project. The Clinical Outcomes Research unit at the University of Melbourne received funding from NHMRC (grant numbers 1140766, 1129789, and 1157717) to support this study. The MSBase Foundation is a not-for-profit organization that receives support from Biogen, Novartis, Merck, Roche, Teva Pharmaceutical Industries and Sanofi Genzyme. The Danish Multiple Sclerosis Registry did not receive any funding to collaborate in this study.

Author information

Authors and Affiliations



ML, SS, and JBA contributed to the design of the study, conducted and interpreted the analysis, and drafted, revised, and approved the manuscript. HB, MM, TK, and EL conceptualized and designed the study, contributed to data acquisition, interpreted the results, and revised and approved the manuscript. SV, NKH, RC, and DAL contributed to the design of the study, interpreted the results, and revised and approved the manuscript. MD, GE, JC, AR, JDS, EM, HZ, PL, GD, CLF, TM, EB, PC, JP, BS, OG, ET, OH, AA, BB, OC, PC, AM, AW, JPC, AM, HBN, KH, CP, NM, DDB, CN, DH, EKH, RA, GI, SE, SO, FP, MO, AL, MT, PG, FG, BY, AP, MG, PD, CB, MT, PMC, MS, JLS, RT, PS, DF, FG, VS, JP, DM, OS, KB, AVW, RK, BVW, TC, DS, SV, FS, PSS, CCHC, PVR, MBJ, JLF, SB, HKM, and KIS contributed substantially to data acquisition and interpretation of the analysis, and revised and approved the manuscript. ML, SS, and JBA contributed equally. HB, MM, TK, and EL contributed equally. The authors read and approved the final manuscript.

Corresponding authors

Correspondence to T. Kalincik or E. Leray.

Ethics declarations

Ethics approval and consent to participate

OFSEP (Observatoire Français de la Sclérose en Plaques; French MS registry), ID: NCT02889965, prospectively collects longitudinal data on clinical, biological, and imaging markers from patients who provided written informed consent following the French law on Bioethics. Storing data for research purposes was approved by the French Commission Nationale de l'Informatique et des Libertés (CNIL). MSBase is an international multiple sclerosis registry (World Health Organization International Clinical Trials Registry ID: ACTRN12605000455662) of observational data collected longitudinally as part of routine clinical care from 129 mostly tertiary multiple sclerosis centres in 36 countries. MSBase was approved by the Melbourne Health Human Research Ethics Committee and by the site institutional review boards, unless exemptions were granted according to the local regulations. Written informed consent was obtained from enrolled patients. The Danish Multiple Sclerosis Registry (DMSR) is a nationwide population-based registry comprising longitudinal data on all patients receiving disease-modifying treatments. Data are collected prospectively and stored following the data protection law of the Danish Data Inspection. The study obtained approvals from the Center for Data Review applications (j. nr. 2012–58-0004/VD-2018–121 I-suite 6361). Consent has been obtained from each patient included in this study. Please see ‘Ethics statement’ in the manuscript for details. All methods were carried out in accordance with relevant guidelines and regulations.

Consent for publication

Not applicable.

Competing interests

OFSEP: The authors report the following relationships: speaker honoraria, advisory board or steering committee fees, independent data monitoring committees fee, consultancy and lecturing fees, principal investigator in clinical trials, research support, unconditional PhD donation and/or conference travel support from Actelion (PC, ET), Ad Scientiam (EM), Akcea (JPC), Alnylam (JPC), Almirall (OH), Bayer (GE, HZ, OH), Biogen (GE, JC, AR, JDS, EM, HZ, PL, GD, TM, EB, PC, JP, BS, ET, OH, BB, OC, AMo, JPC, AMa, IP, NM, DAL, SV, WA), Celgene (JC, ET, DAL), CSL-Behring (JPC), FHU Imminent (HZ), Geneuro (SV), Genzyme-Sanofi (GE, JC, AR, JDS, EM, HZ, PL, GD, CLF, TM, EB, PC, JP, BS, ET, OH, BB, OC, AMo, JPC, AMa, IP, NM, DAL, SV), Grifols (JPC), Laboratoire Français des Biotechnologies (JPC), LFB (GE), LFSEP (HZ), Merck / EMD (GE, JC, AR, EM, HZ, PL, GD, PC, JP, BS, ET, OH, BB, OC, AMo, JPC, AMa, NM, DAL, SV), Medday (EL, AR, TM, PC, JP, DAL, SV), Natus (JPC), Novartis (EL, GE, JC, AR, JDS, EM, HZ, PL, GD, CLF, TM, EB, PC, JP, BS, ET, OH, BB, OC, AMo, JPC, AMa, IP, NM, DAL, SV), Pfizer (JPC), Pharmalliance (JPC), Roche (ML, EL, GE, JC, AR, JDS, EM, HZ, PL, GD, CLF, EB, PC, JP, BS, ET, OH, BB, OC, AMa, NM, DAL, SV, WA), SNF-Floerger (JPC), Teva (GE, JC, AR, JDS, EM, HZ, PL, GD, EB, PC, JP, ET, OH, BB, AMo, JPC, AMa, SV), Académie de Médecine (HZ), Agence Nationale de la Recherche (DAL), French National Security Agency of Medicines and Health Products (EL), the EDMUS Foundation (EL), the ARSEP foundation (GE, HZ, ET, DAL, ML), PHRC Foundation (ET), Rennes University Hospital (GE).
MSBase: The authors report the following relationships: speaker honoraria, advisory board or steering committee fees, research support and/or conference travel support from Actelion (EKH), Almirall (GI, FP, MT), Bayer (RA, FP, AL, MT, CB, MT, MS, JLS, BVW, TC, DS), BioCSL (KB, TK), Biogen (DH, EKH, RA, GI, FP, AL, PG, FGM, MG, PD, CB, MT, MS, JLS, PS, DF, FG, JP, BVW, TC, HB, TK), Canadian Multiple Sclerosis Society (PG, PD), Canadian Institutes of Health Research (MG, PD), Celgene (EKH, FP, TK), Czech Ministry of Education (DH, EKH), Fondazione Italiana Sclerosi Multipla (FP, AL), Grifols (KB), Genzyme-Sanofi (DH, EKH, RA, GI, FP, AL, MT, PG, FGM, MG, PD, CB, MT, MS, JLS, PS, DF, FG, JP, BVW, TC, DS, HB, TK), GSK (RA), Merck / EMD (DH, EKH, RA, GI, FP, AL, MT, PG, MG, PD, CB, MT, MS, JLS, PS, DF, FG, KB, BVW, TC, DS, HB, TK), Mitsubishi (FGM), Ministero Italiano della Università e della Ricerca Scientifica (FP), Mylan (FP, AL), Novartis (DH, EKH, RA, GI, FP, AL, MT, PG, FGM, MG, PD, CB, MT, MS, JLS, PS, DF, FG, JP, KB, BVW, TC, DS, HB, TK), ONO Pharmaceuticals (FGM), Roche (DH, EKH, RA, GI, FP, AL, MT, CB, FG, KB, BVW, TC, TK), Teva (DH, EKH, GI, FP, AL, MT, PG, FGM, MG, PD, CB, MT, JLS, PS, DF, JP, KB, BVW, TC, DS, TK), WebMD Global (TK).
DMSR: The authors report the following relationships: speaker honoraria, advisory board or steering committee fees, independent data monitoring committees fee, consultancy fee, research support and/or conference travel support from Almirall (JF), Alexion (PVR), Bayer (HKM), Biogen (NKH, FS, CH, PVR, MBJ, JF, SB, HKM, KIS, MM), Bristol Myers Squibb (PVR), Celgene (PSS), Genzyme-Sanofi (FS, PSS, CH, PVR, MBJ, JF, SB, HKM, KIS, MM), GSK (PSS), Medday (PSS), Merck / EMD (JBA, NKH, FS, PSS, CH, PVR, MBJ, JF, SB, HKM, KIS, MM), Novartis (NKH, FS, PSS, CH, PVR, MBJ, JF, KIS, MM), Roche (FS, CH, PVR, MBJ, JF, SB, KIS, MM), Teva (NKH, FS, PSS, PVR, MBJ, JF, HKM, KIS, MM).

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: 

Figure S1. Propensity score distribution without the positivity assumption. Table S1. Treatment exposure after natalizumab or fingolimod start during the follow-up. Table S2. Baseline characteristics of the unmatched cohorts by treatment group. Table S3. Baseline characteristics of cohort violating the positivity assumption.  

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Lefort, M., Sharmin, S., Andersen, J.B. et al. Impact of methodological choices in comparative effectiveness studies: application in natalizumab versus fingolimod comparison among patients with multiple sclerosis. BMC Med Res Methodol 22, 155 (2022).
