Skip to main content

Alternative statistical methods for estimating efficacy of interferon beta-1b for multiple sclerosis clinical trials



In the randomized study of interferon beta-1b (IFN beta-1b) for multiple sclerosis (MS), it has usually been evaluated the simple annual relapse rate as the study endpoint. This study aimed to investigate the performance of various regression models using information regarding the time to each recurrent event and considering the MS specific data generation process, and to estimate the treatment effect of a MS clinical trial data.


We conducted a simulation study with consideration of the pathological characteristics of MS, and applied alternative efficacy estimation methods to real clinical trial data, including 5 extended Cox regression models for time-to-event analysis, a Poisson regression model and a Poisson regression model with Generalized Estimating Equations (GEE). We adjusted for other important covariates that may have affected the outcome.


We compared the simulation results for each model. The hazard ratios of real data were estimated for each model including the effects of other covariates. The results (hazard ratios of high-dose to low-dose) of all models were approximately 0.7 (range, 0.613 - 0.769), whereas the annual relapse rate ratio was 0.714.


The precision of the treatment estimation was increased by application of the alternative models. This suggests that the use of alternative models that include recurrence event data may provide better analyses.

Peer Review reports


Multiple sclerosis (MS) is the most common demyelinating disorder of the central nervous system, and is characterized by repeated episodes of neurological dysfunction with variable remission. Since 1993, the beneficial effects of interferon beta have been shown [1], and in Japan, interferon beta-1b (IFN beta-1b) has significantly reduced relapse rates and reduced MRI lesion areas in patients with relapsing-remitting MS [2]. Recently, Kappos et al. [3] reported that IFN beta-1b can delay the conversion to clinically definite MS. Carroll [4] performed a comprehensive review of clinical studies of MS therapies.

The long-term treatment effects for chronic recurrent diseases such as MS should be evaluated in clinical trials. In the past, the primary endpoint in clinical trials of MS has been the annual relapse rate, the change in a clinical indicator such as the Expanded Disability Status Scale (EDSS) score or total area of MS lesions on the MRI scan from entry time, the proportion of non-relapsed patients, or the time to the first recurrence [1, 2, 510]. Meanwhile, extended methods of survival analysis for time-to-event data have been proposed, and such methods are useful when study subjects experience 2 or more events. Considering the recurrent events in survival analysis should theoretically increase the estimation efficiency regarding the effects of treatment [11]. Although these methods have not generally been applied to MS clinical trial data, Wang et al. [12] recently examined some of the models. Excellent reviews are available regarding how these methods can contribute to the estimation of treatment effects [11, 13, 14]. When using these models, it is important to pay attention to the nature of the models because the results of the estimation are highly dependent on the clinical situation [15]. In real clinical studies, the concerned events might occur rarely, several events might occur simultaneously, or several events might occur separately with high correlation. Appropriate models should be selected after considering the relationship between the assumptions of the models and the manner in which the events occur. For example, if we analyse the disease data such that the deteriorations of many lesions are found simultaneously, we should select the model that can manage the count data approach rather than the gap time modeling of event history analysis.

In this study, we focused on the extended Cox proportional models and Poisson regression model using Generalized Estimating Equations (GEE), which can be analyzed using existing statistical packages such as SAS. Using these regression models, we can estimate the adjusted treatment effect while considering the important covariates that might affect the outcomes, whereas the relapse rates provide only non-adjusted estimate [16]. The objective of this study was to investigate the performance of these models through a simulation study with MS-specific data generation processes and to apply various models that are used for estimating the treatment effect to a real clinical trial data set. This data set comprises the effect of IFN beta-1b on MS with special attention to subjects with relapsing-remitting MS.



A phase II randomized controlled clinical trial was conducted to compare the effect of 2 different doses (high-dose: 250 μg and low-dose: 50 μg) of IFN beta-1b on relapsing-remitting MS in Japan. Details of the trial design, inclusion criteria, baseline demographics, and efficacy results have been published [2]. In the trial, 205 patients with relapsing-remitting MS were randomized, and efficacy was assessed in 188 patients (55 male and 133 female patients). The primary endpoint of the study was the evaluation of the annual relapse rate. The percentage of patients who experienced a relapse more than once during follow-up was 55.8% (53/95) of patients in the high-dose group and 65.6% (61/93) of patients in the low-dose group. In these groups, the maximum number of relapses was 7, and the minimum,- 0, with a median of 1. The annual relapse rates in the high- and low-dose groups as estimated by the person-time method were 0.763 and 1.069, respectively (relapse rate ratio = 0.714; 95% CI 0.560- 0.910; p = 0.006).


Various survival models used for analysis of recurrent event data and that handle clustered and multiple event data have been proposed. Let λ ij (t) be the hazard function of the jth recurrence of the ith subject at time t; λ 0 j (t) be the baseline hazard function of the jth recurrence at time t; Y ij (t) be the indicator variable for the jth recurrence of the ith subject at time t, which is 1 when the subject is at risk and under observation and 0 otherwise; X ij (t) be the jth covariate vector of the ith subject at time t; and βj be the parameter vector for the jth recurrence, which includes the treatment effect. When each recurrence is assumed to have common effect, we omit the subscript j. Schematic forms of the models are shown in Figure 1.

Figure 1
figure 1

Schematic forms of the various extended time-to-event Cox regression models. Each arrow represents a stratum. Arrow diagrams describe about the behavior of ID = 1 in the sample data (DATA = MS) in Appendix (Additional file 1), who experiences the first event at day 51, second event at day 185, third event at day 413 and finally have censored at day 692. (A) "Time-to-first-event Cox model" only uses the information of the time to first event (the day 51). (B) "AG model" shows the renewal process of the events. (C) "PWP model" shows how the conditional models are constructed. (D) "WLW model" and "LWA model" model the marginal distribution of each event occurrence time.

The first model to be considered is the ordinary time-to-first-event model, which is formulated with the Cox proportional hazard model (hereafter referred to as "time-to-first-event Cox model"). It handles only the time-to-first-event data and ignores the information of the second or more events. Hereafter, the models that can deal with this lack of information are shown.

Andersen and Gill [17] extended the Cox proportional hazard model in the counting process formation (AG model). A Poisson process in which, each counting process has independent increments is assumed so that multiple events within the same subjects are regarded independently. The hazard of subject i at time t is

Although subjects who have once experienced an event are excluded from the risk set from that time in the usual Cox model, subjects who have experienced at least 1 event and are under observation can also be included in the risk set in the AG model. Because the baseline hazard is assumed to be common among subjects, this model ignores the individual differences and might be effective when the overall treatment effects are of interest.

Prentice, Williams, and Peterson [18] also extended the Cox model. They proposed the conditional model, which assumes that a subject is not at risk for the jth event until he/she has experienced the (j-1)th event, where Y ij (t) is 0 until the (j-1)th event and after which it becomes 1 (PWP model). In terms of the time scale, 2 models are used. One model measures from the entry time and is called the total time model (PWP-T model). The hazard of the jth recurrence of subject i at time t is

The other model resets the clock at every recurrence and is called the gap time model (PWP-G model). Assigning t j-1 as the time at which the (j-1)th event occurs, the hazard of the jth recurrence of subject i at time t is

Although the PWP model makes the interpretation easy, the sizes of the risk sets become relatively small as the number of events increases, making the estimates unstable.

Wei, Lin, and Weissfeld [19] modeled the marginal distribution of the time of each occurrence of the event using the Cox model (WLW model). The hazard of the jth recurrence of subject i at time t is

In this model, each recurrence is modeled as a separate stratum, and each subject appears in all of the strata so that no assumptions are made with respect to the recurrence process. However, this may result in substantial efficiency loss because it ignores the obvious dependency structure, in that the (j+1)th recurrent time must exceed the jth.

On the other hand, Lee, Wei, and Amato [20] proposed a model (LWA model) that assumes a common baseline hazard, where the hazard can be written as

The same subjects can enter several risk sets simultaneously, although its unnaturalness is discussed at the same time.

In terms of the inference of the parameter vector, the use of robust variance, which can handle intra-subject correlations, is considered to be desirable for all models described above (AG, PWP, WLW, and LWA models). Regarding the parameter estimation of the PWP, WLW, and LWA models, each recurrence is assumed to have a common effect in this study.

The Poisson regression model fits the framework of the generalized linear models in which, the response variable, which is the number of occurrences of the event in a fixed time interval, follows Poisson distribution. Let μ(X) be the expected value of the number of relapses; N(X), the total observation period; λ(X), the constant relapse rate of MS; X, the covariate vector; and β, the parameter vector to be estimated. The relapse rate can then be written as

Thus, μ(X) = N(X)exp(X'β).

The relapse rate is not necessarily constant throughout the observation period; it is better to partition the time axis into intervals of constant rates.

Consequently, the intra-subject correlation of the relapse rate among the intervals can be discussed in terms of GEE. GEE is an extension of generalized linear models and regards a subject as a cluster so that the treatment effect can be estimated considering the correlation structure among response variables [21]. It is expected to be a flexible method for analyzing recurrent event data because it can be used even if many of the aforementioned assumptions regarding the proportional hazard models do not hold. In this study, the GEE-Poisson model was applied, and intervals were set at 6 months, each with the common rate.

Simulation study

To determine which model is the most suitable for analyzing MS clinical trial data, we conducted the simulation study with consideration of the disease progression process or natural history. When performing a simulation study, we should examine the event generation process, which might be suited to the situation of the disease progression process [22]. The data generation process of this study was as follows. In a hypothetical randomized controlled clinical trial with placebo (n = 100) and active (n = 100) groups, we assumed that each patient had 10 hypothetical latent lesions in their brain and that the lesions were in the inactive phase at the entry time. The recurrence time was recorded after each lesion developed to the active phase. This setting modeled some MS pathological characteristics, such as time and spatial distribution of the lesions. The total follow-up period was set to 3 years, and the censoring time, which was assumed to be independent from the recurrence time, was generated using a Weibull distribution S(t) = exp (-λt γ) with the shape parameter γ = 2.1399 and the scale parameter λ = 0.000000576. The time to recurrence was also generated using a Weibull distribution, and 2 different scenarios were considered.

Scenario 1: All patients have individual identical Weibull distribution parameters, γ = 1.1452 and λ = 0.00141.

Scenario 2: Mixture population of 3 different sets of parameters; 46% of the population has γ = 1.2442 and λ = 0.000604, 45% of the population has γ = 1.1550 and λ = 0.001578, and 9% of the population has γ = 1.9694 and λ = 0.0000661.

The parameters used in our simulation study were calculated from other clinical trial data in Japan [23], especially for the placebo group, which can be regarded as the natural history cohort [24]. The mixture proportions described in Scenario 2 were obtained from the distribution of the number of recurrences in the year preceding the study, which was one of the important covariates. The hazard ratio (relapse rate ratio) of the active group to the placebo group was set at 1/1.3, such the true value of the Cox regression parameter (log-hazard ratio) was log(1/1.3) = -0.26236. Each simulation was repeated 1000 times, and the results were evaluated via the bias and mean square error (MSE). The bias is the difference between the estimated and true (or reference) values; thus, the treatment effect would be underestimated if we obtained positive bias and, overestimated if we obtained negative bias. The MSE considers both bias and variability as gauged by the variance of parameter estimates.

After the simulation study, the models were applied to the IFN beta real clinical trial data separately. All statistical analyses, including the simulation study, were performed using SAS version 9.1.3 (SAS Institute Inc., Cary, NC, USA). The SAS sample programs for the use of these models are shown in the appendix of this article (Additional file 1), and a dummy data set is used to clarify the use of the models.


The results of the simulation study are presented in Table 1. AG, Poisson, and GEE-Poisson models showed similar results, with positive bias and relatively small MSE in both scenarios. Almost no bias was detected in PWP-T, PWP-G, and LWA models in Scenario 1, whereas they showed larger bias in Scenario 2. In the WLW model, a relatively large bias with a negative direction was noted, indicating that an overestimation of the treatment effect and a large MSE were detected in both scenarios.

Table 1 Bias and MSE from the simulation study

The various aforementioned models were then applied to the real clinical trial data introduced in the Methods section with adjustment for some important covariates, such as sex, age, EDSS score at entry time, total area of MS lesions on the MRI scan at the entry time, and number of recurrences in the year preceding the study. Table 2 shows the results of the analysis. The hazard ratio indicates the relative risk of the high-dose group to the low-dose group, and it is distributed from 0.613 (WLW model) to 0.769 (time-to-first-event Cox model). All models except the time-to-first-event Cox model showed a significant effect of high-dose IFN beta-1b. The standard error of the WLW model was the largest, while the PWP-T and PWP-G models showed relatively small values. The width of the confidence intervals of the AG, PWP-T, PWP-G, WLW, LWA, Poisson, and GEE-Poisson models was smaller than that of the time-to-first-event Cox model.

Table 2 Estimates of treatment effects for MS clinical trials in Japan

Regarding the behavior of the other covariates besides the IFN beta-1b variable, "the number of recurrences in the year preceding the study" showed significant differences in all 8 models. As the number increased, the hazard of recurrence in the study increased (range of hazard ratio among the 8 models: 1.164-1.375).


Because MS is a heterogeneous disease with a variety of subtypes and transitional cases, it is not easy to evaluate drug efficacy. By conducting a simulation study and applying it to real clinical trial data, we examined various extended Cox regression models and a Poisson regression model using GEE, which can handle recurrent events - not only the number of recurrences or the time to the first event, but all recurrences that occurred during the follow-up period. With the use of the extended models, significant effects were detected and the importance of utilizing more than 1 recurrent time was suggested by our analyses.

From the simulation study results, treatment effect was relatively overestimated in the WLW model. The same tendency was observed in the analysis of the real data; the WLW model showed the smallest hazard ratio. This overestimation tendency might not be desirable, especially in confirmatory trials. The bias and MSE of the LWA model in Scenario 1 were small for homogeneous population because of the similarity of the assumption of the data generation process; however, in Scenario 2, both bias and MSE became larger to some extent for heterogeneous population. In terms of MSE of the PWP-T and PWP-G models, totally preferable results were obtained, but the bias differences between Scenarios 1 and 2 for each model were approximately 2 times larger than those of the other models; this finding suggests of possible unstable features in the PWP models. For the time-to-first-event Cox model, AG model, Poisson regression model, and GEE-Poisson model, no extreme differences were found.

We are then left to select the best model for our data. All models have their own assumptions and characteristics, and so, our decision must consider the nature and system of disease progression that we have analyzed in advance in order to make the correct choice. When we consider the pathological condition in MS, such as the time and spatial distribution of latent lesions, the LWA model seems to be reasonable because of the assumption of a common baseline hazard, which means that each latent lesion has the same risk of development. If we can assume that all subjects have the same number of lesions that can develop at the same risk, the LWA model becomes conceivable. In the same way, if we can assume that all subjects have the same number of lesions that can develop at different risks, the WLW model seems to be best fitted. However, such settings would be unrealistic. In addition, the precision of the estimates in the WLW and LWA models is relatively poor. As the number of lesions increases, the number of strata also increases, which might lead to unstable estimates.

If we assume that the independent increments for all events are even among subjects, then the AG model is reasonable; however, this assumption would be unnatural in this case. PWP models would involve a similar situation despite small standard errors. In fact, the martingale residuals, which enabled us to examine the increment dependency, showed negative slopes throughout the period. This suggests that the assumptions that the AG and PWP models required would not exactly hold. The estimates of the Poisson regression and GEE-Poisson models were quite similar, so the advantage of using the GEE-Poisson model was not entirely clear in our study. However, if we had had a longer follow-up period and more time intervals, the method that accounts for the intra-subject correlation structure among intervals would be the more attractive model.

Realistically speaking, the assumptions needed in the extended Cox regression models (AG, PWP, WLW, and LWA models) would be difficult to be completely examined because of the uncertainty of MS pathological and/or clinical deterioration mechanisms and that fact that no one can prove the correctness of these assumptions. The Poisson regression and GEE-Poisson models are free of such assumptions. Moreover, in terms of the advanced nature regarding consideration of intra-subject correlation for recurrences in the GEE-Poisson model, the GEE-Poisson model is preferred over the Poisson regression model. However, further study regarding the behavioral characterization among these models is still needed.


Our results indicate that the use of alternative models that include recurrence event data, especially the GEE-Poisson model, may provide better analysis for estimating the treatment effect.


  1. The IFNB Multiple Sclerosis Study Group: Interferon beta-1b is effective in relapsing-remitting multiple sclerosis. I. Clinical results of a multicenter, randomized, double-blind, placebo-controlled trial. Neurology. 1993, 43: 655-661.

    Article  Google Scholar 

  2. Saida T, Tashiro K, Itoyama Y, Sato T, Ohashi Y, Zhao Z: Interferon beta-1b is effective in Japanese RRMS patients: a randomized, multicenter study. Neurology. 2005, 64: 621-630.

    Article  CAS  PubMed  Google Scholar 

  3. Kappos L, Polman CH, Freedman MS, Edan G, Hartung HP, Miller DH, Montalban X, Bauer L, Jakobs P, Pohl C, Sandbrink R: Treatment with interferon beta-1b delays conversion to clinically definite and McDonald MS in patients with clinically isolated syndromes. Neurology. 2006, 67: 1242-1249. 10.1212/01.wnl.0000237641.33768.8d.

    Article  CAS  PubMed  Google Scholar 

  4. Carroll WM: Clinical trials of multiple sclerosis therapies: improvements to demonstrate long-term patient benefit. Mult Scler. 2009, 15: 951-958. 10.1177/1352458509105426.

    Article  CAS  PubMed  Google Scholar 

  5. The PRISMS Study Group: Randomised double-blind placebo-controlled study of interferon beta-1a in relapsing/remitting multiple sclerosis. Lancet. 1998, 352: 1498-1504. 10.1016/S0140-6736(98)03334-0.

    Article  Google Scholar 

  6. Johnson KP, Brooks BR, Cohen JA, Ford CC, Goldstein J, Lisak RP, Myers LW, Panitch HS, Rose JW, Schiffer RB: Copolymer 1 reduces relapse rate and improves disability in relapsing-remitting multiple sclerosis: results of a phase III multicenter, double-blind, placebo-controlled trial. Neurology. 1995, 45: 1268-1276.

    Article  CAS  PubMed  Google Scholar 

  7. Fazekas F, Deisenhammer F, Strasser-Fuchs S, Nahler G, Mamoli B: Randomised placebo-controlled trial of monthly intravenous immunoglobulin therapy in relapsing-remitting multiple sclerosis. Lancet. 1997, 349: 589-593. 10.1016/S0140-6736(96)09377-4.

    Article  CAS  PubMed  Google Scholar 

  8. Sorensen PS, Wanscher B, Jensen CV, Schreiber K, Blinkenberg M, Ravnborg M, Kirsmeier H, Larsen VA, Lee ML: Intravenous immunoglobulin G reduces MRI activity in relapsing multiple sclerosis. Neurology. 1998, 50: 1273-1281.

    Article  CAS  PubMed  Google Scholar 

  9. The Lenercept Multiple Sclerosis Study Group and The University of British Columbia MS/MRI Analysis Group: TNF neutralization in MS: results of a randomized, placebo-controlled multicenter study. Neurology. 1999, 53: 457-465.

    Article  Google Scholar 

  10. Miller DH, Kahn OA, Sheremata WA, Blumhardt LD, Rive GP, Libonati MA, Willmer-Hulme AJ, Dalton CM, Miszkiel KA, O'Connor PW: A controlled trial of natalizumab for relapsing multiple sclerosis. N Engl J Med. 2003, 348: 15-23. 10.1056/NEJMoa020696.

    Article  CAS  PubMed  Google Scholar 

  11. Therneau TM, Grambsch PM: Modeling Survival Data: Extending the Cox Model. 2000, New York: Springer-Verlag

    Book  Google Scholar 

  12. Wang YC, Meyerson L, Tang YQ, Qian N: Statistical methods for the analysis of relapse data in MS clinical trials. J Neurol Sci. 2009, 285: 206-211. 10.1016/j.jns.2009.07.017.

    Article  CAS  PubMed  Google Scholar 

  13. Yamaguchi T: Recurrent event data analysis: A review. Japanese Journal of Biometrics. 2005, 26: 81-117.

    Article  Google Scholar 

  14. Cook RJ, Lawless JF: The Statistical Analysis of Recurrent Events. 2007, New York: Springer

    Google Scholar 

  15. Kuramoto L, Sobolev BG, Donaldson MG: On reporting results from randomized controlled trials with recurrent events. BMC Medical Research Methodology. 2008, 8: 35-10.1186/1471-2288-8-35.

    Article  PubMed  PubMed Central  Google Scholar 

  16. Klein JP, Moeschberger ML: Survival Analysis: Techniques for Censored and Truncated Data. 2003, New York: Springer, 2

    Google Scholar 

  17. Andersen PK, Gill RD: Cox's regression model for counting processes: a large sample study. Ann Stat. 1982, 10: 1100-1120. 10.1214/aos/1176345976.

    Article  Google Scholar 

  18. Prentice RL, Williams BJ, Peterson AV: On the regression analysis of multivariate failure time data. Biometrika. 1981, 68: 373-379. 10.1093/biomet/68.2.373.

    Article  Google Scholar 

  19. Wei LJ, Lin DY, Weissfeld L: Regression analysis of multivariate incomplete failure time data by modeling marginal distributions. J Am Stat Assoc. 1989, 84: 1065-1073. 10.2307/2290084.

    Article  Google Scholar 

  20. Lee EW, Wei LJ, Amato DA: Cox-type regression analysis for large numbers of small groups of correlated failure time observations. Survival Analysis: State of the Art. 1992, 237-247.

    Chapter  Google Scholar 

  21. Liang KY, Zeger SL: Longitudinal data analysis using generalized linear models. Biometrika. 1986, 73: 13-22. 10.1093/biomet/73.1.13.

    Article  Google Scholar 

  22. Metcalfe C, Thompson SG: The importance of varying the event generation process in simulation studies of statistical methods for recurrent events. Stat Med. 2006, 25: 165-179. 10.1002/sim.2310.

    Article  PubMed  Google Scholar 

  23. Saida T, Ohashi Y, Tashiro K, Itoyama Y, Sato T, Hamaguchi K, Nishitani H, Shibasaki H, Araki S, Research committee for treatment of multiple sclerosis (Chairman: Igata A): Treatment of multiple sclerosis with mizoribine: (1) clinical results of a double-blind, placebo-controlled study. Neuroimmunology. 1998, 6: 160-161.

    Google Scholar 

  24. Wingerchuk DM, Weinshenker BG: The natural history of multiple sclerosis: implications for trial design. Curr Opin Neurol. 1999, 12: 345-349. 10.1097/00019052-199906000-00013.

    Article  CAS  PubMed  Google Scholar 

Pre-publication history

Download references


The authors are grateful to the pharmaceutical company for providing the multiple sclerosis clinical trial data.

Author information

Authors and Affiliations


Corresponding author

Correspondence to Makiko N Mieno.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

MNM, TY and YO participated in the design of the study. MNM performed the statistical analysis and drafted the manuscript. All the authors read and approved the final manuscript.

Electronic supplementary material


Additional file 1: Appendix: SAS programming codes (example). The SAS sample programs for the use of the regression models shown in this study are provided using a dummy data set. (PDF 171 KB)

Authors’ original submitted files for images

Below are the links to the authors’ original submitted files for images.

Authors’ original file for figure 1

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Reprints and permissions

About this article

Cite this article

Mieno, M.N., Yamaguchi, T. & Ohashi, Y. Alternative statistical methods for estimating efficacy of interferon beta-1b for multiple sclerosis clinical trials. BMC Med Res Methodol 11, 80 (2011).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: