This study, using IPD from both real-world patients and trial participants, showed that multivariable modeling makes it possible to identify potential causative factors for an efficacy-effectiveness gap. For OS, the HR for real-world versus trial patients moved to 1.07 (95% CI, 0.83-1.38) after adjustment, suggesting that differences in the available characteristics between the two settings partly explain the altered OS seen in real-world practice. This shift was not observed for PFS, suggesting that other, unmeasured factors are involved for that outcome.
The median PFS of real-world patients was longer than that of trial patients, resulting in an HR for PFS below 1.00. Although ECOG PS was statistically significant in the multivariate Cox analyses, the adjusted HR between real-world and trial patients did not change. The etiology of this gap in PFS is believed to be multifactorial, with contributing factors including differences in patient populations, healthcare delivery, and variability in the experience of treating healthcare providers. Multiple factors that could explain differences in patient populations were measured but did not lead to a difference in HR. Unmeasured factors affecting PFS could be smoking status, comorbidities, and frailty. Previous research also showed that use of corticosteroids and the number of organs with metastases are associated with PFS [14]. Healthcare delivery differed in terms of response measurement. According to the original Checkmate-057 trial study protocol, response was evaluated in week 9 after nivolumab initiation and every 6 weeks thereafter [15]. In real-world practice, response was assessed every 8 weeks. These scheduled trial assessments led to visible drops in the Kaplan-Meier curve for PFS of trial patients, while such drops are less obvious in the real-world PFS curve (supplement 1). Furthermore, measuring progressive disease using the Response Evaluation Criteria in Solid Tumors (RECIST) can be less structured and strict in real-world patients than in trial patients [16]. In clinical practice, the immune RECIST (iRECIST) criteria are used, which include unconfirmed progression [17]. Consequently, conclusions about progressive disease might be delayed in clinical practice, which could in turn delay the consideration of subsequent systemic treatment.
Hypothetically, real-world patients continue nivolumab treatment despite progressive disease, which leads to further clinical deterioration and reduces the tolerability of subsequent docetaxel, eventually resulting in an HR for OS in the opposite direction to that for PFS.
In contrast to PFS, the non-significant difference in OS between real-world and trial patients shifted towards a null effect after adjustment for the available characteristics in the data (aHR 1.07; 95% CI, 0.83-1.38). This suggests that differences in ECOG PS and presence of brain metastases are linked to the observed shorter OS in real-world practice.
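As a rough illustration of what "towards a null effect" means for the adjusted estimate, the interval reported above can be converted back to a standard error and a two-sided p-value. This is a minimal sketch, not part of the study's analysis: it assumes a Wald-type confidence interval that is symmetric on the log-HR scale, and the function names are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_hr_standard_error(lower, upper, z_crit=Z_95):
    """Back-calculate the SE of log(HR) from a CI, assuming a Wald interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z_crit)


def ci_midpoint_hr(lower, upper):
    """Point estimate implied by the CI bounds (geometric mean of the bounds)."""
    return math.exp((math.log(lower) + math.log(upper)) / 2)


def wald_p_value(hr, lower, upper):
    """Two-sided p-value for H0: HR = 1 under the same Wald assumption."""
    z = math.log(hr) / log_hr_standard_error(lower, upper)
    return math.erfc(abs(z) / math.sqrt(2))


# Adjusted HR for OS as reported in the text: 1.07 (95% CI, 0.83-1.38).
implied_hr = ci_midpoint_hr(0.83, 1.38)
p_value = wald_p_value(1.07, 0.83, 1.38)
print(f"implied HR = {implied_hr:.2f}, two-sided p = {p_value:.2f}")
```

Under this assumption, the point estimate implied by the bounds agrees with the reported aHR to two decimals, and the p-value is well above conventional thresholds, which is consistent with the interpretation of a shift towards the null.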
Apart from the aforementioned potential, this study also confirms the results obtained with the standard approach of comparing trial and real-world outcomes using software applications. The unadjusted HRs for PFS and OS calculated in the study of Cramer-van der Welle et al. are identical to the findings of this study using IPD [10].
We consider the quality of the real-world data a strength of our study. Data were manually extracted from electronic healthcare records, with very few missing data. An exception is the PD-L1 expression status, which was often missing in real-world patients (48.9%) since it is not mandatory to measure this before second-line nivolumab treatment. We therefore could not use this factor in the multivariate analyses. In addition, we could not test for smoking status, which was an effect modifier in the Checkmate-057 study (less effect in never smokers). On the other hand, we expect most patients to be current or past smokers. Altogether, we argue that most of the characteristics with high prognostic value were included in the analyses [4]. A possible limitation was that the trial data only included PFS and OS calculated from the date of randomization and not from the start of nivolumab treatment as in real-world practice. However, as stated in the RCT protocol, nivolumab treatment should be initiated within three business days after randomization [15]. This very short period is unlikely to affect the outcomes of this study and will not introduce bias in the comparison with the Cramer-van der Welle et al. paper, because that study calculated survival times similarly. Finally, we focused in this study on the relative changes in the HR and not on statistical significance. In case only aggregated trial data are available, a covariate balancing method analogous to propensity score weighting could be used [18].
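The idea of balancing covariates against aggregated trial data can be illustrated with a minimal sketch. It assumes a method-of-moments weighting of the kind used in matching-adjusted indirect comparisons, which is not necessarily the method described in [18]. The patient records, the trial proportions, and all function names below are hypothetical and serve only to show the mechanics.

```python
import math


def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def balancing_weights(ipd, target_means, max_iter=100, tol=1e-9):
    """Weights w_i = exp(alpha . (x_i - target)) whose weighted covariate
    means equal target_means; alpha minimises the convex function sum(w_i)."""
    p = len(target_means)
    centered = [[x[j] - target_means[j] for j in range(p)] for x in ipd]

    def weights(alpha):
        return [math.exp(sum(alpha[j] * c[j] for j in range(p))) for c in centered]

    alpha = [0.0] * p
    for _iteration in range(max_iter):
        w = weights(alpha)
        grad = [sum(wi * c[j] for wi, c in zip(w, centered)) for j in range(p)]
        if max(abs(g) for g in grad) < tol:
            break
        hess = [[sum(wi * c[j] * c[k] for wi, c in zip(w, centered))
                 for k in range(p)] for j in range(p)]
        step = solve_linear(hess, grad)
        t = 1.0
        for _halving in range(30):  # backtrack until the objective does not increase
            trial = [alpha[j] - t * step[j] for j in range(p)]
            if sum(weights(trial)) <= sum(w):
                break
            t /= 2
        alpha = trial
    return weights(alpha)


def weighted_mean(values, w):
    return sum(v * wi for v, wi in zip(values, w)) / sum(w)


# Hypothetical real-world IPD: (ECOG PS >= 1, brain metastases), 1 = yes.
real_world = [(1, 1), (1, 1), (1, 0), (1, 0), (1, 0), (0, 1), (0, 0), (0, 0)]
# Hypothetical aggregated trial proportions for the same two covariates.
trial_means = [0.50, 0.20]

w = balancing_weights(real_world, trial_means)
balanced = [weighted_mean([x[j] for x in real_world], w) for j in range(2)]
ess = sum(w) ** 2 / sum(wi * wi for wi in w)  # effective sample size
print("weighted means:", [round(b, 4) for b in balanced], "ESS:", round(ess, 2))
```

After weighting, the real-world covariate means match the aggregated trial values, and a weighted survival model could then be fitted on the real-world IPD. The effective sample size shows how much information is lost through weighting; a small value relative to the number of patients would indicate poor overlap between the two populations.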
In the present study we assessed the value of IPD for second-line nivolumab, while Cramer-van der Welle et al. also reported significantly impaired OS in real-world practice with first-line pembrolizumab. Unfortunately, because trial IPD on pembrolizumab were unavailable, we could not assess what the added value of adjustment with IPD would be for that regimen. The European Medicines Agency (EMA) started an initiative to publish clinical trial data submitted to EMA as part of marketing authorization applications [19]. At the moment, trial data on COVID-19 medicines are becoming publicly available [20]. Hopefully, initiatives from the EMA and others, such as ClinicalStudyDataRequest.com, will help to improve the availability of clinical trial data, while respecting the privacy of patients included in the trials, to allow better identification of factors associated with an efficacy-effectiveness gap (if any), in turn facilitating individualized prognoses and treatment planning [21,22,23].