 Technical advance
 Open Access
A nonlinear optimisation method to extract summary statistics from Kaplan-Meier survival plots using the published P value
BMC Medical Research Methodology volume 20, Article number: 269 (2020)
Abstract
Background
Meta-analyses of studies evaluating survival (time-to-event) outcomes are a powerful technique to assess the strength of evidence for a given disease or treatment. However, these studies rely on the adequate reporting of summary statistics in the source articles to facilitate further analysis. Unfortunately, many studies, especially within the field of prognostic research, do not report such statistics, making secondary analyses challenging. Consequently, methods have been developed to infer missing statistics from the commonly published Kaplan-Meier (KM) plots, but these are liable to error, especially when the number at risk is not published.
Methods
We therefore developed a method using nonlinear optimisation (nlopt) that requires only the KM plot and the commonly published P value to better estimate the underlying censoring pattern. We use this information to then calculate the natural logarithm of the hazard ratio (ln(HR)) and its variance (var ln(HR)), statistics important for meta-analyses.
Results
We compared this method to the Parmar method, which also does not require the number at risk to be published. In a validation set consisting of 13 KM studies, a statistically significant improvement in calculating ln(HR) was obtained when using an exact P value (mean absolute error 0.014 vs 0.077, P = 0.003). Thus, when the true HR has a value of 1.5, inference of the HR using the proposed method would set limits of 1.49/1.52, an improvement on the 1.39/1.62 limits obtained using the Parmar method. We also used Monte Carlo simulations to establish recommendations for the number and positioning of points required for the method.
Conclusion
The proposed nonlinear optimisation method is an improvement on the existing method when only a KM plot and P value are included and as such will enhance the accuracy of meta-analyses performed for studies analysing time-to-event outcomes. The nlopt source code is available, as is a simple-to-use web implementation of the method.
Background
In many medical studies, the main outcome measured is the time until a specific event occurs, otherwise known as survival or time-to-event data. Analysing these data requires specific statistical methods to account for the fact that only some individuals will experience the event; the incomplete follow-up of the remainder is known as censoring [1]. Censoring might occur because an individual has not experienced the event being measured by the end of the study, they are lost to follow-up during the study, or they experience another event which makes further follow-up impossible. Common methods used to analyse survival data which adjust for censoring include Kaplan-Meier plots, log-rank tests and Cox (proportional hazards) regression [1].
With many studies reporting survival data published each year, systematic reviews and meta-analyses have become increasingly commonplace, assessing the strength of evidence accrued in aggregate across multiple studies analysing the same factor (e.g. a therapeutic intervention or the prognostic role of a particular biomarker). The advantages of meta-analyses include increased power, improved precision and an opportunity to deal with conflicting claims [2].
Although the gold standard for meta-analysis is using individual patient data (IPD), allowing a much more flexible approach to analysing survival data [3], the IPD is not always available and attempts to acquire it can involve a significant investment in time and cost [4]. As such, a meta-analysis based on aggregate data is a reasonable alternative that often generates similar conclusions when compared with an IPD meta-analysis, especially when comparing summary statistics such as ln(HR) and the variance (var.) of ln(HR) [5]. However, aggregate data meta-analyses (hereafter referred to simply as meta-analyses) can be challenging when primary studies fail to report sufficient data and statistics, leading to the exclusion of such studies from secondary analyses. This is of particular concern in research examining the clinical significance of prognostic factors, in which independent studies often report inconsistent or conflicting findings [2], and therefore where secondary analyses would be the most valuable. As a consequence, meta-analyses are often unable to conclude with confidence the role of a particular prognostic factor [6].
Work to improve the quality of prognostic studies has established a simplified checklist of recommendations derived from the REMARK criteria for reporting time-to-event data [7]. Items 15 and 17 of the REMARK criteria highlight the importance of publishing the univariate hazard ratio (HR), confidence intervals (CI) and statistical significance (P value) for time-to-event data. Where prognostic studies, or studies of any other type, publish the above, ln(HR) and var. ln(HR) can then be used to calculate a weighted average of ln(HR), with the weights inversely proportional to var. ln(HR) [8].
Although these studies might also report other summary statistics associated with survival, including single-point estimates such as the median survival, these have been shown not to be a reliable marker of time-to-event outcomes [9], and thus the recommendation is still to use ln(HR) and var. ln(HR) where available.
Despite the above criteria, not all studies report these statistics [10], especially those published before the establishment of the REMARK criteria in 2005 [11]. As a result, several methods have been developed to accommodate data from primary studies which did not originally report either ln(HR) or var. ln(HR) (more commonly both) into meta-analyses. They are all based on estimating these statistics from more commonly reported information included in primary studies involving time-to-event data, such as the number of events in each arm, number at risk values, and the KM survival curve itself. In particular, the method of Parmar et al. [8] has become a widely utilised means of inferring ln(HR) and var. ln(HR) from the primary literature. In that study, a hierarchy of methods depending on the reported evidence was presented, including both simple calculations to obtain var. ln(HR) from reported confidence intervals and more resource-intensive methods to calculate ln(HR) and var. ln(HR) from Kaplan-Meier (KM) plots. For the latter, survival probabilities from each arm of a KM plot are first extracted at specific time points. Estimates of the minimum and maximum follow-up times, together with an assumption that patients are censored at a constant rate throughout the study period, are then used to calculate ln(HR) and var. ln(HR) by creating pooled estimates.
Since this study was published, other methods have been developed that further increase the accuracy of estimating summary statistics by incorporating additional information, in particular the number at risk data that should be included underneath Kaplan-Meier plots. For instance, Vale et al. [12] used these values to better estimate the censoring pattern in calculating the odds ratios at fixed time points. In an attempt to establish a framework of common time intervals across trials, Williamson et al. [13] developed a method using the number at risk to improve estimation of ln(HR) by assuming that censoring was constant within time intervals rather than across the whole study (as in the Parmar method). However, by aiming to establish common intervals between trials, some of the survival probabilities were not included in the analysis; therefore Hoyle and Henley [14] extended Williamson's method to use all the survival probabilities within the time interval stipulated by the number at risk to improve the estimation of ln(HR) and var. ln(HR) [15].
The most recent advance, published by Guyot et al. [16], uses the number at risk and total number of events together with an iterative numerical approach to more accurately identify the underlying censoring pattern, and thus provides a more accurate estimation of the IPD and summary statistics [15]. However, when these two pieces of information are not included, the method, as acknowledged by the authors, performs poorly [16].
Indeed, all these methods, with the exception of the Parmar method, rely on the number at risk data being published to give accurate estimates of ln(HR) and var. ln(HR). Whilst the inclusion of this information is more commonplace than it used to be [12], it can vary significantly depending on the field. Oncology randomised controlled trials tend to be better at reporting associated information, including the number at risk, compared to biomarker studies [10, 15], although this is certainly not the case in every published study [15]. We therefore set out to develop a novel method that does not require inclusion of the number at risk, instead utilising the commonly published P value to improve the accuracy of estimating censoring patterns in primary datasets.
Here we report the implementation of such a method using nonlinear optimisation to estimate ln(HR) and var. ln(HR) and show it improves on estimations using the Parmar method. We believe this method will be useful for meta-analysis studies, in particular for incorporating studies where the number at risk is not published. We have made our method available as an R script, and as a simple-to-use graphical web app for researchers to use.
Methods
Derivation of the underlying equations and description of the algorithm
In collecting survival data, study participants either experience the measured event or are censored at a specific time point. The Kaplan-Meier survival estimate (eq. 1) used to create a Kaplan-Meier plot is simply the probability of surviving from one interval to the next, multiplied together to give the cumulative survival probability [1].
The Kaplan-Meier survival estimate:

$$S(t_j) = \prod_{i=1}^{j} \frac{n_i - e_i}{n_i} \quad (1)$$
Where S(t_{j}) is the probability of being alive at time t_{j}, e_{j} is the number of events at t_{j} and n_{j} is the number of patients alive just before t_{j}.
If a study is carried out and all participants experience the event such that no censoring occurs, this survival estimate is simply the number of individuals event-free at time t divided by the number of people who entered the study. Thus, in cases where there is censoring, a combination of the survival probability and an accurate estimation of the number of censored participants would enable the complete Kaplan-Meier survival table to be reconstructed. In the Kaplan-Meier survival estimate, the number of censored participants at t_{j} is not formally defined but is contained within the number at risk, and can be separated out by defining n_{j} (eq. 2).
Define n_{j}:

$$n_j = n_{j-1} - e_{j-1} - c_{j-1} \quad (2)$$

Where c_{j-1} is the number of participants censored at t_{j-1}.
With the censor value isolated, the Kaplan-Meier survival estimate can now be rearranged to calculate the number of events at t_{j} based on the survival probability extracted from the KM plot.
Rearrange (1) to solve for e_{j}:

$$e_j = n_j\left(1 - \frac{S(t_j)}{S(t_{j-1})}\right) \quad (3)$$
Finally, substitute (eq. 2) into (eq. 3) to derive an equation that calculates the number of events based on the survival probability and the level of censoring at t_{j}.
Substitute n_{j} with (2) in (3) to redefine e_{j}:

$$e_j = \left(n_{j-1} - e_{j-1} - c_{j-1}\right)\left(1 - \frac{S(t_j)}{S(t_{j-1})}\right) \quad (4)$$
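As a quick numerical check of eq. 4, the event count at a time point can be recovered from successive survival probabilities and the previous interval's counts. A Python sketch with hypothetical numbers (the published implementation is in R):

```python
# Recover the number of events e_j from successive survival probabilities
# (eq. 4), given the previous interval's counts. All numbers are hypothetical.

def events_at(n_prev, e_prev, c_prev, s_prev, s_curr):
    """e_j = (n_{j-1} - e_{j-1} - c_{j-1}) * (1 - S(t_j) / S(t_{j-1}))."""
    n_curr = n_prev - e_prev - c_prev          # eq. 2: number at risk at t_j
    return n_curr * (1 - s_curr / s_prev)      # eq. 3 applied to n_curr

# 100 patients at risk, with 10 events and 5 censored in the previous
# interval; survival drops from 0.90 to 0.80.
e_j = events_at(n_prev=100, e_prev=10, c_prev=5, s_prev=0.90, s_curr=0.80)
print(round(e_j, 2))  # 85 * (1 - 0.8/0.9)
```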
With the above equation, it is now possible to create a table detailing the number of events, degree of censoring, number at risk as well as the expected number of events for each time point. This allows calculation of the HR using (eq. 5) as well as calculation of the degree of significance with the log-rank test, a widely used method of generating a P value that is commonly published alongside a KM plot.
The hazard ratio function:

$$HR = \frac{O_1 / E_1}{O_2 / E_2} \quad (5)$$
Where O_{1/2} and E_{1/2} are the observed and expected total number of events for group 1 and 2, respectively.
The next step in the analysis relies on the fact that the log-rank statistic commonly used to calculate the level of significance in a KM plot is approximately distributed as a chi-square test statistic, which itself contains information directly related to the observed and expected number of events in each arm (eq. 6).
The chi-squared test statistic:

$$\chi^2 = \sum_{i=1}^{g} \frac{(O_i - E_i)^2}{E_i} \quad (6)$$
Where O_{i} and E_{i} are the observed and expected total number of events for group i, respectively, with g the number of groups.
A work-through of the above is presented in Additional file 1 to illustrate the method.
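The chain of calculations can also be sketched in code: given survival probabilities extracted for two arms and an assumed censoring pattern, the survival table is rebuilt via eq. 4 and the HR (eq. 5) and chi-square statistic (eq. 6) computed. This is an illustrative Python sketch with hypothetical numbers, not the authors' R implementation:

```python
def logrank_from_curves(S1, S2, N1, N2, c1, c2):
    """Rebuild the survival table from extracted survival probabilities
    S1/S2 (each starting at 1.0) and assumed per-interval censor counts
    c1/c2, then return (HR, chi2) via eqs. 4-6."""
    n1, n2 = float(N1), float(N2)
    O1 = O2 = E1 = E2 = 0.0
    for j in range(1, len(S1)):
        e1 = n1 * (1 - S1[j] / S1[j - 1])   # eq. 4 applied to arm 1
        e2 = n2 * (1 - S2[j] / S2[j - 1])   # eq. 4 applied to arm 2
        d, n = e1 + e2, n1 + n2
        E1 += d * n1 / n                    # expected events, arm 1
        E2 += d * n2 / n
        O1 += e1
        O2 += e2
        n1 -= e1 + c1[j - 1]                # eq. 2: update numbers at risk
        n2 -= e2 + c2[j - 1]
    hr = (O1 / E1) / (O2 / E2)              # eq. 5
    chi2 = (O1 - E1) ** 2 / E1 + (O2 - E2) ** 2 / E2   # eq. 6
    return hr, chi2

# Hypothetical two-arm example: three intervals, no censoring.
S1 = [1.0, 0.8, 0.6, 0.5]
S2 = [1.0, 0.9, 0.85, 0.8]
hr, chi2 = logrank_from_curves(S1, S2, N1=100, N2=100,
                               c1=[0, 0, 0], c2=[0, 0, 0])
print(round(hr, 2), round(chi2, 2))
```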
At this point, the censoring pattern is still unknown, but there is a fixed point, i.e. the chi-squared test statistic, which is directly related to these values through the full survival table calculated for the log-rank test. Each censoring value thus becomes an unknown to be solved for, such that the chi-square test statistic is satisfied, to calculate an optimal solution. Such problems can be solved using nonlinear optimisation (nlopt). Nlopt addresses general nonlinear optimisation problems of the form:
$$\text{minimise } f(x), \quad x \in \mathbb{R}^n$$

subject to

$$g(x) \le 0, \quad h(x) = 0, \quad lb \le x \le ub$$

where f is the objective function to be minimised and x represents the n optimisation parameters. lb and ub represent the lower and upper limits for x, g(x) represents the inequality constraint(s), and h(x) represents the equality constraint(s).
In using nlopt to solve this problem, the objective value to solve for is the chi-squared test statistic calculated as in (eq. 6), where x is the vector of censor values from each time point to be solved. We specify lower and upper bounds for x of 0 and infinity, respectively. Our objective function f(x) thus comprises all the steps in the table above which contribute to determining this statistic.
The primary equality constraint, h(x), is:

$$h(x) = \chi^2_{\mathrm{calculated}}(x) - \chi^2_{\mathrm{published}} = 0$$
In the case of a non-exact P value, our equality constraint becomes an inequality constraint when the P value is expressed as less than a certain value:

$$\chi^2_{\mathrm{calculated}}(x) \ge \chi^2_{\mathrm{threshold}}$$
In the case where the P value is expressed as greater than 0.05, or non-significant, we reverse the sign of the inequality for the chi-square statistic, setting a minimum value for a right-tailed P value of p = 0.05 (χ² = 3.841) whilst also setting a maximum limit of p = 0.95 (χ² = 3.9 × 10⁻³).
Our two additional equality constraints, for which h(x) = 0, require that the total number of events and censoring values in each arm equal the starting number at risk for that arm:

$$\sum_{j=1}^{n} \left(E1_j + C1_j\right) = N1_0, \qquad \sum_{j=1}^{n} \left(E2_j + C2_j\right) = N2_0$$

Where C1_{j} and C2_{j} are the censor values at time j in arm 1 and arm 2, respectively; E1_{j} and E2_{j} are the events at time j in arm 1 and arm 2, respectively; N1_{0} and N2_{0} are the starting numbers at risk in arm 1 and arm 2; and n is the number of time points.
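The authors solve this with the slsqp routine in the R interface to NLopt. Purely as an illustrative analogue, the same structure can be sketched in Python with SciPy's SLSQP implementation; all numbers here are hypothetical, and for simplicity the published chi-square is matched by minimising the squared deviation under non-negativity bounds, rather than via the paper's explicit equality and total-count constraints:

```python
import numpy as np
from scipy.optimize import minimize

# Survival probabilities read off a hypothetical KM plot (arm 1 and arm 2).
S1 = [1.0, 0.8, 0.6, 0.5]
S2 = [1.0, 0.9, 0.85, 0.8]
N1 = N2 = 100
CHI2_PUBLISHED = 10.0   # chi-square implied by a hypothetical published P value

def chi2_from_censors(x):
    """Chi-square statistic (eq. 6) implied by censor vector x = [c1..., c2...]."""
    c1, c2 = x[:3], x[3:]
    n1, n2 = float(N1), float(N2)
    O1 = O2 = E1 = E2 = 0.0
    for j in range(1, 4):
        e1 = n1 * (1 - S1[j] / S1[j - 1])   # eq. 4, arm 1
        e2 = n2 * (1 - S2[j] / S2[j - 1])   # eq. 4, arm 2
        d, n = e1 + e2, n1 + n2
        E1 += d * n1 / n                    # expected events under H0
        E2 += d * n2 / n
        O1 += e1
        O2 += e2
        n1 -= e1 + c1[j - 1]                # eq. 2: update numbers at risk
        n2 -= e2 + c2[j - 1]
    return (O1 - E1) ** 2 / E1 + (O2 - E2) ** 2 / E2

# Solve for censor values whose implied chi-square matches the published one.
res = minimize(lambda x: (chi2_from_censors(x) - CHI2_PUBLISHED) ** 2,
               x0=np.full(6, 10.0), method="SLSQP",
               bounds=[(0.0, 30.0)] * 6)
print(res.success, round(chi2_from_censors(res.x), 3))
```

As in the paper, the recovered censoring pattern is not unique; the point is to find any pattern consistent with the extracted curves and the published statistic.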
Many software packages exist to solve this problem, but the R interface to NLopt (an open-source library for nonlinear optimisation algorithms [17]) is simple to use and can be scripted in R (version 3.5.2; R Foundation for Statistical Computing, Vienna, Austria). There are several optimisation routines available within the NLopt wrapper, but slsqp, a sequential (least-squares) quadratic programming (SQP) algorithm, was chosen as it supports both equality and inequality constraints. The R scripts can be found at the following online repository: https://gitlab.com/EdGreen21/irvinekm
Model assumptions
Since the tests (log-rank and Cox proportional hazards) used to generate the P values required for this method are based on proportional hazards (PH) [18], the method should be expected to perform best under the assumption of proportional hazards. However, unless the assumption of PH is strongly violated, we believe the nlopt method will still perform well. See Discussion for more detail.
Extraction of X,Y coordinates from published KM plots
Points were extracted using the Fiji distribution of ImageJ (version 1.52p; NIH, USA). The KM plot is first loaded into ImageJ and a rectangle corresponding to a known X,Y area drawn within the figure to calibrate the axes using the Figure Calibration plugin (Frederic V. Hessman, University of Göttingen). To aid reading points from the plot, vertical lines corresponding to specified times were automatically drawn on the figure using a custom script written in Fiji. Due to the design of the method, every time value should have a corresponding y value from arm 1 and arm 2. The corresponding data should be stored in three columns: Time (t), Arm 1 (y_{1}) and Arm 2 (y_{2}). By definition, t_{0} should be 0 and the corresponding survival probability 1 for each arm. Time values should not be duplicated, i.e. they must strictly increase, whilst y_{1}/y_{2} values can stay the same but must not increase (a monotonically decreasing function). Given this, a function to check that the user input follows these rules has been included in the R script and online version of the method, to ensure spurious results are not output by the nlopt method should a user inadvertently input data which violates the rules above. This is similar to the input checks included with the implementation of the Guyot method as a Stata function [19].
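The input rules above can be expressed as a simple checker. The following is a Python sketch of the kind of validation described (the function name and messages are illustrative; the authors' checker is an R function):

```python
def check_km_input(t, y1, y2):
    """Validate extracted (time, survival) columns per the stated rules:
    the first row is t=0 with survival 1 in both arms, time values
    strictly increase, and each survival column is non-increasing."""
    if not (len(t) == len(y1) == len(y2)):
        raise ValueError("columns must have equal length")
    if t[0] != 0 or y1[0] != 1 or y2[0] != 1:
        raise ValueError("first row must be t=0 with survival 1 in both arms")
    for j in range(1, len(t)):
        if t[j] <= t[j - 1]:
            raise ValueError(f"time values must strictly increase (row {j})")
        if y1[j] > y1[j - 1] or y2[j] > y2[j - 1]:
            raise ValueError(f"survival must be non-increasing (row {j})")
    return True

print(check_km_input([0, 6, 12], [1, 0.8, 0.8], [1, 0.9, 0.7]))  # True
```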
Extracting ln(HR) and var. ln(HR) from the Parmar paper
Figures 2 and 3 from the paper published by Parmar et al. [8] are X,Y graphs plotting ln(HR) and var. ln(HR) values obtained from a comparison of the survival curve method and a direct or indirect estimation of these values. The X axis plots ((survival curve + direct/indirect)/2) whilst the Y axis plots (survival curve − direct/indirect). The same method described for extracting values from KM plots was used to extract the X,Y coordinates for these figures. To recover the underlying values, the equations were rearranged to: survival curve ln(HR) or var. ln(HR) = \( x+\frac{y}{2} \); direct/indirect ln(HR) or var. ln(HR) = \( x-\frac{y}{2} \)
Where x and y represent the extracted values from Figs. 2 and 3 from Parmar et al. [8].
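This rearrangement amounts to a two-line back-transform from the mean/difference coordinates. A minimal Python sketch with a hypothetical extracted coordinate:

```python
def from_mean_difference(x, y):
    """Recover (survival-curve estimate, direct/indirect estimate) from a
    mean-difference plot, where x = (a + b) / 2 and y = a - b."""
    return x + y / 2, x - y / 2

# Hypothetical extracted point: mean 0.5, difference 0.1.
a, b = from_mean_difference(0.5, 0.1)
print(round(a, 2), round(b, 2))  # 0.55 0.45
```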
The mean absolute error (MAE) was then calculated as described below.
Calculation of mean absolute error and analysis of statistical significance
The MAE for ln(HR) and var. ln(HR) was calculated by subtracting the calculated values from the known, published values and taking the absolute value. The mean absolute percentage errors were similarly calculated. A one-way ANOVA with the Kruskal-Wallis multiple comparisons test was used to assess statistical significance in Prism (GraphPad, version 8).
Evaluating the number and position of points required for an optimal solution
To assess the number of points required for an optimal solution using nlopt, the individual patient data (IPD) from three studies were used [20,21,22]. The Kaplan-Meier survival probability table was first constructed using the "survival" package in R, and the actual times and survival probabilities every time an event or censoring occurred (i.e. the KM survival table) were stored as vectors for each arm. A random time value was then taken from one of the survival curves and the corresponding survival value at that specific time selected. If no identical value was available from the other curve (which was occasionally the case, as different groups often experience events at slightly different times), the maximum value either side of this X value was chosen. This in effect mirrors the process required to read the survival probability off a KM plot at different time points. This was iterated at 5-point intervals and 100 Monte Carlo simulations carried out using the nlopt method to determine ln(HR) and var. ln(HR) as described. In total, 50 points were used for two studies [20, 22] and 30 for the remaining study, which represented the maximum number of time points in the actual dataset [21]. Summary statistics from these results were then calculated and the means and standard deviations plotted using Prism (GraphPad, version 8).
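The sampling step can be sketched as follows, with hypothetical KM tables. As a simplification of the procedure described (which takes the maximum value either side when no exact match exists), this sketch reads each curve as a right-continuous step function, i.e. the survival value at the last tabulated time at or before the chosen time:

```python
import bisect
import random

def survival_at(times, probs, t):
    """Step-function lookup: survival at the last tabulated time <= t
    (KM curves are right-continuous step functions)."""
    return probs[bisect.bisect_right(times, t) - 1]

def sample_points(times1, probs1, times2, probs2, k, seed=0):
    """Draw k distinct event times from arm 1's KM table and read the
    survival probability off both curves at those times, mirroring how
    values are read off a published plot."""
    rng = random.Random(seed)
    ts = sorted(rng.sample(times1[1:], k))   # skip t = 0
    return [(t, survival_at(times1, probs1, t),
                survival_at(times2, probs2, t)) for t in ts]

# Hypothetical KM tables (time, survival), each starting at t = 0, S = 1.
t1, p1 = [0, 1, 3, 5, 8, 12], [1.0, 0.9, 0.8, 0.6, 0.5, 0.4]
t2, p2 = [0, 2, 4, 7, 10, 12], [1.0, 0.95, 0.85, 0.75, 0.65, 0.6]
pts = sample_points(t1, p1, t2, p2, k=3)
print(pts)
```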
The weighted simulation experiments were carried out using a total of 30 points distributed according to specific weights within the KM plot, which was split into three equal sectors by time, according to Table 3.
Benchmarking of the nlopt method
The benchmarking of the nlopt method using an exact and non-exact P value was carried out on a laptop computer with an Intel® Core™ i5-6200 CPU @ 2.30 GHz and 8 GB of installed RAM. Benchmarking was calculated as the time difference between "Sys.time()" calls in R at the start and end of carrying out 100 iterations of each of the 13 KM plots described to validate the nlopt method in Fig. 1. The average of these times was taken and plotted for each KM plot.
Results
Rationale for developing a new method
As is widely acknowledged, many primary studies presenting time-to-event data do not report the necessary statistics to allow their inclusion in an aggregate meta-analysis [2]. However, while reviewing the literature on prognostic factors in lung cancer for a meta-analysis [23], we found numerous primary studies publish KM plots and a P value of unadjusted estimates. To further assess how common various scenarios of data reporting are, we carried out a secondary analysis of a published meta-analysis on prognostic factors in oesophageal adenocarcinoma [24]. This published meta-analysis excluded studies where ln(HR) and var. ln(HR) were not explicitly stated but made the entire database of screened studies available online. Of the 36 studies which were excluded due to a lack of these summary statistics, we encountered five different scenarios of reported data (Table 1).
In total, 17% of studies reported a P value calculated using log-rank/Cox regression tests and the total number of events (scenario 1), whilst no study included the individual patient data (IPD). However, in the majority of cases (64%), a KM survival plot with an associated P value but no number at risk or total number of events was included (scenario 4). This analysis therefore confirmed that a large number of studies fail to directly report the necessary statistics to carry out a subsequent meta-analysis, and that a common scenario of data reporting in these instances is a KM plot and associated P value but no other information.
A number of methods exist to extract summary statistics in such scenarios. The method developed by Guyot et al. [16] can be used when a KM plot is included but, for a good degree of accuracy, as acknowledged by the authors, the number at risk and total number of events also need to be reported. In the above analysis, only 11% of studies included this information. Otherwise, the method by Parmar et al. [8] can be used when a KM plot is included, with or without a P value, since the latter is not a requirement of the analysis. However, as the P value/chi-squared statistic is inherently related to the hazard ratio, we theorised that it could be included in any reverse engineering of the KM plot to more accurately estimate ln(HR) and var. ln(HR) by better predicting the underlying censoring pattern. We therefore established a method based on nonlinear optimisation (nlopt) using the survival probabilities from the KM plot and the associated P value.
Validating the nonlinear optimisation algorithm
To first validate the nlopt method, thirteen KM plots from 11 different studies were identified in a range of articles published between 1999 and 2019, with a median study length of 60 months (range 1–120 months) (Additional file 2). The time and survival probability for each KM plot were extracted using publicly available open-source software as described in the Methods section, and the nlopt method used to calculate the summary statistics for each KM plot. The resulting estimations of ln(HR) and var. ln(HR) were compared to those obtained using the method outlined by Parmar et al. [8] (Fig. 1a, b), which, as previously stated, does not require the number at risk to be published. The only requirement for including a study in this set was that it also directly reported ln(HR) and var. ln(HR) so the method could be validated. Unfortunately, only a single study tested the assumption of PH (Additional file 2), an issue well acknowledged in the literature [25]. The full set of results is recorded in Additional file 3, with a summary of the mean absolute errors (MAE) and mean % absolute errors shown in Table 2.
Two scenarios for reported data were analysed using the nlopt method: one where the P value is exactly stated and one where it is not, corresponding to scenarios 4(a) and 4(b) in Table 1, respectively. In the case of an exact P value, the nlopt method yielded a significantly improved mean absolute error (MAE) of 0.014 for ln(HR) compared to a MAE of 0.077 using the Parmar method (P = 0.003; Fig. 1a), or a mean % absolute error of 6.63% vs 43.47%. The validation of the Parmar method in this study was similar to that reported in the original study, which gave a MAE of 0.079 [8]. Thus, for a HR of 1.5, reconstructed HRs differing by a factor of 1.08 and 1.01 would be expected, i.e. 1.39/1.62 or 1.49/1.52 for the Parmar and nlopt methods, respectively.
As outlined in Table 1, studies occasionally report a P value as a non-exact value such as < 0.01 or > 0.05. Although this is less common (Table 1: 4/23 non-exact P value vs 19/23 exact P value), it is of interest to establish how the nlopt method would perform in these cases. Such a scenario significantly changes the nlopt method, as the objective value to solve for becomes an inequality and not an equality constraint as used for an exact P value.
To model the scenario where a P value is not explicitly stated but expressed as less than a certain value, the same thirteen KM plots were analysed, but instead of the exact P value, a value one "scale" higher was chosen for the analysis (Additional file 3). For example, if the exact P value was 0.01 or 0.005, then a non-exact value of < 0.05 or < 0.01 was chosen as the minimum value for the nonlinear optimisation. This approach was taken as it likely reflects the situation in published studies, where "scales" of non-exact P values are often quoted, i.e. < 0.001 or < 0.01, rather than the exact P values themselves. In cases where the P value was > 0.05, or otherwise 'non-significant', lower and upper limits of the P value, i.e. 0.05 < p ≤ 0.95, were used. Although the nlopt method using a non-exact P value generally performed well (Fig. 1a), a single outlier from Breslow et al. [26] significantly skewed the results, which meant the MAE was similar to that of Parmar (MAE 0.087 vs 0.077, P = 0.6735). In this specific example, the actual P value was significantly lower than quoted on the KM plot (P < 0.001 vs P = 4.47E-39). As might be expected, estimations of ln(HR) improved depending on their proximity to the true P value (Additional file 4).
The variance of ln(HR) was also calculated using each of the methods and compared to the actual published values (Additional file 3). Unlike the results for ln(HR), the MAEs were similar, with values of 0.0038, 0.0039 and 0.0033 for the Parmar, exact and non-exact nlopt methods, respectively, differences which were not statistically significant (P > 0.05; Fig. 1b).
Evaluating the number of points required for an optimal solution
We next analysed the number of points required to obtain an optimal solution for a particular KM plot. To answer this question, three studies analysing survival data which had also made the complete IPD accessible alongside the survival analysis were chosen [20,21,22]. Studies with IPD were used to ensure that there was no inherent error associated with reading the survival probabilities off the KM plot, whilst having access to all the time points and the corresponding survival probabilities meant a truly random set of points could be used for the multiple simulations required to test the method.
To assess the number of points required, a set number of random time values, ranging from 5 to 50 for two of the studies and from 5 to 30 (representing the entirety of the time points) for the other study, at 5-point intervals, were uniformly sampled along the entire timeframe. The corresponding survival probabilities at each time point were then calculated for both arms of the KM plot and 100 Monte Carlo simulations carried out using the nlopt method to obtain the mean and standard deviation of ln(HR) and var. ln(HR).
We first examined a study in which the majority of events occurred at a largely constant rate during the first third of the time course (Fig. 2a) [20]. In such a scenario, the mean ln(HR) and var. ln(HR) approximated the true values with as few as 5 points, although more points were required to minimise the standard deviation. Figure 2b shows a KM plot in which the majority of events are clustered in the first half of the time series [21], and Fig. 2c a case where events are dispersed across the entire time series [22]. In these cases, estimating ln(HR) and var. ln(HR) requires more points to be extracted from the KM plot, with estimates for Fig. 2b stabilising around 15 points and Fig. 2c around 30, and with additional points further reducing the variance of the estimates in these Monte Carlo simulations.
The same analysis was also carried out with a non-exact P value to determine if more points would be required to achieve a better estimate of the summary statistics. Interestingly, the trend for each of the three datasets analysed for ln(HR) and var. ln(HR) was in fact very similar (Fig. 3a–c). However, in the case of the last study (Fig. 3c), these estimates were less accurate, whilst the standard deviations in the first and third studies were significantly larger than for an exact P value (Fig. 3a, c). Importantly, this analysis therefore shows that taking more points in the case of a non-exact P value does not improve the accuracy of the estimation.
Assessing how the weighting of points in a KM plot affects the estimation of ln (HR) and var. ln (HR)
We next examined where these points should be distributed along the curve to ensure an optimal solution. To carry out this experiment, the timeframe of the first study (Fig. 2a) was split into three equal sectors. A total of 30 points were then distributed in a weighted manner within each of these sectors to create a set of weighted time series according to Table 3, and 100 Monte Carlo simulations were carried out.
In this study the majority of events occur in the first half of the recorded time scale (Fig. 4a). The least accurate estimations of ln(HR) and var. ln(HR) occurred when weighting the extraction of points within sector 3, during which few events occurred (Fig. 4c, d). The remaining weightings, whether points were weighted towards sector 1 or 2 or distributed uniformly, differed only marginally.
The results of these two experiments therefore show that points extracted from the curve should be weighted towards steeper areas, where there are more events, and that fewer points are required where the curves are flat and contribute less information to the overall hazard ratio. As a general rule, a simple recommendation would therefore be to take points at approximately every 2% drop in survival probability in either of the KM curves. This would ensure the event rate as a proportion of the total is relatively small between each time point, although following this rule might not be possible in cases where the number of events is very large at any single point in time (large vertical drops in the KM plot).
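The ~2% rule can be sketched as a simple selection pass over densely read curves: a time point is kept whenever survival in either arm has fallen by at least 0.02 since the last kept point. A Python sketch with hypothetical numbers:

```python
def select_points(times, s1, s2, drop=0.02):
    """Keep a time point whenever either arm's survival has fallen by at
    least `drop` since the last kept point (sketch of the ~2% rule)."""
    kept = [0]                        # always keep t_0 (survival 1.0)
    last1, last2 = s1[0], s2[0]
    for j in range(1, len(times)):
        if last1 - s1[j] >= drop or last2 - s2[j] >= drop:
            kept.append(j)
            last1, last2 = s1[j], s2[j]
    return [times[j] for j in kept]

# Hypothetical curves read at unit time intervals.
times = list(range(7))
s1 = [1.0, 0.99, 0.97, 0.96, 0.93, 0.93, 0.90]
s2 = [1.0, 1.00, 0.99, 0.99, 0.98, 0.97, 0.97]
print(select_points(times, s1, s2))  # [0, 2, 4, 6]
```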
To validate these rules, another set of five independent datasets was chosen to test the exact nlopt method, positioning points at every 2% drop in survival. These are all IPD datasets freely accessible in the R package 'survival' [27] and were chosen as they have a mix of cohort sizes (range 125–7874), study lengths (range 13–167 months) and KM survival probability end points. Figure 5 shows the KM plots in one panel and the exact points used for the analysis in the adjacent panel. The MAE for ln(HR) for this dataset was 0.0098, with a mean absolute percentage error of 7.09%. This again represents a significant improvement over the Parmar method, whilst the var. ln(HR) values were similar.
We also validated the assumption of PH for each KM plot by calculating the Schoenfeld residuals (Additional file 5). In three cases, there was a non-significant relationship between these residuals and time, but in the 'rats' dataset the relationship was significant (P = 0.026), whilst the test for the 'veteran' dataset approached significance (P = 0.07). In both these cases, the nlopt method still produced excellent estimations of ln(HR) and var. ln(HR).
Benchmarking of the nlopt method using an exact and non-exact P value
The final test of the nlopt method in this study was to assess the running time required to output the results of the script. This analysis was carried out on the original series of 13 KM plots used to initially validate the nlopt method, as it likely represents a more real-world situation than the simulations carried out for the point testing. The mean running time was 0.18 s using an exact P value and 0.36 s using a non-exact P value (Fig. 6).
Discussion
A variety of methods have been developed to extract data from studies which fail to report summary statistics [2]. The majority of these rely on extracting time and survival probabilities from KM plots, together with whatever other information may be published alongside the graph or study. The method of Parmar et al. [8] requires the minimum and maximum follow-up time but no other information, whilst the methods of Williamson et al. [13], Hoyle et al. [14] and Guyot et al. [16] also require the number at risk. In a study comparing the accuracy of all four of these methods in estimating ln(HR), the Guyot method was the most accurate and the Parmar method was the least. However, whereas the methods of Guyot, Williamson and Hoyle can be seen as equivalent in terms of the data input required, the Parmar method is complementary to them as it does not require the number at risk. Thus, to ensure the most accurate method is used in each scenario, the Guyot method should be used if the number at risk is known and the Parmar method if not.
In establishing a protocol for a meta-analysis of prognostic factors in non-small cell lung cancer, we realised that the number at risk is only infrequently included, necessitating the use of the Parmar method. During the course of this analysis, we noted that the P value is commonly stated in studies whether or not the number at risk is included, and that it is inherently linked to the KM survival table and thus to ln(HR) and var ln(HR). Given this, we hypothesised that the P value could be used to improve the accuracy of estimating summary statistics in the absence of the number at risk.
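The link is direct: a two-sided log-rank P value corresponds to a chi-square statistic on one degree of freedom, so a published exact P value can be converted back into the test statistic that constrains the survival table. A minimal Python sketch (the function names and the bisection-based inversion are our own illustrative choices, not the authors' implementation):

```python
import math

def p_from_chi2(chi2):
    """Two-sided P value for a chi-square statistic on 1 df."""
    return math.erfc(math.sqrt(chi2 / 2.0))

def chi2_from_p(p, lo=0.0, hi=200.0):
    """Invert p_from_chi2 by bisection (P decreases as chi2 grows)."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if p_from_chi2(mid) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

print(round(p_from_chi2(3.8415), 3))  # the familiar 0.05 threshold
print(round(chi2_from_p(0.05), 2))    # recovers ~3.84
```

Either direction of this conversion lets a quoted P value (or chi-square statistic) serve as the fixed target for the optimisation described below.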
In developing such a method, the first step is to appreciate that the KM plot provides essentially all the necessary information other than the censoring pattern, so some means of estimating it is required. The Parmar method assumes a constant rate of censoring across the entire study, whereas the Guyot method assumes a constant rate of censoring between each published number at risk. Our method makes no assumption about the censoring pattern; instead, it uses the P value as a fixed point and its relationship to the KM survival table to define a series of unknowns, the censor values at each time point, to solve for. Defining the problem in this way allows estimates of these unknowns to be calculated using the mathematical technique of nonlinear optimisation.
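To make the relationship between the reconstructed table and the summary statistics concrete, the sketch below computes the log-rank quantities from per-event-time counts. This is a simplified Python illustration of the quantities the optimisation pivots on, not the authors' R implementation: ln(HR) is recovered via the standard Peto approximation (O − E)/V with variance 1/V, and the chi-square statistic (O − E)²/V is what the published P value constrains.

```python
import math

def logrank_summary(table):
    """table: one row per distinct event time, each row
    (d1, n1, d2, n2) = events and numbers at risk in arms 1 and 2."""
    O1 = E1 = V = 0.0
    for d1, n1, d2, n2 in table:
        d, n = d1 + d2, n1 + n2
        if n < 2:
            continue  # variance term undefined for a single subject
        O1 += d1
        E1 += d * n1 / n                                    # expected arm-1 events
        V += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)    # hypergeometric variance
    chi2 = (O1 - E1) ** 2 / V
    ln_hr = (O1 - E1) / V                  # Peto estimate of ln(HR)
    var_ln_hr = 1.0 / V
    p = math.erfc(math.sqrt(chi2 / 2.0))   # 1-df chi-square P value
    return ln_hr, var_ln_hr, chi2, p

# Two identical arms: no evidence of a difference.
ln_hr, var_ln_hr, chi2, p = logrank_summary([(1, 10, 1, 10), (1, 9, 1, 9)])
print(ln_hr, p)  # 0.0 1.0
```

In the nlopt method the unknown censor counts alter the numbers at risk at later times, and the optimisation adjusts them until the resulting chi-square statistic matches the published P value.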
Validating the nlopt method using an exact P value showed that it is a significant improvement on the Parmar method. Reassuringly, the average ln(HR) and var ln(HR) calculated using the Parmar method in this study (Table 3) were similar to those published in the original study, suggesting there was no systematic error in the way the Parmar method was implemented. As might be expected, the Parmar method did perform well in some instances (see Additional file 2 and KM plots from [28, 29]), likely because the censoring was indeed constant throughout those studies.
However, in the case of a non-exact P value, although the method generally performed well, a large error occurred in a single study where the quoted P value was substantially larger than the actual P value. Such a scenario might be expected when using nonlinear optimisation with a non-exact P value, as two types of solution exist in optimisation methods: local and global optima. The former is an optimal solution only with respect to feasible solutions near it in terms of the objective value (i.e. the P value), whereas the latter is the overall solution to the optimisation problem. In the context of a non-exact P value, the further the quoted P value is from the actual value, the more likely it is that only a local solution will be found, and thus the larger the expected error. Indeed, using a P value progressively closer to the actual value improved the estimate of ln(HR) (Additional File 4). Although the nlopt method using a non-exact P value was more accurate than the Parmar method in the remaining studies, since it is never clear how far the quoted P value lies from the actual value, we would not recommend using it in place of the Parmar method.
As has been acknowledged previously, the number and position of points is an important aspect of ensuring an optimal solution in methods extracting information from KM plots [8, 16]. The simulation experiments showed that the quality of the solution varies with the number and position of points. In this particular method, the role of the time points is to provide enough constraints for the nonlinear optimisation to arrive at a global optimum. In some Monte Carlo simulations, as few as 5 randomly selected points (Fig. 2a) proved sufficient, when that particular set of points provided constraints that successfully minimised the objective function; in other cases, more points were required (Fig. 2b, c). However, since it is not possible to determine in advance which set of 5 points would suffice, a better approach is to provide the nonlinear optimisation model with enough points to ensure an optimal solution is found each time. As expected, the weighting experiments showed that flatter areas of the curve provide less information, so points should be weighted towards the steeper areas, which provide more information and help create enough unique constraints for the optimisation to reach the global optimum. A general rule is therefore to take a point for every 2% drop in survival probability, which in most cases means between 20 and 50 points for any one KM plot. From a practical point of view, this also means an exhaustive list of points is not required, making the nlopt method simple to use.
The last time point should equal the last point on the KM plot for both arms. Where this is not possible, i.e. one arm finishes before the other, the later time point of the two arms should be taken, and the survival probability of the other arm at its own last time point used.
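This rule can be sketched in code: under our reading, the shorter arm is extended flat, at its last observed survival probability, to the later arm's final time (Python; the helper name is ours):

```python
def align_last_points(arm1, arm2):
    """Each arm is a time-ordered list of (time, survival) points.
    If one curve ends earlier, carry its last survival probability
    forward to the other curve's final time point."""
    t_end = max(arm1[-1][0], arm2[-1][0])
    for arm in (arm1, arm2):
        if arm[-1][0] < t_end:
            arm.append((t_end, arm[-1][1]))  # flat extension
    return arm1, arm2

a1 = [(0, 1.0), (10, 0.80)]
a2 = [(0, 1.0), (12, 0.70)]
print(align_last_points(a1, a2)[0][-1])  # (12, 0.8)
```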
Although we did not directly test the relationship between the number at risk or the time length of any one individual study and estimating summary statistics, it is clear from analysing the KM plots that these are not directly contributing factors. Indeed, the largest study with over 7000 observations only required 15 points for a highly accurate estimate of the summary statistics (Fig. 5c). Moreover, longer or shorter study lengths per se do not affect the number of points required although one might expect more events (thus lower survival probabilities) in studies that have followed up their cohort for a greater length of time thus necessitating extracting more points. Our analysis has shown that the crucial determinant is ensuring the survival curve is adequately sampled along its length regardless of the size or timeframe of the study.
Importantly, the point-testing simulations using a non-exact P value also showed that taking more points does not increase the accuracy of the estimation: an optimal solution for that particular P value is reached after a similar number of points. Thus, in the single study where the non-exact method resulted in a large error, taking more points is unlikely to improve the estimate. Given this, we recommend that the nlopt method using a non-exact P value is not used in preference to the Parmar method. However, since more studies publish an exact P value than a non-exact one (Table 1), we believe the nlopt method will still be applicable in a significant number of cases.
In terms of assessing how the nlopt method performs in studies which violate the PH assumption, this was difficult to test experimentally, as only one study in the first dataset tested the PH assumption. We were able to calculate the Schoenfeld residuals for the second set of IPD validation datasets, but since only one dataset violated PH and another came close (Fig. 5d, e), we were unable to make any statistical inferences. Qualitatively though, in both cases the nlopt method performed well (Fig. 5d, e). Since the nlopt method relies on a P value calculated using tests that assume PH [18], we would expect it to perform better where PH is not violated; however, as shown in the examples above, we believe that unless a study strongly violates the PH assumption, the nlopt method will still perform well. Where PH is clearly violated, we would expect the authors of the primary study to have reported measures for comparing survival curves other than the hazard ratio [30].
Although the nlopt method was initially intended for studies which do not report the number at risk, and is thus an alternative to the Guyot method, it is of interest to see how the two compare. The MAEs presented by Guyot et al. [16] for each scenario of data presentation are also included in Table 4. Interestingly, even in the scenario where all the information required for the Guyot method is presented, i.e. the number at risk and the total number of events, the nlopt method using an exact P value gives a more accurate estimation of ln(HR) (MAE, 0.0014 vs 0.0017), although the Guyot method is more accurate in estimating var ln(HR) (MAE, 0.0039 vs 0.0026). However, when only one of these pieces of information is included, the ln(HR) error of the Guyot method increases (MAE, no risk: 0.036; no events: 0.028) whilst the var ln(HR) error is variable (MAE, no risk: 0.0015; no events: 0.0065). Notably, Saluja et al. [15] observed in a series of oncology randomised controlled trials that the total number of events is rarely reported even when the number at risk is, so the Guyot method will commonly default to using just the number at risk. In cases where no information is provided, the Guyot method performs worse than the Parmar method and substantially worse than the nlopt method described in this paper (Table 4).
In deciding which method should be used to extract ln(HR) and var ln(HR) from individual studies for a meta-analysis, different studies provide different pieces of information, and the method most accurate in estimating both ln(HR) and var ln(HR) should be used in each individual case. For example, where an exact P value is provided, the nlopt method described in this study could be used in all cases unless the number at risk and the total number of events for each arm are both known (which maximises the accuracy of both ln(HR) and var ln(HR)). Where a non-exact P value is quoted, we would recommend that the nlopt method is not used at all; instead, the existing methods should be chosen according to whether or not the number at risk is included.
When carrying out a meta-analysis, authors should quote the MAE associated with the extraction method they used, and could also perform a subset analysis on the studies requiring extraction methods to determine whether they deviate significantly from the other included studies.
There are several limitations to the proposed method. Firstly, if neither the P value nor the chi-square test statistic is published alongside the KM plot, the method cannot be used, as the P value provides the target on which the objective function is minimised. As explained above, we would also not recommend using the method when a non-exact P value is quoted.
As others have commented for similar methods, the quality of the initial input is important to ensure an optimal solution is obtained; in particular, the points must be accurately extracted from a high-quality KM plot [16].
Furthermore, although this method was primarily designed to extract ln(HR) and var ln(HR) from Kaplan-Meier curves, which it does with good accuracy, it cannot currently output the IPD from these results. This is in contrast to the methods developed by Hoyle et al. [14] and Guyot et al. [16], whose aim was to generate the IPD from KM plots to facilitate further secondary analysis, not simply to calculate ln(HR) and var ln(HR) for use in a meta-analysis. Generating IPD has numerous advantages, for example allowing different parametric survival models to be fitted to the data [3].
Indeed, future research on the nlopt method could focus on creating IPD from the current output, as well as on using the remaining information contained within a published study to increase the accuracy and reliability of the method. In cases where researchers want to recapitulate IPD from a particular KM plot, any of the P value, hazard ratio, or number at risk (where available) could be used as further constraints to improve upon existing methods as necessary.
As stated before, methods to extract summary statistics such as ln(HR) and var ln(HR) from published studies are only required when authors fail to report them [25], often in contrast to guidelines recommending good practice [11]. Authors should therefore be encouraged to publish as much associated information as possible when analysing survival data to facilitate these secondary analyses. Examples of such data include, but are not limited to, univariate hazard ratios and their corresponding variances, the total number of events, the number at risk at regular intervals along the KM plot, and exact P values or chi-square statistics.
Conclusion
The objective of this article was to present a new, more accurate method for extracting ln(HR) and var ln(HR) for aggregate meta-analyses when studies do not publish the number at risk alongside KM plots. The proposed method outperforms the existing method, and we have also produced guidance for users on choosing the position and number of points to achieve optimal estimations. In addition to the R scripts available in an online repository, a publicly available, free-to-use web version of the software can be found at https://edgreen21.shinyapps.io/km_hr/. It is simple to use and requires no previous programming experience: users extract time/survival points from a KM plot using the above guidance and upload these data along with the published P value. The output includes the KM table and a selection of summary statistics including ln(HR) and var ln(HR). The online version provides instructions on how to extract X,Y points from KM plots as well as a guide to using the web version of the script.
Availability of data and materials
The repository for the R scripts can be found at: https://gitlab.com/EdGreen21/irvinekm
The web app based on the R script can be found at: https://edgreen21.shinyapps.io/km_hr/
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.
Abbreviations
 HR: Hazard Ratio
 IPD: Individual Patient Data
 KM: Kaplan-Meier
 MAE: Mean Absolute Error
 SE: Standard Error
 SD: Standard Deviation
 var: Variance
 nlopt: Nonlinear optimisation
 PH: Proportional Hazards
References
 1.
Clark TG, Bradburn MJ, Love SB, Altman DG. Survival analysis part I: basic concepts and first analyses. Br J Cancer. 2003;89(2):232–8. https://doi.org/10.1038/sj.bjc.6601118.
 2.
Riley RD, Moons KGM, Snell KIE, Ensor J, Hooft L, Altman DG, et al. A guide to systematic review and meta-analysis of prognostic factor studies. BMJ. 2019;364:k4597. https://doi.org/10.1136/bmj.k4597.
 3.
Riley RD, Lambert PC, Abo-Zaid G. Meta-analysis of individual participant data: rationale, conduct, and reporting. BMJ. 2010;340:c221. https://doi.org/10.1136/bmj.c221.
 4.
Abo-Zaid G, Sauerbrei W, Riley RD. Individual participant data meta-analysis of prognostic factor studies: state of the art? BMC Med Res Methodol. 2012;12:56. https://doi.org/10.1186/1471-2288-12-56.
 5.
Huang Y, Tang J, Tam WW, Mao C, Yuan J, Di M, et al. Comparing the overall result and interaction in aggregate data meta-analysis and individual patient data meta-analysis. Medicine (Baltimore). 2016;95(14):e3312. https://doi.org/10.1097/MD.0000000000003312.
 6.
Sutcliffe P, Hummel S, Simpson E, Young T, Rees A, Wilkinson A, et al. Use of classical and novel biomarkers as prognostic risk factors for localised prostate cancer: a systematic review. Health Technol Assess. 2009;13(5):iii, xi–xiii, 1–219.
 7.
Sauerbrei W, Taube SE, McShane LM, Cavenagh MM, Altman DG. Reporting recommendations for tumor marker prognostic studies (REMARK): an abridged explanation and elaboration. J Natl Cancer Inst. 2018;110(8):803–11. https://doi.org/10.1093/jnci/djy088.
 8.
Parmar MK, Torri V, Stewart L. Extracting summary statistics to perform meta-analyses of the published literature for survival endpoints. Stat Med. 1998;17(24):2815–34.
 9.
Michiels S, Piedbois P, Burdett S, Syz N, Stewart L, Pignon JP. Meta-analysis when only the median survival times are known: a comparison with individual patient data results. Int J Technol Assess Health Care. 2005;21(1):119–25. https://doi.org/10.1017/s0266462305050154.
 10.
Sekula P, Mallett S, Altman DG, Sauerbrei W. Did the reporting of prognostic studies of tumour markers improve since the introduction of REMARK guideline? A comparison of reporting in published articles. PLoS One. 2017;12(6):e0178531. https://doi.org/10.1371/journal.pone.0178531. eCollection 2017.
 11.
McShane LM, Altman DG, Sauerbrei W, Taube SE, Gion M, Clark GM. REporting recommendations for tumour MARKer prognostic studies (REMARK). Br J Cancer. 2005;93(4):387–91. https://doi.org/10.1038/sj.bjc.6602678.
 12.
Vale CL, Tierney JF, Stewart LA. Effects of adjusting for censoring on meta-analyses of time-to-event outcomes. Int J Epidemiol. 2002;31(1):107–11. https://doi.org/10.1093/ije/31.1.107.
 13.
Williamson PR, Smith CT, Hutton JL, Marson AG. Aggregate data meta-analysis with time-to-event outcomes. Stat Med. 2002;21(22):3337–51. https://doi.org/10.1002/sim.1303.
 14.
Hoyle MW, Henley W. Improved curve fits to summary survival data: application to economic evaluation of health technologies. BMC Med Res Methodol. 2011;11:139. https://doi.org/10.1186/1471-2288-11-139.
 15.
Saluja R, Cheng S, Delos Santos KA, Chan KKW. Estimating hazard ratios from published Kaplan-Meier survival curves: a methods validation study. Res Synth Methods. 2019;10(3):465–75. https://doi.org/10.1002/jrsm.1362 Epub 2019 Jun 24.
 16.
Guyot P, Ades AE, Ouwens MJ, Welton NJ. Enhanced secondary analysis of survival data: reconstructing the data from published Kaplan-Meier survival curves. BMC Med Res Methodol. 2012;12:9. https://doi.org/10.1186/1471-2288-12-9.
 17.
The NLopt nonlinear-optimization package. http://github.com/stevengj/nlopt. Accessed 1 Apr 2020.
 18.
Royston P, Choodari-Oskooei B, Parmar MKB, Rogers JK. Combined test versus logrank/Cox test in 50 randomised trials. Trials. 2019;20(1):172. https://doi.org/10.1186/s13063-019-3251-5.
 19.
Wei Y, Royston P. Reconstructing time-to-event data from published Kaplan-Meier curves. Stata J. 2017;17(4):786–802.
 20.
Kilvaer TK, Rakaee M, Hellevik T, Ostman A, Strell C, Bremnes RM, et al. Tissue analyses reveal a potential immune-adjuvant function of FAP-1 positive fibroblasts in non-small cell lung cancer. PLoS One. 2018;13(2):e0192157.
 21.
Wei L, Lin D, Weissfeld L. Regression analysis of multivariate incomplete failure time data by modeling marginal distributions. J Am Stat Assoc. 1989;84(408):1065–73.
 22.
Moertel CG, Fleming TR, Macdonald JS, Haller DG, Laurie JA, Tangen CM, et al. Fluorouracil plus levamisole as effective adjuvant therapy after resection of stage III colon carcinoma: a final report. Ann Intern Med. 1995;122(5):321–6. https://doi.org/10.7326/0003-4819-122-5-199503010-00001.
 23.
Irvine AF, Waise S, Stuart B, Kendrick T, Thomas G. Prognostic value of immunohistochemical detection of carcinoma-associated fibroblasts in non-small cell lung cancer: a systematic review and meta-analysis. PROSPERO, International Prospective Register of Systematic Reviews; 2019. Available from: https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=130307.
 24.
McCormick Matthews LH, Noble F, Tod J, Jaynes E, Harris S, Primrose JN, et al. Systematic review and meta-analysis of immunohistochemical prognostic biomarkers in resected oesophageal adenocarcinoma. Br J Cancer. 2015;113(1):107–18. https://doi.org/10.1038/bjc.2015.179 Epub Jun 25.
 25.
Guyot P, Welton NJ, Ouwens MJ, Ades AE. Survival time outcomes in randomized, controlled trials and meta-analyses: the parallel universes of efficacy and cost-effectiveness. Value Health. 2011;14(5):640–6. https://doi.org/10.1016/j.jval.2011.01.008 Epub May 31.
 26.
Breslow NE, Chatterjee N. Design and analysis of two-phase studies with binary outcome applied to Wilms tumour prognosis. Appl Stat. 1998;48(4):457–68.
 27.
Therneau T (2020). A package for survival analysis in R. R package version 3.1–12, https://CRAN.Rproject.org/package=survival.
 28.
Little P, Stuart B, Moore M, Coenen S, Butler CC, Godycki-Cwirko M, et al. Amoxicillin for acute lower-respiratory-tract infection in primary care when pneumonia is not suspected: a 12-country, randomised, placebo-controlled trial. Lancet Infect Dis. 2013;13(2):123–9. https://doi.org/10.1016/S1473-3099(12)70300-6 Epub 2012 Dec 19.
 29.
Davies A, Cummin TE, Barrans S, Maishman T, Mamot C, Novak U, et al. Gene-expression profiling of bortezomib added to standard chemoimmunotherapy for diffuse large B-cell lymphoma (REMoDL-B): an open-label, randomised, phase 3 trial. Lancet Oncol. 2019;20(5):649–62. https://doi.org/10.1016/S1470-2045(18)30935-5 Epub 2019 Apr 1.
 30.
Uno H, Claggett B, Tian L, Inoue E, Gallo P, Miyata T, et al. Moving beyond the hazard ratio in quantifying the between-group difference in survival analysis. J Clin Oncol. 2014;32(22):2380–5. https://doi.org/10.1200/JCO.2014.55.2208 Epub Jun 30.
 31.
Bonner JA, Harari PM, Giralt J, Azarnia N, Shin DM, Cohen RB, et al. Radiotherapy plus cetuximab for squamous-cell carcinoma of the head and neck. N Engl J Med. 2006;354(6):567–78. https://doi.org/10.1056/NEJMoa053422.
 32.
Hanley CJ, Mellone M, Ford K, Thirdborough SM, Mellows T, Frampton SJ, et al. Targeting the Myofibroblastic CancerAssociated Fibroblast Phenotype Through Inhibition of NOX4. J Natl Cancer Inst. 2018;110(1):4060751. https://doi.org/10.1093/jnci/djx121.
 33.
Seymour MT, Brown SR, Middleton G, Maughan T, Richman S, Gwyther S, et al. Panitumumab and irinotecan versus irinotecan alone for patients with KRAS wild-type, fluorouracil-resistant advanced colorectal cancer (PICCOLO): a prospectively stratified randomised trial. Lancet Oncol. 2013;14(8):749–59. https://doi.org/10.1016/S1470-2045(13)70163-3 Epub 2013 May 29.
 34.
Rakaee M, Busund LR, Jamaly S, Paulsen EE, Richardsen E, Andersen S, et al. Prognostic value of macrophage phenotypes in resectable non-small cell lung cancer assessed by multiplex immunohistochemistry. Neoplasia. 2019;21(3):282–93. https://doi.org/10.1016/j.neo.2019.01.005 Epub Feb 10.
 35.
Bosch LJW, Trooskens G, Snaebjornsson P, Coupe VMH, Mongera S, Haan JC, et al. Decoy receptor 1 (DCR1) promoter hypermethylation and response to irinotecan in metastatic colorectal cancer. Oncotarget. 2017;8(38):63140–54. https://doi.org/10.18632/oncotarget.702 eCollection 2017 Sep 8.
 36.
Seligmann JF, Elliott F, Richman SD, Jacobs B, Hemmings G, Brown S, et al. Combined epiregulin and amphiregulin expression levels as a predictive biomarker for panitumumab therapy benefit or lack of benefit in patients with RAS wild-type advanced colorectal cancer. JAMA Oncol. 2016;2(5):633–42. https://doi.org/10.1001/jamaoncol.2015.6065.
 37.
Corrie PG, Marshall A, Nathan PD, Lorigan P, Gore M, Tahir S, et al. Adjuvant bevacizumab for melanoma patients at high risk of recurrence: survival analysis of the AVAST-M trial. Ann Oncol. 2018;29(8):1843–52. https://doi.org/10.1093/annonc/mdy229.
 38.
Therneau TM, Grambsch PM. Modeling survival data: extending the Cox model. United States: Springer; 2000.
 39.
Kyle RA. “Benign” monoclonal gammopathy: after 20 to 35 years of follow-up. Mayo Clin Proc. 1993;68(1):26–36. https://doi.org/10.1016/s0025-6196(12)60015-9.
 40.
Kyle RA, Therneau TM, Rajkumar SV, Larson DR, Plevak MF, Offord JR, et al. Prevalence of monoclonal gammopathy of undetermined significance. N Engl J Med. 2006;354(13):1362–9. https://doi.org/10.1056/NEJMoa054494.
 41.
Mantel N, Bohidar NR, Ciminera JL. Mantel-Haenszel analyses of litter-matched time-to-response data, with modifications for recovery of inter-litter information. Cancer Res. 1977;37(11):3863–8.
 42.
Kalbfleisch D, Prentice R. The statistical analysis of failure time data. New York: Wiley; 1980.
Acknowledgements
AI would like to thank Professor Tony Kendrick for creating a supportive research environment in which to work, as well as Mr. Patrick Irvine.
Funding
AI was supported by a University of Southampton National Institute of Health Research Academic Foundation Programme.
SW was supported by Cancer Research UK and Medical Research Council Clinical Research Training Fellowships (MR/R001286/1) and a Pathological Society Trainees’ Small grant.
Author information
Affiliations
Contributions
AI developed, tested and validated the algorithm. SW helped validate the algorithm. EG wrote the webapp. AI and EG wrote the paper. BS provided critical support throughout. All authors read and approved the final version of the manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable
Consent for publication
Not applicable
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Additional file 1.
Construction of the Kaplan-Meier survival table based on the equations defined in the Methods section. The calculations for 5 time points are shown but this would extend to n time points. By definition, t_{0} is the start of the trial, so the survival probability is 1 and there are no events or censor values. The number at risk at t_{0} is the starting number of trial participants in each arm. For clarity, the columns for survival probability, censor values, events and number at risk are only presented for Arm 1; the equations for Arm 2 would otherwise be the same. The method works by first constructing the table assuming no censoring. The corresponding chi-square statistic or P value is then used as a fixed point for the nonlinear optimisation algorithm, which calculates the censor values (C_{1}, C_{2}, …, C_{n}) by iterating through possible values to satisfy the fixed value and arrive at an optimal solution. ln(HR) and var ln(HR) are then calculated based on these updated values.
Additional file 2.
Summary of the length and status of the proportional hazards assumption in the included studies. Table of the length (months) of each study and whether the proportional hazards assumption had been checked, for the 13 Kaplan-Meier plots used to validate the nlopt method [1, 26, 28, 29, 31,32,33,34,35,36,37].
Additional file 3.
Complete table of summary statistics extracted from 13 Kaplan-Meier plots, comparing methods. This includes ln(HR), var ln(HR), the lower and upper 95% confidence intervals, the P value, the non-exact P value used with nlopt and the chi-squared value. It compares the Parmar method, the nlopt method from this study (both exact and non-exact P value) and the actual statistics extracted from each published study [1, 26, 28, 29, 31,32,33,34,35,36,37].
Additional file 4.
Quoted P values nearer to the actual P value better approximate ln(HR) when using the nlopt method with a non-exact P value. The nlopt method using a non-exact P value was used to calculate ln(HR) and var ln(HR) with a range of non-exact P values, starting at the value quoted on the KM plot (1e-4) and moving progressively closer to the actual value (4.48e-39).
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Irvine, A.F., Waise, S., Green, E.W. et al. A nonlinear optimisation method to extract summary statistics from Kaplan-Meier survival plots using the published P value. BMC Med Res Methodol 20, 269 (2020). https://doi.org/10.1186/s12874-020-01092-x
Received:
Accepted:
Published:
Keywords
 Survival analysis
 KaplanMeier plot
 Metaanalysis
 Algorithm
 Life table
 Nonlinear optimisation