New methods for estimating follow-up rates in cohort studies

Background The follow-up rate, a standard index of the completeness of follow-up, is important for assessing the validity of a cohort study. A common method for estimating the follow-up rate, the “Percentage Method”, defined as the fraction of all enrollees who developed the event of interest or had complete follow-up, can severely underestimate the degree of follow-up. Alternatively, the median follow-up time does not indicate the completeness of follow-up, and the reverse Kaplan-Meier based method and Clark’s Completeness Index (CCI) also have limitations. Methods We propose a new definition for the follow-up rate, the Person-Time Follow-up Rate (PTFR), which is the observed person-time divided by total person-time assuming no dropouts. The PTFR cannot be calculated directly since the event times for dropouts are not observed. Therefore, two estimation methods are proposed: a formal person-time method (FPT) in which the expected total follow-up time is calculated using the event rate estimated from the observed data, and a simplified person-time method (SPT) that avoids estimation of the event rate by assigning full follow-up time to all events. Simulations were conducted to measure the accuracy of each method, and each method was applied to a prostate cancer recurrence study dataset. Results Simulation results showed that the FPT has the highest accuracy overall. In most situations, the computationally simpler SPT and CCI methods are only slightly biased. When applied to a retrospective cohort study of cancer recurrence, the FPT, CCI and SPT showed substantially greater 5-year follow-up than the Percentage Method (92%, 92% and 93% vs 68%). Conclusions The Person-time methods correct a systematic error in the standard Percentage Method for calculating follow-up rates. The easy to use SPT and CCI methods can be used in tandem to obtain an accurate and tight interval for PTFR. However, the FPT is recommended when event rates and dropout rates are high. 
Electronic supplementary material The online version of this article (10.1186/s12874-017-0436-z) contains supplementary material, which is available to authorized users.


Background
The follow-up rate, a standard index of the completeness of follow-up, is important for assessing the adequacy of a prospective or retrospective longitudinal cohort dataset for research purposes. In particular, a low follow-up rate raises concerns regarding the possibility of informative censoring, bias and diminishing statistical power [1][2][3][4][5], concerns that increase incrementally with the extent of participant dropout from the cohort [5][6][7][8][9][10][11][12]. Common sources of "loss-to-follow-up" include death due to causes other than the endpoint of interest, patient withdrawal, and other reasons for dropout, such as a change in at-risk status (e.g., undergoing a hysterectomy during a study of cervical cancer). For simplicity, in this paper we refer to all loss-to-follow-up and censoring due to any cause other than the event of interest or the end of the study as dropout.
Methods to accurately assess follow-up rates are likely to be of growing importance during the current, expanding era of electronic medical records (EMRs). That is, hospital and outpatient databases are increasingly being exploited for research purposes, but require careful scrutiny to determine whether they are truly adequate for use in scientific studies. Patients in routine clinical practice may be more likely than research volunteers in a prospective cohort to seek care from multiple, unaffiliated providers, leading to low follow-up rates observed at a specific health care facility, raising particular concerns regarding informative censoring. Investigators may therefore need to screen through multiple potential clinics or other sources of EMR data to find an appropriate population with adequate follow-up data.
Thus, while there are many sources of potential bias, the follow-up rate provides a quick and easy tool for initially screening potential retrospective clinical cohorts before a more in-depth evaluation of the adequacy of the data. Both researchers and journal reviewers should therefore routinely examine the follow-up rate in an EMR-based study over a period of observation relevant to the study question.
The most commonly used method to assess the completeness of follow-up, recommended by the Cochrane Handbook [13] and the CONSORT guidelines [14] and often referred to as the "Percentage Method" [15], involves simply calculating the proportion of subjects present at baseline (e.g., enrollment) who remained through the end of the study interval or developed the event of interest by the end of the interval [7,13,14,16]. However, this definition is "naïve" in that it does not distinguish subjects who dropped out early in a study from subjects who dropped out late. In fact, the Percentage Method essentially assumes that all subjects who were lost to follow-up were lost at the very beginning of the study. It can therefore severely underestimate the follow-up rate in a cohort, leading to a false conclusion regarding the quality of the data.
Several attempts have been made to improve upon the Percentage Method for assessing the degree of follow-up. For example, the median follow-up time has been used as a measure of the length of follow-up. However, there have been disagreements regarding how the median follow-up time should be calculated: whether among all subjects, only among dropouts, or in other variations, each of which has its limitations [17][18][19][20]. Further, there is increasing recognition that the median follow-up time does not directly measure the "completeness of the follow-up": e.g., the median follow-up can be low with excellent follow-up, and it can be high with poor follow-up [18][20][21][22]. While time-to-event studies must have a sufficient length of follow-up to capture enough events for adequate statistical power, poor follow-up, as mentioned earlier, raises concerns about the validity of the study. Thus, to assess the adequacy of follow-up for a cohort study, we need to examine both the length and the completeness of follow-up.
Alternatively, a reverse Kaplan-Meier (KM) survival curve, constructed by reversing "censor" and "event" [18], has also been used to assess the length as well as the completeness of follow-up. However, as explained in detail below, because the reverse KM method treats the events of interest as censoring, it exaggerates the cumulative loss-to-follow-up rate. In addition, a measure of follow-up completeness proposed by Clark et al. [21], which we explain further below, fails to account for events that could have occurred among those who were lost to follow-up had they remained in the study. Further, the accuracy of this method has, to our knowledge, never been formally examined using simulations.
In this paper, we review major existing methods for estimating follow-up, and propose a new person-time follow-up rate (PTFR), essentially the observed person-time divided by the person-time assuming no dropouts, to address the limitations we found with existing methods. We then describe two methods to estimate the PTFR. Simulation studies are used to examine the accuracy of the proposed methods and the existing methods, and each method is applied to a real-world prostate cancer recurrence "retrospective cohort" study based on EMR data [23].

Existing measures for follow-up rates
Consider a cohort of size $N$, and let $T_i$ and $C_i$ denote the time to the development of the event of interest and the censoring time for the $i$th subject, respectively, $i = 1, 2, \ldots, N$. For simplicity, we assume the study ends at a specified time, $\tau$.
Standard "percentage method"
The Percentage Method defines the follow-up rate $\eta_{percentage}$ as

$$\eta_{percentage} = \frac{1}{N}\sum_{i=1}^{N} I\left(T_i \le C_i \ \text{or}\ C_i \ge \tau\right) \qquad (1)$$

In brief, this method calculates the fraction of all enrollees who either developed the outcome of interest or were censored at $\tau$. Note that although participants drop out at different times, the Percentage Method essentially treats their follow-up time as zero no matter how much person-time they contributed to the study, and so systematically underestimates the true follow-up. To illustrate these points, Fig. 1 provides a simple example of a hypothetical cohort of 100 subjects who were followed and assessed with annual visits for three years. There were 10, 5 and 5 outcome events in the 1st, 2nd and 3rd year, respectively, with 40 dropouts in the 1st year in scenario (A) and in the 3rd year in scenario (B). The Percentage Method estimates the follow-up rate to be 60%, regardless of whether the dropouts occurred at the beginning of the study or late in the study.
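As a rough illustration of the Percentage Method on the hypothetical Fig. 1 cohort, the following Python sketch encodes each subject as an (observed time, status) pair with events and dropouts placed at mid-year. The helper names `make_cohort` and `percentage_method` are ours and purely illustrative; they are not part of the R program in Additional file 1.

```python
# Hypothetical Fig. 1 cohort: one (observed time in years, status) pair per subject.
# Events and dropouts are placed at the middle of the year, as assumed in Fig. 1.
TAU = 3.0  # study length in years


def make_cohort(dropout_time):
    """Build the 100-subject cohort; only the timing of the 40 dropouts varies."""
    events = [(0.5, "event")] * 10 + [(1.5, "event")] * 5 + [(2.5, "event")] * 5
    dropouts = [(dropout_time, "dropout")] * 40
    completers = [(TAU, "complete")] * 40
    return events + dropouts + completers


def percentage_method(cohort):
    """Fraction of enrollees who had the event or were followed to the end."""
    retained = sum(1 for _, status in cohort if status != "dropout")
    return retained / len(cohort)


scenario_a = make_cohort(dropout_time=0.5)  # dropouts in year 1
scenario_b = make_cohort(dropout_time=2.5)  # dropouts in year 3

eta_a = percentage_method(scenario_a)
eta_b = percentage_method(scenario_b)
print(f"Scenario A: {eta_a:.1%}, Scenario B: {eta_b:.1%}")
```

Both scenarios give the same value because the dropout times never enter the calculation, which is the limitation described above.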
As mentioned above, alternative methods have been developed to address the length of actual observation within a cohort. Two of the most commonly referenced are the reverse KM survival curve and Clark et al.'s Completeness Index [21].

Reverse Kaplan-Meier (KM) survival curve
The reverse KM survival curve is constructed by reversing "censor" and "event" in the standard KM curve [18]. The advantage of this curve is that it describes the extent as well as the timing of the loss to follow-up that occurred during the study. If the curve remains close to 1 until late in the study, one can infer nearly complete early follow-up and therefore more reliable survival estimates at earlier times than at later times. However, an important limitation of the reverse KM is that it removes subjects who developed the event of interest from all subsequent risk sets. Thus, studies with a high early event rate can show a low follow-up rate simply because of a smaller risk set. For example, consider a hypothetical cohort of 100 subjects followed for two years, with 30 outcome events in the 1st year in scenario (A) and 10 outcome events in the 1st year in scenario (B), while in both scenarios there were no dropouts in the 1st year and 30 dropouts in the 2nd year. As indicated in Fig. 2, the reverse KM survival curve estimates a higher follow-up rate over time for scenario (B) simply because scenario (B) had fewer early events, even though both scenarios had exactly the same level and timing of dropouts, the same cohort size at baseline and the same length of follow-up. Thus, the reverse KM can be very sensitive to early events. Another limitation is that the reverse KM survival curve does not provide a summary measure of the completeness of follow-up by the end of the study.

Fig. 1 Illustration of the differences in estimates of follow-up using existing and proposed methods. The figure depicts a hypothetical cohort of 100 subjects who were followed and assessed with annual visits for three years. There were 10, 5 and 5 outcome events in the 1st, 2nd and 3rd year, respectively. There were 40 dropouts in the 1st year in scenario (A) and in the 3rd year in scenario (B). For simplicity, in this example all events and dropouts occurred on average at the middle of the year. Because the calculation of the true person-time follow-up rate requires knowledge of the event times for dropouts, we further assumed two situations for the 40 dropouts: (1) none of them became events during the study and (2) 5 of them became events shortly after they dropped out. The Percentage Method (see Eq. (1)) estimates follow-up as the same in both scenarios, since it does not account for person-time in a cohort and in essence assumes that all dropout occurs at the beginning of the study. Conversely, the Clark Completeness Index (see Eq. (2)) and the Simplified Person-Time Method (see Eq. (5)) both address person-time and provide accurate estimates of the True Person-Time Follow-up Rate (see Eq. (3)). The calculations for each method are shown based on the data from the two scenarios depicted above.

Clark's completeness index (CCI)
Clark et al. [21] proposed a novel measure to assess completeness of follow-up based on person-time of follow-up:

$$\eta_{CCI} = \frac{PT_{observed}}{PT_{potential}} \qquad (2)$$

Specifically, $PT_{observed}$ = the actual total person-time observed in the study, while $PT_{potential}$ = the total potential person-time of follow-up, estimated by assuming that all dropouts had the full follow-up time. However, this approach fails to consider that those dropouts could have developed the event of interest during the study interval. Therefore, it can overestimate the total potential follow-up time and consequently underestimate the completeness of follow-up; the extent of underestimation necessarily increases with higher event and dropout rates. In Fig. 1, $\eta_{CCI}$ = 62.3% for scenario (A) and $\eta_{CCI}$ = 92.5% for scenario (B), showing that the method takes into account the observation time of dropouts. However, if in scenario (A) 5 of the 40 dropouts died shortly after dropping out, $PT_{potential}$ would be overestimated and thus $\eta_{CCI}$ would underestimate the true follow-up rate. The extent to which this affects the estimates under varying conditions and assumptions has, to our knowledge, not been examined before.
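The Fig. 1 values quoted above can be checked with a few lines of Python. This is a minimal sketch assuming the mid-year timing of Fig. 1; `make_cohort` and `clark_completeness_index` are illustrative names of our own.

```python
# Hypothetical Fig. 1 cohort: one (observed time in years, status) pair per subject.
TAU = 3.0  # study length in years


def make_cohort(dropout_time):
    events = [(0.5, "event")] * 10 + [(1.5, "event")] * 5 + [(2.5, "event")] * 5
    dropouts = [(dropout_time, "dropout")] * 40
    completers = [(TAU, "complete")] * 40
    return events + dropouts + completers


def clark_completeness_index(cohort, tau=TAU):
    """Observed person-time over potential person-time.

    Potential person-time credits every dropout with the full follow-up tau
    and leaves events and completers at their observed times.
    """
    pt_observed = sum(time for time, _ in cohort)
    pt_potential = sum(tau if status == "dropout" else time
                       for time, status in cohort)
    return pt_observed / pt_potential


cci_a = clark_completeness_index(make_cohort(dropout_time=0.5))  # early dropouts
cci_b = clark_completeness_index(make_cohort(dropout_time=2.5))  # late dropouts
print(f"Scenario A: {cci_a:.3f}, Scenario B: {cci_b:.3f}")
```

Here the observed person-time is 165 person-years in scenario (A) and 245 in scenario (B), against a potential person-time of 265 in both.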

Methods
A new person-time definition of follow-up rate (PTFR)
In this paper, we propose a new person-time follow-up rate (PTFR): essentially, the observed person-time divided by the person-time assuming no dropouts. Specifically, we define the follow-up rate $\eta_{PTFR}$ as

$$\eta_{PTFR} = \frac{PT_{observed}}{PT_{no\text{-}dropout}} = \frac{\sum_{i=1}^{N} \min(T_i, C_i, \tau)}{\sum_{i=1}^{N} \min(T_i, \tau)} \qquad (3)$$

where $PT_{no\text{-}dropout}$ = the total person-time that would have been observed in the study if there were no dropouts. The denominator corresponds to the hypothetical situation of no dropout, with subjects contributing the time to event $T_i$ or the time to the end of the study, whichever comes first. Note that the calculation of $\eta_{PTFR}$ requires that the time to event $T_i$ be known for all participants, whether they dropped out or not. It can be shown that $\eta_{CCI}$ underestimates $\eta_{PTFR}$, since $PT_{potential} \ge PT_{no\text{-}dropout}$: a dropout is credited with the full follow-up time $\tau$ in $PT_{potential}$, whereas in $PT_{no\text{-}dropout}$ that subject contributes a time $W_i \le \tau$, as $W_i$ follows the distribution of $T_i$ truncated at $\tau$. Using the example in Fig. 1, if none of the dropouts became events during the study, $\eta_{PTFR}$ = 62.3% for scenario (A) and $\eta_{PTFR}$ = 92.4% for scenario (B), so that $\eta_{PTFR} = \eta_{CCI}$; however, if 5 of the dropouts in scenario (A) became events shortly after they dropped out, then $\eta_{PTFR}$ = 65.3% > $\eta_{CCI}$. Because the event times for dropouts are not observed, the PTFR cannot be calculated directly; here we propose two estimation methods.
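To make the definition concrete, the sketch below evaluates the PTFR for scenario (A) of Fig. 1 when the true event time of every subject is treated as known, which is only possible in a constructed example. The function names are ours, and the value 0.51 years for an event "shortly after" a dropout at 0.5 years is an assumption made for illustration.

```python
import math

TAU = 3.0        # study length in years
INF = math.inf   # stands for "never" (no event, or no dropout)


def ptfr(subjects, tau=TAU):
    """subjects: list of (T_i, C_i) = (true event time, dropout time)."""
    pt_observed = sum(min(t, c, tau) for t, c in subjects)
    pt_no_dropout = sum(min(t, tau) for t, _ in subjects)
    return pt_observed / pt_no_dropout


def scenario_a(hidden_events, hidden_event_time=0.51):
    """Fig. 1 scenario (A): 40 dropouts at 0.5 years.

    hidden_events of the dropouts are assumed to have the event at
    hidden_event_time (an illustrative value just after dropout).
    """
    observed_events = [(0.5, INF)] * 10 + [(1.5, INF)] * 5 + [(2.5, INF)] * 5
    completers = [(INF, INF)] * 40
    dropouts = ([(hidden_event_time, 0.5)] * hidden_events
                + [(INF, 0.5)] * (40 - hidden_events))
    return observed_events + completers + dropouts


eta_none = ptfr(scenario_a(hidden_events=0))  # no dropout ever has the event
eta_five = ptfr(scenario_a(hidden_events=5))  # 5 dropouts have early events
print(f"No hidden events: {eta_none:.3f}; five hidden events: {eta_five:.3f}")
```

With no hidden events the denominator equals the CCI's potential person-time; moving five dropouts to early events shrinks the denominator and raises the rate.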
A formal method to estimate the person-time follow-up rate (FPT)
We first consider an observational cohort study design that involves repeated serial assessments of participants at fixed time intervals of equal length (e.g., annual or semi-annual clinical visits). In addition to the baseline visit at $t_0 = 0$, we denote the pre-specified visit times as $(t_1, t_2, \ldots, t_K)$, where $t_K = \tau$, i.e., the end of the follow-up. It is then assumed that, on average, events and censoring occur midway through each interval, consistent with standard practice in life-table analysis [24]. Therefore, the numerator (i.e., the actual person-time of follow-up) of Eq. (3) is estimated to be

$$\widehat{PT}_{observed} = \sum_{k=1}^{K}\left(N_{k-1} - \frac{N_k^{E} + N_k^{C}}{2}\right)(t_k - t_{k-1})$$

where $N_{k-1}$ = the number of subjects at risk at the beginning of interval $k$ (i.e., at time $t_{k-1}$), and $N_k^{E}$ and $N_k^{C}$ are the numbers of events and dropouts that occurred during interval $k$, respectively. While $PT_{observed}$ can be easily calculated by summing the observed follow-up time of all participants during the study, calculation of the denominator, $PT_{no\text{-}dropout}$ in the definition of $\eta_{PTFR}$, requires knowledge of the actual time to the outcome event for each participant if it happened during the study, regardless of whether or not the participant dropped out. This information is typically not available in a real-world study. In an earlier effort to address this problem, Chen, Wei and Huang used the known event rate for the population from which the cohort was derived to calculate "the maximum person-year", which in our nomenclature is $PT_{no\text{-}dropout}$ [15].

Fig. 2 […] follow-up rate simply because it has more earlier events; both scenarios share the same follow-up rate after addressing competing risk. Note: this is not the KM curve for the outcome events. In this plot, losses to follow-up were treated as "events" while development of outcome events was treated as "censored".
However, it is often difficult to specify the population from which a cohort is derived [25], nor will the event rate be known except for certain general endpoints, such as all-cause mortality. Therefore, this approach is not applicable to most studies.
To estimate $PT_{no\text{-}dropout}$, we propose estimating the event rate from the observed data. The survival function and the conditional probability of developing the event of interest are estimated using the nonparametric maximum likelihood estimator (NPMLE) proposed by Turnbull [26], which is the equivalent of a Kaplan-Meier survival curve but appropriate for interval observations. To use this approach, each subject's follow-up time needs to be described by an interval: if a subject experienced an event between the $(k-1)$th and $k$th visit, then that individual's time to event is described by the interval $(t_{k-1}, t_k)$; if a subject dropped out between the $(k-1)$th and $k$th visit, then that individual's event time is described by the interval $(t_{k-1}, t_{K+1})$, where $t_{K+1}$ = some large number, such as 100 years (a theoretical time that in essence indicates that the person who dropped out will eventually develop the event, assuming there are no competing risks); if a subject was free of events until the end of the study $t_K$, then that individual is given the interval $(t_K, t_{K+1})$. The Interval package in R [27,28] can be readily applied to estimate the survival curve and the conditional probability of developing the event of interest during each interval.
Next, the expected number of events between $(t_{k-1}, t_k)$ is estimated to be $N^{*}_{k-1}\hat{P}_k$, where $N^{*}_{k-1}$ = the number of subjects who would have remained in the study at time $t_{k-1}$ if there were no loss to follow-up, and $\hat{P}_k$ = the estimated conditional probability of an event during the $k$th interval using the NPMLE method, for $k = 1, \ldots, K$, with $N^{*}_0 = N$. Therefore, the number of subjects who would have remained in the study at the beginning of interval $k + 1$ if there were no loss to follow-up is

$$N^{*}_{k} = N^{*}_{k-1}\left(1 - \hat{P}_k\right)$$

Then, the expected person-time if there were no dropout is estimated to be

$$\widehat{PT}_{no\text{-}dropout} = \sum_{k=1}^{K}\left(N^{*}_{k-1} - \frac{N^{*}_{k-1}\hat{P}_k}{2}\right)(t_k - t_{k-1})$$

The person-time follow-up rate is then estimated to be

$$\eta_{FPT} = \frac{\widehat{PT}_{observed}}{\widehat{PT}_{no\text{-}dropout}}$$

This method clearly relies on the assumption of independent censoring, that is, that the event rate among the dropouts is the same as that in the general population.
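The fixed-interval calculation can be sketched in Python as follows. This is not the R program of Additional file 1 and does not call the Interval package: as a stand-in for Turnbull's NPMLE it uses the per-interval proportion $N_k^{E} / (N_{k-1} - N_k^{C})$, which is our assumption of what the NPMLE reduces to when visit times are fixed and dropouts are coded as event-free only up to their last attended visit. The function name `formal_person_time` and the input layout are likewise illustrative.

```python
def formal_person_time(n0, events, dropouts, visit_times):
    """Life-table style FPT estimate for fixed visit times.

    n0          : cohort size at baseline
    events      : events[k-1]   = number of events in interval k
    dropouts    : dropouts[k-1] = number of dropouts in interval k
    visit_times : [t_0, t_1, ..., t_K] with t_0 = 0 and t_K = tau
    """
    pt_observed = 0.0
    pt_no_dropout = 0.0
    n_at_risk = n0        # N_{k-1}: observed subjects at risk at t_{k-1}
    n_star = float(n0)    # N*_{k-1}: expected at-risk count with no dropout
    for k in range(1, len(visit_times)):
        width = visit_times[k] - visit_times[k - 1]
        e_k, c_k = events[k - 1], dropouts[k - 1]
        # Events and dropouts are assumed to occur mid-interval.
        pt_observed += (n_at_risk - (e_k + c_k) / 2) * width
        # Stand-in for the NPMLE conditional event probability (assumption):
        # dropouts in interval k are treated as censored at t_{k-1}.
        known_at_risk = n_at_risk - c_k
        p_k = e_k / known_at_risk if known_at_risk > 0 else 0.0
        expected_events = n_star * p_k
        pt_no_dropout += (n_star - expected_events / 2) * width
        n_at_risk -= e_k + c_k
        n_star -= expected_events
    return pt_observed / pt_no_dropout


# Fig. 1 counts: 10, 5, 5 events; 40 dropouts in year 1 (A) or year 3 (B).
fpt_a = formal_person_time(100, [10, 5, 5], [40, 0, 0], [0, 1, 2, 3])
fpt_b = formal_person_time(100, [10, 5, 5], [0, 0, 40], [0, 1, 2, 3])
print(f"Scenario A: {fpt_a:.3f}, Scenario B: {fpt_b:.3f}")
```

Because the estimate assumes that dropouts carry the same event risk as retained subjects, it imputes more events among the 40 dropouts than the "5 hidden events" situation considered in Fig. 1.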
While a prospective epidemiological cohort study may intend to follow participants at serial intervals of approximately equal length (e.g., annual or semi-annual visits), not every participant returns for each visit or does so at the planned time. This leads to varying lengths of time between visits, which can sometimes be quite long. Clinic-based cohort studies that involve ad hoc patient follow-up (e.g., cohorts defined retrospectively from hospital EMRs) often result in irregular schedules of clinical visits, with clustering that does not occur at random (e.g., visits motivated by symptoms or by an abnormal laboratory test result). To assess the follow-up rate for such data, we extended the approach proposed above to address irregular intervals between visits.
For cohorts involving intermittent and ad hoc follow-up, let $(t_{1}^{i}, t_{2}^{i}, \ldots, t_{K_i}^{i})$ be the visit times for the $i$th person, where the final visit $K_i$ is either (a) the last visit in the study for the $i$th person, or (b) the visit at which the $i$th person was diagnosed with the event. For (a) we use the time to the last visit as an estimate of the person's censoring time, i.e., $\hat{C}_i = \widehat{\min}(T_i, C_i) = t_{K_i}^{i}$, and for (b) we estimate that the event occurred in the middle of the interval, i.e., $\hat{T}_i = \widehat{\min}(T_i, C_i) = (t_{K_i - 1}^{i} + t_{K_i}^{i})/2$. The actual person-time of follow-up by a specified time, say $t_K$, is then estimated by the sum of the observed follow-up times across subjects, i.e.,

$$\widehat{PT}_{observed} = \sum_{i=1}^{N} \widehat{\min}(T_i, C_i)$$

To estimate $PT_{no\text{-}dropout}$, if the $i$th person developed the event at his/her last visit, the interval event time is $(t_{K_i - 1}^{i}, t_{K_i}^{i})$, and if a person did not develop the event at his/her last visit, the interval event time is $(t_{K_i}^{i}, E)$, where again $E$ represents some large number. The NPMLE method can then be applied to estimate $PT_{no\text{-}dropout}$.
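The coding rules for ad hoc visit schedules can be sketched as below. This is a minimal illustration rather than the authors' program: `summarize_subject`, the example visit histories, and the choice to use the baseline time 0 as the left endpoint when an event is diagnosed at the first follow-up visit are all our assumptions.

```python
def summarize_subject(visit_times, event_at_last_visit, big_e=100.0):
    """Return (observed follow-up time, (left, right) event-time interval).

    visit_times         : increasing follow-up visit times in years
                          (baseline at time 0 is implied, not listed)
    event_at_last_visit : True if the event was diagnosed at the last visit
    big_e               : the "large number" E used for open-ended intervals
    """
    last = visit_times[-1]
    if event_at_last_visit:
        # Event assumed to occur midway between the last two visits.
        previous = visit_times[-2] if len(visit_times) > 1 else 0.0
        return (previous + last) / 2, (previous, last)
    # No event observed: follow-up ends at the last visit.
    return last, (last, big_e)


# Three made-up subjects: early dropout, event, and follow-up to 5 years.
subjects = [
    ([0.3, 0.7, 1.2, 2.1], False),
    ([0.5, 1.0], True),
    ([1.0, 2.0, 3.0, 4.0, 5.0], False),
]

summaries = [summarize_subject(v, e) for v, e in subjects]
pt_observed = sum(time for time, _ in summaries)
intervals = [interval for _, interval in summaries]
print(pt_observed, intervals)
```

The list `intervals` has the form that an interval-censoring routine would take as input for the NPMLE step.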
As mentioned above, the use of observed data to estimate the event rate relies on the assumption that the loss to follow-up is not informative, i.e., that the event rate among those who remained in the study is the same as among those who dropped out, so that the event rate estimates obtained from the observed data apply to the unobserved. However, if the subjects who were lost to follow-up are at a different risk of recurrence than those who remained in the study, the estimates of event rates are biased. For example, if the subjects who were lost to follow-up had a higher risk of the event, then the event risk is underestimated using the observed data, and the follow-up rate will be underestimated using the person-time approach because $PT_{no\text{-}dropout}$ is overestimated. Conversely, if the subjects who were lost to follow-up had a lower risk of the event, then the event risk is overestimated and the follow-up rate will consequently be overestimated using the person-time approach. We therefore propose calculating a lower bound for the person-time follow-up rate by assuming that none of those who dropped out developed the event of interest during the time interval examined. In this case, $PT_{no\text{-}dropout}$ reaches its highest possible value, leading to a lower bound for the follow-up rate. Note that in this case $PT_{no\text{-}dropout} = PT_{potential}$, so that $\min \eta_{PTFR} = \eta_{CCI}$. The lower bound of the follow-up rate is important because it provides a conservative estimate: an overestimated follow-up rate can lead to over-optimism about the quality of the follow-up.
A simplified method to estimate the person-time follow-up rate (SPT)
Estimating the event rate for the purpose of calculating the PTFR can be difficult, especially for a non-statistician. Therefore, we also explore a simplified alternative method that allows quick estimation of $\eta_{PTFR}$ without having to estimate the event rate. Our proposed Simplified Person-Time method is a hybrid that includes aspects of the Percentage Method and the Person-Time Method. Specifically, as in the Percentage Method, individuals who developed the event of interest during the study are treated the same as individuals who were followed until the end of the study, i.e., they are treated as having contributed complete follow-up, since they have already provided complete data regarding the factors associated with becoming a case. Furthermore, as in a Person-Time Method, dropouts contribute partial follow-up time to the numerator.
A simple alternative method to calculate the follow-up rate is therefore

$$\eta_{SPT} = \frac{1}{N\tau}\sum_{i=1}^{N}\left\{\tau\, I\left(T_i \le C_i \ \text{or}\ C_i \ge \tau\right) + C_i\, I\left(C_i < T_i,\ C_i < \tau\right)\right\} \qquad (5)$$

In Fig. 1, $\eta_{SPT}$ = 66.7% for scenario (A) and $\eta_{SPT}$ = 93.3% for scenario (B), remarkably close to, but slightly overestimating, $\eta_{PTFR}$; the slight overestimation arises because events are given the full length of follow-up in this method. It can be shown that: […] Figure 1 also indicates that $\eta_{CCI}$ and $\eta_{SPT}$ together provide close bounds for $\eta_{PTFR}$. In fact, the outcome events can be viewed as a competing risk for loss to follow-up, and we can therefore use methods from the competing-risk framework to compute the cumulative loss-to-follow-up rate [29,30] and then obtain the subdistribution reverse KM curve.
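The Fig. 1 values for the Simplified Person-Time method can be reproduced with the short Python sketch below, which credits events and completers with the full follow-up time and dropouts with their observed time. The names `make_cohort` and `simplified_person_time` are illustrative only.

```python
# Hypothetical Fig. 1 cohort: one (observed time in years, status) pair per subject.
TAU = 3.0  # study length in years


def make_cohort(dropout_time):
    events = [(0.5, "event")] * 10 + [(1.5, "event")] * 5 + [(2.5, "event")] * 5
    dropouts = [(dropout_time, "dropout")] * 40
    completers = [(TAU, "complete")] * 40
    return events + dropouts + completers


def simplified_person_time(cohort, tau=TAU):
    """Events and completers count as tau; dropouts count their observed time."""
    credited = sum(time if status == "dropout" else tau
                   for time, status in cohort)
    return credited / (len(cohort) * tau)


spt_a = simplified_person_time(make_cohort(dropout_time=0.5))  # early dropouts
spt_b = simplified_person_time(make_cohort(dropout_time=2.5))  # late dropouts
print(f"Scenario A: {spt_a:.3f}, Scenario B: {spt_b:.3f}")
```

No event rate is estimated anywhere in this calculation, which is what makes the method easy to apply by hand.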
To revisit the reverse KM survival curve, we instead assign the events full follow-up time, so that the rate of follow-up over time is no longer affected by the number and timing of the events. In Fig. 2, scenarios (A) and (B) then share the same curve of follow-up rate over time after addressing the competing risk of events. It can be shown mathematically that the area under the curve of this new follow-up rate over time, divided by $\tau$, is $\eta_{SPT}$.
An R program for computing each method is provided in Additional file 1.
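The area-under-the-curve statement can be checked numerically on the two-year example of Fig. 2. The sketch below is our own illustration, assuming mid-year dropout times as in Fig. 1; it integrates the step function "proportion not yet lost to follow-up", with events kept in the retained group, and compares the result with the direct Simplified Person-Time formula.

```python
def followup_curve_area(dropout_times, n, tau):
    """Area under the follow-up curve on [0, tau].

    The curve is the proportion of the n enrollees not yet lost to
    follow-up; subjects with events are never removed from it.
    """
    area, current_time, retained = 0.0, 0.0, n
    for c in sorted(dropout_times):
        area += retained / n * (c - current_time)
        current_time = c
        retained -= 1
    area += retained / n * (tau - current_time)
    return area


def spt_direct(dropout_times, n, tau):
    """Direct SPT: non-dropouts credited tau, dropouts their observed time."""
    return ((n - len(dropout_times)) * tau + sum(dropout_times)) / (n * tau)


N, TAU = 100, 2.0
dropouts = [1.5] * 30  # 30 dropouts in year 2, in both Fig. 2 scenarios

# The number of first-year events (30 in A, 10 in B) does not enter either
# calculation, so both scenarios give the same value.
auc_rate = followup_curve_area(dropouts, N, TAU) / TAU
spt_rate = spt_direct(dropouts, N, TAU)
print(auc_rate, spt_rate)
```

Both routes give the same rate for scenarios (A) and (B), in line with the statement that the curve no longer depends on the number or timing of events.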

Simulation studies
Simulation studies were used to examine follow-up rates computed using the standard Percentage Method, the CCI, the FPT, and the SPT as compared to the true follow-up rate $\eta_{PTFR}$. To conduct these comparisons, we assumed a range of different outcome event rates and dropout rates. Specifically, the simulations involved N = 1000 subjects, and time-to-event and time-to-dropout were generated for each subject using exponential distributions. The event rate was varied from 5% to 50% and the dropout rate from 10% to 70%, which covers a wide range of plausible values for these two parameters. In the first scenario of the simulation, the length of the study was five years with annual clinical visits; the second scenario incorporated random variations in the time between clinic visits (from 0.5 to 1.5 years). The results were then averaged across 1000 simulated datasets.
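A stripped-down version of such a simulation might look like the following Python sketch. It is not the authors' simulation code and makes several simplifying assumptions that should be read as ours: exact event and dropout times are used instead of visit-based midpoints, the stated event and dropout percentages are interpreted as marginal 5-year proportions (rate $= -\ln(1 - p)/\tau$), only 200 replicates are run, and the FPT is omitted because it needs an NPMLE step.

```python
import math
import random


def simulate_once(rng, n=1000, tau=5.0, event_prop=0.10, dropout_prop=0.30):
    """One simulated cohort with independent exponential event/dropout times."""
    # Assumption: proportions are marginal probabilities of occurring by tau.
    event_rate = -math.log(1 - event_prop) / tau
    dropout_rate = -math.log(1 - dropout_prop) / tau
    pt_observed = pt_potential = pt_no_dropout = spt_credit = 0.0
    retained = 0
    for _ in range(n):
        t = rng.expovariate(event_rate)    # time to event
        c = rng.expovariate(dropout_rate)  # time to dropout
        followed = min(t, c, tau)
        dropped = c < min(t, tau)          # lost before event or study end
        pt_observed += followed
        pt_no_dropout += min(t, tau)
        pt_potential += tau if dropped else followed
        spt_credit += c if dropped else tau
        retained += 0 if dropped else 1
    return {
        "percentage": retained / n,
        "CCI": pt_observed / pt_potential,
        "SPT": spt_credit / (n * tau),
        "PTFR": pt_observed / pt_no_dropout,  # true rate, known only in simulation
    }


def run_simulation(n_reps=200, seed=2017, **kwargs):
    """Average each follow-up measure across n_reps simulated cohorts."""
    rng = random.Random(seed)
    totals = {}
    for _ in range(n_reps):
        for name, value in simulate_once(rng, **kwargs).items():
            totals[name] = totals.get(name, 0.0) + value
    return {name: total / n_reps for name, total in totals.items()}


means = run_simulation(event_prop=0.10, dropout_prop=0.30)
for name, value in means.items():
    print(f"{name:>10}: {value:.1%}")
```

Varying `event_prop` and `dropout_prop` over a grid would give a rough analogue of the design described above, although the numbers need not match Table 1 because of the simplifications listed.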
Application to the prostate cancer clinical cohort study
A retrospective clinical cohort study of time to recurrence of prostate cancer (PrCa) was conducted using EMRs among patients who underwent robotic assisted laparoscopic prostatectomy (RALP) by a single surgeon at Montefiore Medical Center in the Bronx from October 2005 through December 2012 [23]. We used this dataset as a real-world example with staggered study entry and ad hoc follow-up. The dataset included N = 610 PrCa patients. Clinical guidelines held that PrCa patients should have PSA levels measured every 3 to 4 months in the first year following RALP, every 6 months in the second and third years, and then annually. However, PSA measurements were to be conducted more frequently if the post-operative serum PSA value exceeded 0.1 ng/ml. The median number of follow-up serum PSA measurements was 7 (range 1-28). PrCa recurrence was defined as a rise in serum PSA to 0.2 ng/ml or higher. There were 87 (14.3%) recurrence events following RALP. Three-year and five-year recurrence rates were of primary interest.
Note that although there were no observed deaths in the study, death is a potential competing risk here. For the purpose of assessing the completeness of follow-up, death should be included as an event when calculating the follow-up rate.

Results
Simulation studies
Table 1 shows that across a wide range of dropout and event rates, $\eta_{percentage}$ systematically underestimated the follow-up rate: the larger the dropout rate, the greater the underestimation. For example, when the event rate was fixed at 10%, the averaged $\eta_{percentage}$ varied from 91.0% to 46.4%, whereas the true $\eta_{PTFR}$ varied from 95.3% to 68.4%. In contrast, $\eta_{FPT}$ consistently provided an accurate estimate of $\eta_{PTFR}$, with bias less than 2%. The downward bias arises because Turnbull's NPMLE [26] tends to slightly underestimate the event rate and consequently the follow-up rate. This underestimation of the cumulative incidence function using the NPMLE method for interval-censored data has been recognized [31,32], and more research on alternative estimators is needed.

Table 1 notes: These results were compared to the true Person-time follow-up Rate (Eq. 3) based on complete information generated under the simulations, each averaged across 1000 simulated data sets. The simulations involved an assumed 5-year prospective cohort study of N = 1000 subjects with fixed annual interval clinical visits and non-informative dropout. Time-to-event was generated based on exponential distributions with event rates varied from 5 to 50%, and time to dropout was generated based on an independent exponential distribution with dropout proportion varying from 10 to 70%. Results were averaged across the 1000 simulated datasets. Note: 1. % bias was calculated as (average of the particular method − $\eta_{PTFR}$)/$\eta_{PTFR}$ × 100%; 2. $\sqrt{MSE}$ was calculated as the square root of the average of (estimate − $\eta_{PTFR}$)$^2$. MSE from the true $\eta_{PTFR}$ was calculated instead of variance because several methods used here can be biased.
In general, $\eta_{CCI}$ provided a good but slightly lower estimate of $\eta_{PTFR}$, except when both the event and dropout rates were high, because it fails to take into account events occurring among dropouts. For example, when the event rate was 50% and the dropout rate was 70%, the true $\eta_{PTFR}$ = 46.6% while $\eta_{CCI}$ = 40.0%, a 14% downward bias. The $\eta_{SPT}$ was also in close agreement with the true person-time follow-up rate $\eta_{PTFR}$ but slightly higher, because events are given the full length of follow-up. This overestimation is also more apparent when the event and dropout rates are high. In the same example, $\eta_{SPT}$ = 51.3%, a 10% upward bias. Careful examination of Table 1 shows that the easily estimable SPT and CCI were as likely to be the closest to the "True Person-Time" follow-up rate in most scenarios as the more complex and laborious FPT. When $\eta_{SPT}$ is used in tandem with $\eta_{CCI}$, the two provide a tight range for the true follow-up rate, so that the use of $\eta_{FPT}$ is not necessary.
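The accuracy summaries described in the notes to Table 1 are simple to compute. The sketch below applies the % bias formula to the figures quoted in the text (46.6% true, 40.0% CCI, 51.3% SPT); the three replicate values passed to `root_mse` are made up solely to show the call, and both function names are ours.

```python
import math


def percent_bias(mean_estimate, true_value):
    """(average estimate - true value) / true value * 100."""
    return (mean_estimate - true_value) / true_value * 100.0


def root_mse(estimates, true_values):
    """Square root of the average squared distance from the true rate."""
    squared_errors = [(e - t) ** 2 for e, t in zip(estimates, true_values)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Values quoted in the text for a 50% event rate and 70% dropout.
cci_bias = percent_bias(40.0, 46.6)
spt_bias = percent_bias(51.3, 46.6)
print(f"CCI bias: {cci_bias:.1f}%, SPT bias: {spt_bias:.1f}%")

# Hypothetical per-replicate estimates and true rates (illustration only).
example_rmse = root_mse([40.5, 39.2, 40.3], [46.9, 46.1, 46.8])
print(f"Example root MSE: {example_rmse:.2f}")
```

Measuring squared error around the true rate rather than around the method's own mean means that a consistently biased method is penalised even when its variance is small.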
Similar results for each of the methods of estimating follow-up rates were obtained when visits were irregular, i.e., allowing the time intervals between visits to vary within a person and between persons (results not shown).

Example dataset
Table 2 shows the follow-up rate as estimated by the Percentage Method, the CCI, the FPT, and the SPT. Because event rates and dropout rates were low, the FPT, the CCI and the SPT, as expected, provided similar results. These results show much higher estimated follow-up than that calculated using the naïve Percentage Method. In fact, had the Percentage Method been used, the investigator might have falsely concluded that the dataset had inadequate 5-year follow-up to be suitable for research purposes, when in fact the other methods showed follow-up to be >90% after 5 years.
In the case of informative censoring, as mentioned in the Methods section, the CCI estimate provides a lower bound for the person-time follow-up rate. Table 2 shows that the lower bounds were very close to the person-time estimates, suggesting that even in the extreme case in which none of the dropouts had any risk of developing the event during the study, we would not expect the true follow-up rate to be much lower.

Discussion and Conclusion
The completeness of follow-up and the length of follow-up are important measures for determining the adequacy of a cohort dataset for research purposes. The longer the follow-up, the less the concern regarding statistical power; the better the follow-up, the less the concern regarding the validity of a study. This paper focused on measures to assess the completeness of follow-up. A commonly used follow-up rate, the naïve Percentage Method, fails to consider the person-time contributed to a study by subjects who drop out prior to study completion; other existing measures of completeness of follow-up, including the reverse Kaplan-Meier survival curve and Clark's completeness index (CCI), each have their own limitations. Therefore, we define a new follow-up rate as the total observed person-time of follow-up divided by the total person-time of follow-up that could have been observed if there were no dropout. This definition corrects the inherent biases in the existing methods.
We next proposed two methods to estimate the proposed person-time follow-up rate. In the formal person-time method, we proposed estimating the event rate from the observed data, and then using it to estimate the expected number of events if there were no dropouts. Note that non-informative censoring is assumed for the validity of the FPT, that is, the event rate among the dropouts is assumed to be the same as among those who did not drop out. Although this assumption is not verifiable, sensitivity analyses can be conducted to examine the robustness of the estimate of the follow-up rate, for example, by assuming that the dropouts have either a higher or a lower event rate than those who did not drop out. The second, simplified method (SPT) assigns full follow-up time to events; it therefore does not require estimation of the event rate and is consequently much easier to use. Our simulations showed that the Percentage Method often underestimates the follow-up rate quite extensively when the dropouts occurred later in the study. The FPT performed well, and the CCI and SPT also performed well in most scenarios, although the CCI tends to slightly underestimate and the SPT to slightly overestimate the follow-up rate. The bias becomes moderate only when both the event rate and the dropout rate are high; otherwise, the SPT used in tandem with the CCI provides an accurate and tight interval estimate of the true person-time follow-up rate. In these cases, the use of the FPT, which involves more computation, is not necessary. However, the FPT is recommended when event rates and dropout rates are high.

Table 2 The follow-up rate at each annual interval after subjects (N = 610) in a retrospective cohort study of 3-year and 5-year prostate cancer (PrCa) recurrence risk based on electronic medical record (EMR) data
Application of the methods to an example dataset, based on a study of prostate cancer recurrence, helped demonstrate the critical importance of considering person-time prior to dropout when estimating follow-up rates. Briefly, using the standard Percentage Method the 5-year follow-up rate was estimated to be approximately 68%, whereas the CCI, the FPT and SPT all showed the follow-up to be greater than 90%.
Although the CCI method was proposed over a decade ago, the use of this person-time method to determine follow-up rates has not been widely adopted, likely because the performance of the CCI has not been fully examined and/or because of the misconception that the median follow-up time and the reverse KM survival curve are sufficient. Thus, the presentation of this work is timely. The availability and ease of calculation of the proposed person-time follow-up rate can represent an important advance in assessing the completeness of follow-up.
Guidelines on how much loss to follow-up can be problematic have been based primarily on the Percentage Method. New guidelines based on the person-time follow-up rate should be developed to suggest "acceptable" and "alarming" follow-up rates. Recent work by von Allmen [33] examined the bias in estimating the mortality rate under various levels of CCI.
However, this work did not distinguish missing mechanisms including missing completely at random, missing at random and missing not at random; further, research studies are often interested in obtaining an unbiased estimate of the exposure-disease association or relative risk associated with the exposure instead of absolute risk of death or disease. Therefore, further studies including conducting series of simulation studies to examine the bias and efficiency loss on relative risk estimates under various levels of loss to follow-up measured by our proposed person-time follow-up rates and under various missing mechanisms are needed and will be the primary focus of our future research.