New methods for estimating follow-up rates in cohort studies
BMC Medical Research Methodology volume 17, Article number: 155 (2017)
Abstract
Background
The follow-up rate, a standard index of the completeness of follow-up, is important for assessing the validity of a cohort study. A common method for estimating the follow-up rate, the “Percentage Method”, defined as the fraction of all enrollees who developed the event of interest or had complete follow-up, can severely underestimate the degree of follow-up. Alternatively, the median follow-up time does not indicate the completeness of follow-up, and the reverse Kaplan-Meier based method and Clark’s Completeness Index (CCI) also have limitations.
Methods
We propose a new definition for the follow-up rate, the Person-Time Follow-up Rate (PTFR), which is the observed person-time divided by the total person-time assuming no dropouts. The PTFR cannot be calculated directly since the event times for dropouts are not observed. Therefore, two estimation methods are proposed: a formal person-time method (FPT), in which the expected total follow-up time is calculated using the event rate estimated from the observed data, and a simplified person-time method (SPT) that avoids estimation of the event rate by assigning full follow-up time to all events. Simulations were conducted to measure the accuracy of each method, and each method was applied to a prostate cancer recurrence study dataset.
Results
Simulation results showed that the FPT has the highest accuracy overall. In most situations, the computationally simpler SPT and CCI methods are only slightly biased. When applied to a retrospective cohort study of cancer recurrence, the FPT, CCI and SPT showed substantially greater 5-year follow-up than the Percentage Method (92%, 92% and 93% vs 68%).
Conclusions
The person-time methods correct a systematic error in the standard Percentage Method for calculating follow-up rates. The easy-to-use SPT and CCI methods can be used in tandem to obtain an accurate and tight interval for the PTFR. However, the FPT is recommended when event rates and dropout rates are high.
Background
The follow-up rate, a standard index of the completeness of follow-up, is important for assessing the adequacy of a prospective or retrospective longitudinal cohort dataset for research purposes. In particular, a low follow-up rate raises concerns regarding the possibility of informative censoring, bias and diminished statistical power [1,2,3,4,5]; these concerns increase incrementally with the extent of participant dropout from the cohort [5,6,7,8,9,10,11,12]. Common sources of “loss-to-follow-up” include death due to causes other than the endpoint of interest, patient withdrawal, and other reasons for dropout, such as a change in at-risk status (e.g., undergoing a hysterectomy during a study of cervical cancer). For simplicity, in this paper we refer to all loss-to-follow-up and censoring due to any cause other than the event of interest or the end of the study as dropout.
Methods to accurately assess follow-up rates are likely to be of growing importance during the current, expanding era of electronic medical records (EMRs). Hospital and outpatient databases are increasingly being exploited for research purposes, but they require careful scrutiny to determine whether they are truly adequate for use in scientific studies. Patients in routine clinical practice may be more likely than research volunteers in a prospective cohort to seek care from multiple, unaffiliated providers, leading to low follow-up rates observed at a specific health care facility and raising particular concerns regarding informative censoring. Investigators may therefore need to screen multiple potential clinics or other sources of EMR data to find an appropriate population with adequate follow-up data.
Thus, while there are many sources of potential bias, the follow-up rate provides a quick and easy tool for initially screening potential retrospective clinical cohorts before a more in-depth evaluation of the adequacy of the data. Both researchers and journal reviewers should therefore routinely examine the follow-up rate in an EMR-based study over a period of observation relevant to the study question.
The most commonly used method to assess the completeness of follow-up, recommended by the Cochrane Handbook [13] and the CONSORT guidelines [14] and often referred to as the “Percentage Method” [15], involves simply calculating the proportion of subjects present at baseline (e.g., enrollment) who remained through the end of the study interval or developed the event of interest by the end of the interval [7, 13, 14, 16]. However, this definition is “naïve” in that it does not distinguish subjects who dropped out early in a study from subjects who dropped out late. In fact, the Percentage Method essentially assumes that all subjects who were lost to follow-up were lost at the very beginning of the study, and it can therefore severely underestimate the follow-up rate in a cohort, leading to a false conclusion regarding the quality of the data.
Several attempts have been made to improve upon the Percentage Method for assessing the degree of follow-up. For example, the median follow-up time has been used as a measure of the length of follow-up. However, there have been disagreements regarding how the median follow-up time should be calculated (among all subjects, among dropouts only, or in other variations), and each approach has its limitations [17,18,19,20]. Further, there is increasing recognition that the median follow-up time does not directly measure the “completeness of the follow-up”: e.g., the median follow-up can be low with excellent follow-up, and it can be high with poor follow-up [18, 20,21,22]. Time-to-event studies must have sufficient length of follow-up to capture enough events for adequate statistical power, but, as mentioned earlier, poor follow-up raises concern about the validity of the study. Thus, to assess the adequacy of follow-up for a cohort study, we need to examine both the length and the completeness of follow-up.
Alternatively, a reverse Kaplan-Meier (KM) survival curve, constructed by reversing “censor” and “event”, has also been used to assess the length as well as the completeness of follow-up [18]. However, as explained in detail below, because the reverse KM method treats the events of interest as censoring, it exaggerates the cumulative loss-to-follow-up rate. In addition, a measure of follow-up completeness proposed by Clark et al. [21], which we explain in more detail later, fails to account for events that could have occurred among those who were lost to follow-up had they remained in the study. Further, the accuracy of this method, to our knowledge, has never been formally examined using simulations.
In this paper, we review the major existing methods for estimating follow-up and propose a new person-time follow-up rate (PTFR), essentially the observed person-time divided by the person-time assuming no dropouts, to address the limitations we found with existing methods. We then describe two methods to estimate the PTFR. Simulation studies are used to examine the accuracy of the proposed methods and the existing methods, and each method is applied to a real-world prostate cancer recurrence “retrospective cohort” study based on EMR data [23].
Existing measures of follow-up rates
Consider a cohort of size N, and let T_{i} and C_{i} represent the time to the development of the event of interest and the censoring time for the ith subject, respectively, i = 1,2,…,N. For simplicity, we assume the study ends at a specified time, τ.
Standard “percentage method”
The Percentage Method defines the follow-up rate η_{percentage} as
\( {\eta}_{percentage}=\frac{1}{N}\sum_{i=1}^N I\left\{{T}_i\le \min \left({C}_i,\tau \right)\ \mathrm{or}\ \min \left({T}_i,{C}_i\right)\ge \tau \right\} \) (1)
where I{·} is the indicator function. In brief, this method calculates the fraction of all enrollees who either developed the outcome of interest or were censored at τ. Note that although participants dropped out at different times, the Percentage Method essentially treats their follow-up time as zero no matter how much person-time they contributed to the study, systematically underestimating the true follow-up. To help illustrate these points, Fig. 1 provides a simple example of a hypothetical cohort of 100 subjects who were followed and assessed with annual visits for three years. There were 10, 5 and 5 outcome events in the 1st, 2nd and 3rd year, respectively, with 40 dropouts in the 1st year in scenario (A) and in the 3rd year in scenario (B). The Percentage Method estimates the follow-up rate to be 60% regardless of whether the dropouts occurred at the beginning of the study or late in the study.
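The Fig. 1 arithmetic can be checked with a few lines of Python. This is only an illustrative sketch; the function and variable names are ours and are not taken from the paper's Additional file 1.

```python
def percentage_followup(n_enrolled, n_events, n_completed):
    """Percentage Method: share of enrollees with an event or full follow-up."""
    return (n_events + n_completed) / n_enrolled

# Fig. 1 cohort: 100 subjects, 10 + 5 + 5 events, 40 dropouts, 40 completers.
n_enrolled = 100
n_events = 10 + 5 + 5
n_dropouts = 40
n_completed = n_enrolled - n_events - n_dropouts

# The year in which the dropouts occurred never enters the formula,
# so scenarios (A) and (B) necessarily give the same answer.
rate_a = percentage_followup(n_enrolled, n_events, n_completed)
rate_b = percentage_followup(n_enrolled, n_events, n_completed)
print(f"scenario A: {rate_a:.0%}, scenario B: {rate_b:.0%}")
```

Because dropout timing is not an input, no choice of dropout year can move the estimate away from 60%.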
As mentioned above, alternative methods have been developed to address the length of actual observation within a cohort. Two of the most commonly referenced are the reverse KM survival curve and the Completeness Index of Clark et al. [21].
Reverse Kaplan-Meier (KM) survival curve
The reverse KM survival curve is constructed by reversing “censor” and “event” in the standard KM curve [18]. The advantage of this curve is that it describes the extent as well as the timing of the loss to follow-up that occurred during the study. If the curve remains close to 1 until late in the study, one can infer nearly complete early follow-up and therefore more reliable survival estimates at earlier times than at later times. However, an important limitation of the reverse KM is that it removes subjects who developed the event of interest from all subsequent risk sets. Thus, studies with a high early event rate can show a low follow-up rate simply because of a smaller risk set. For example, consider a hypothetical cohort of 100 subjects followed for two years, with 30 outcome events in the 1st year in scenario (A) and 10 outcome events in the 1st year in scenario (B); in both scenarios there were no dropouts in the 1st year and 30 dropouts in the 2nd year. As indicated in Fig. 2, the reverse Kaplan-Meier survival curve estimates a higher follow-up rate over time for scenario (B) simply because scenario (B) had fewer early events, even though both scenarios had exactly the same level and timing of dropouts, the same cohort size at baseline and the same length of follow-up. Thus, the reverse KM can be very sensitive to early events. Another limitation is that the reverse KM survival curve does not provide a summary measure of the completeness of follow-up by the end of the study.
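A small sketch makes the sensitivity to early events concrete. It evaluates a reverse KM curve on a yearly grid for the two hypothetical scenarios; the yearly discretisation, the assumption of no outcome events in the 2nd year, and all names are ours.

```python
def reverse_km(n_start, intervals):
    """Reverse Kaplan-Meier curve evaluated at the end of each year.

    intervals: list of (n_events, n_dropouts) per year. Dropouts play the
    role of the "event"; outcome events are treated as censored and leave
    the risk set at the end of the year in which they occur (assumption).
    """
    at_risk = n_start
    surv = 1.0
    curve = []
    for n_events, n_dropouts in intervals:
        surv *= 1.0 - n_dropouts / at_risk
        curve.append(surv)
        at_risk -= n_events + n_dropouts
    return curve

# Year 1: 30 vs 10 outcome events, no dropouts. Year 2: 30 dropouts in both.
curve_a = reverse_km(100, [(30, 0), (0, 30)])
curve_b = reverse_km(100, [(10, 0), (0, 30)])
print([round(s, 3) for s in curve_a], [round(s, 3) for s in curve_b])
```

The same 30 dropouts are divided by a risk set of 70 in scenario (A) and 90 in scenario (B), which is the only reason the two curves differ.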
Clark’s completeness index (CCI)
Clark et al. [21] proposed a novel measure to assess completeness of follow-up based on person-time of follow-up:
\( {\eta}_{CCI}=\frac{PT_{observed}}{PT_{potential}} \) (2)
Specifically, PT_{observed} = the actual total person-time observed in the study, while PT_{potential} = the total potential person-time of follow-up, estimated by assuming that all dropouts had the full follow-up time. However, this approach fails to consider that those dropouts could have developed the event of interest during the study interval. Therefore, it can overestimate the total potential follow-up time and consequently underestimate the completeness of follow-up; the extent of underestimation necessarily increases with higher event and dropout rates. In Fig. 1, η_{CCI} = 62.3% for scenario (A) and η_{CCI} = 92.5% for scenario (B), showing that the method takes into account the observation time of dropouts. However, if in scenario (A) 5 of the 40 dropouts died shortly after dropping out, PT_{potential} would be overestimated and thus η_{CCI} would underestimate the true follow-up rate. The extent to which this affects the estimates under varying conditions and assumptions has, to our knowledge, not been examined before.
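The two CCI values quoted for Fig. 1 can be reproduced as follows. The sketch assumes that events and dropouts occur at the midpoint of the year in which they are recorded; this timing is our assumption, chosen because it reproduces the quoted figures, and the helper names are hypothetical.

```python
TAU = 3.0  # length of the study in years

def fig1_cohort(dropout_year):
    """Return (follow-up time, status) pairs for the Fig. 1 cohort."""
    subjects = []
    for year, n_events in zip((1, 2, 3), (10, 5, 5)):
        subjects += [(year - 0.5, "event")] * n_events
    subjects += [(dropout_year - 0.5, "dropout")] * 40
    subjects += [(TAU, "complete")] * 40
    return subjects

def cci(subjects, tau):
    """Clark's Completeness Index: observed over potential person-time."""
    observed = sum(time for time, _ in subjects)
    # Potential time: events keep their event time, all others are given tau.
    potential = sum(time if status == "event" else tau
                    for time, status in subjects)
    return observed / potential

cci_a = cci(fig1_cohort(dropout_year=1), TAU)
cci_b = cci(fig1_cohort(dropout_year=3), TAU)
print(f"{cci_a:.1%} {cci_b:.1%}")
```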
Methods
A new person-time definition of follow-up rate (PTFR)
In this paper, we propose a new person-time follow-up rate (PTFR), essentially the observed person-time divided by the person-time assuming no dropouts. Specifically, we define the follow-up rate η_{PTFR} as:
\( {\eta}_{PTFR}=\frac{PT_{observed}}{PT_{no\text{-}dropout}} \) (3)
where PT_{no-dropout} = the total person-time that would have been observed in the study if there had been no dropouts. The denominator corresponds to the hypothetical situation of no dropout, with each subject contributing the time to event T_{i} or the time to the end of the study, whichever comes first. Note that the calculation of η_{PTFR} requires that the time to event T_{i} be known for all participants, whether they dropped out or not.
It can be shown that η_{CCI} underestimates η_{PTFR} since
as W_{i} follows the distribution of T_{i} truncated at τ. Using the example in Fig. 1, if none of the dropouts became events during the study, then η_{PTFR} = η_{CCI} (62.3% for scenario (A) and 92.5% for scenario (B)); however, if 5 of the dropouts in scenario (A) became events shortly after they dropped out, then η_{PTFR} = 65.3% > η_{CCI}.
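The 65.3% figure can be verified numerically. The sketch below uses scenario (A) of Fig. 1 with mid-year timing and places the 5 unobserved events at the dropout time itself as an approximation of “shortly after”; both choices, and all names, are our assumptions.

```python
TAU = 3.0
# Scenario (A) of Fig. 1 as (time, count) pairs, mid-year timing assumed.
events = [(0.5, 10), (1.5, 5), (2.5, 5)]
dropout_time, n_dropouts, n_complete = 0.5, 40, 40

event_time = sum(t * n for t, n in events)
pt_observed = event_time + dropout_time * n_dropouts + TAU * n_complete

def pt_no_dropout(n_hidden_events, hidden_event_time):
    """Person-time had nobody dropped out, for a hypothetical number of
    dropouts whose unobserved event occurs at hidden_event_time."""
    hidden = hidden_event_time * n_hidden_events
    full = TAU * (n_dropouts - n_hidden_events + n_complete)
    return event_time + hidden + full

ptfr_none = pt_observed / pt_no_dropout(0, 0.0)           # equals the CCI
ptfr_five = pt_observed / pt_no_dropout(5, dropout_time)  # 5 hidden events
print(f"{ptfr_none:.1%} {ptfr_five:.1%}")
```

With no hidden events the denominator equals PT_{potential}, so the PTFR coincides with the CCI; every hidden event shortens the denominator and raises the PTFR above it.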
Because the event times for dropouts are not observed, the PTFR cannot be calculated directly. We therefore propose two estimation methods.
A formal method to estimate the person-time follow-up rate (FPT)
We first consider an observational cohort study design that involves repeated serial assessments of participants at fixed time intervals of equal length (e.g., annual or semi-annual clinical visits). In addition to the baseline visit at t_{0} = 0, we denote the pre-specified visit times as (t_{1}, t_{2},…,t_{K}), where t_{K} = τ, i.e., the end of follow-up. It is then assumed that, on average, events and censoring occur midway through each interval, consistent with standard practice in life-table analysis [24]. Therefore, the numerator (i.e., the actual person-time of follow-up) of Eq. (3) is estimated to be
\( {\widehat{PT}}_{observed}=\sum_{k=1}^K\left({N}_{k-1}-\frac{N_{E_k}+{N}_{C_k}}{2}\right)\left({t}_k-{t}_{k-1}\right) \)
where N_{k−1} = the number of subjects at risk at the beginning of interval k (i.e., at time t_{k−1}), \( {N}_k={N}_{k-1}-{N}_{E_k}-{N}_{C_k} \), and \( {N}_{E_k} \) and \( {N}_{C_k} \) are the numbers of events and dropouts that occurred during interval k, respectively.
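The life-table calculation of observed person-time can be sketched as below, under the midpoint assumption stated above: subjects who stay through an interval contribute its full width, and subjects who have an event or drop out contribute half of it. The function name and argument layout are ours, and the Fig. 1 counts are used only as a check.

```python
def observed_person_time(n_start, visit_times, events, dropouts):
    """Life-table estimate of observed person-time.

    visit_times: [t_1, ..., t_K], with the baseline visit at t_0 = 0.
    events, dropouts: counts per interval (t_{k-1}, t_k].
    """
    at_risk = n_start
    prev_t = 0.0
    total = 0.0
    for t, n_events, n_dropouts in zip(visit_times, events, dropouts):
        leaving = n_events + n_dropouts
        # Leavers are credited with half the interval, stayers with all of it.
        total += (at_risk - leaving / 2.0) * (t - prev_t)
        at_risk -= leaving
        prev_t = t
    return total

visits = [1.0, 2.0, 3.0]
pt_a = observed_person_time(100, visits, [10, 5, 5], [40, 0, 0])
pt_b = observed_person_time(100, visits, [10, 5, 5], [0, 0, 40])
print(pt_a, pt_b)
```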
While PT_{observed} can easily be calculated by summing the observed follow-up time over all participants, calculation of the denominator, PT_{no-dropout}, in the definition of η_{PTFR} requires knowledge of the actual time to the outcome event for each participant if it happened during the study, regardless of whether or not the participant dropped out. This information is typically not available in a real-world study. In an earlier effort to address this problem, Chen, Wei and Huang used the known event rate for the population from which the cohort was derived to calculate “the maximum person-year”, which in our nomenclature is PT_{no-dropout} [15]. However, it is often difficult to specify the population from which a cohort is derived [25], and the event rate will not be known except for certain general endpoints, such as all-cause mortality. Therefore, this approach is not applicable to most studies.
To estimate PT_{no-dropout}, we propose estimating the event rate from the observed data. The survival function and the conditional probability of developing the event of interest are estimated using the nonparametric maximum likelihood approach (NPMLE) proposed by Turnbull [26], the equivalent of a Kaplan-Meier survival curve but appropriate for interval observations. To use this approach, each subject’s follow-up time needs to be described by an interval: if a subject experiences an event between the (k−1)th and kth visit, then that individual’s time to event is described by the interval (t_{k−1}, t_{k}); if a subject dropped out between the (k−1)th and kth visit, then that individual’s event time is described by the interval (t_{k−1}, t_{K+1}), where t_{K+1} = some large number, such as 100 years (a theoretical time that in essence indicates that the person who dropped out will eventually develop an event, assuming there are no competing risks); if the subject was free of events until the end of the study t_{K}, then that individual is given the interval (t_{K}, t_{K+1}). The interval package in R [27, 28] can be readily applied to estimate the survival curve and the conditional probability of developing the event of interest during each interval.
Next, the expected number of events in (t_{k−1}, t_{k}) is estimated to be \( {N}_{k-1}^{\ast }{\widehat{P}}_k \), where \( {N}_{k-1}^{\ast }= \) the number of subjects who would remain in the study at time t_{k−1} if there were no loss to follow-up, and \( {\widehat{P}}_k= \) the conditional probability of an event during the kth interval estimated using the NPMLE method, for k = 1,…,K, with \( {N}_0^{\ast }=N \). The number of subjects who would remain in the study at the beginning of interval k + 1 if there were no loss to follow-up is then \( {N}_k^{\ast }={N}_{k-1}^{\ast }-{N}_{k-1}^{\ast }{\widehat{P}}_k \). Applying the same midpoint assumption, the expected person-time if there were no dropouts is estimated to be
\( {\widehat{PT}}_{no\text{-}dropout}=\sum_{k=1}^K\left({N}_{k-1}^{\ast }-\frac{N_{k-1}^{\ast }{\widehat{P}}_k}{2}\right)\left({t}_k-{t}_{k-1}\right) \)
The person-time follow-up rate is then estimated to be
\( {\widehat{\eta}}_{FPT}=\frac{{\widehat{PT}}_{observed}}{{\widehat{PT}}_{no\text{-}dropout}} \)
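The recursion for the no-dropout cohort is easy to sketch once the conditional event probabilities are available. In the paper these come from Turnbull's NPMLE (via the R interval package); the `crude_conditional_probs` helper below is only a stand-in we introduce for illustration, which removes an interval's dropouts from that interval's risk set. All names are hypothetical and the midpoint convention is assumed throughout.

```python
def observed_person_time(n_start, visit_times, events, dropouts):
    """Life-table observed person-time (midpoint assumption)."""
    at_risk, prev_t, total = n_start, 0.0, 0.0
    for t, n_e, n_c in zip(visit_times, events, dropouts):
        leaving = n_e + n_c
        total += (at_risk - leaving / 2.0) * (t - prev_t)
        at_risk -= leaving
        prev_t = t
    return total

def crude_conditional_probs(n_start, events, dropouts):
    """Stand-in for the NPMLE step (an assumption, not Turnbull's algorithm)."""
    probs, at_risk = [], n_start
    for n_e, n_c in zip(events, dropouts):
        denom = at_risk - n_c
        probs.append(n_e / denom if denom > 0 else 0.0)
        at_risk -= n_e + n_c
    return probs

def person_time_no_dropout(n_start, visit_times, cond_probs):
    """Expected person-time if nobody had dropped out."""
    n_star, prev_t, total = float(n_start), 0.0, 0.0
    for t, p in zip(visit_times, cond_probs):
        expected_events = n_star * p
        # Expected events are credited with half of the interval.
        total += (n_star - expected_events / 2.0) * (t - prev_t)
        n_star -= expected_events
        prev_t = t
    return total

# Scenario (A) of Fig. 1: 40 dropouts in the 1st year.
visits, events, dropouts = [1.0, 2.0, 3.0], [10, 5, 5], [40, 0, 0]
probs = crude_conditional_probs(100, events, dropouts)
eta_fpt = (observed_person_time(100, visits, events, dropouts)
           / person_time_no_dropout(100, visits, probs))
print(f"{eta_fpt:.1%}")
```

Swapping in probabilities from a proper interval-censored NPMLE only changes the `probs` argument; the person-time recursion itself stays the same.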
This method clearly relies on the assumption of independent censoring, that is, that the event rate among the dropouts is the same as that among the subjects who remained in the study.
While a prospective epidemiological cohort study may intend to follow participants at serial intervals of approximately equal length (e.g., annual or semi-annual visits), not every participant returns for each visit or does so at the planned time. This leads to varying lengths of time between visits, which can sometimes be quite long. Clinic-based cohort studies that involve ad hoc patient follow-up (e.g., cohorts defined retrospectively from hospital EMRs) often result in irregular schedules of clinical visits, with clustering that does not occur at random (e.g., visits motivated by symptoms or an abnormal laboratory test result). To assess the follow-up rate for such data, we extended the approach proposed above to address irregular intervals between visits.
For cohorts involving intermittent and ad hoc follow-up, let \( \left({t}_{1_i},{t}_{2_i},\dots, {t}_{K_i}\right) \) be the visit times for the ith person, where K_{i} indexes either (a) the last visit in the study for the ith person or (b) the visit at which the ith person was diagnosed with the event. For (a), we use the time to the last visit as an estimate of the person’s censoring time, i.e., \( {\widehat{C}}_i=\widehat{\min}\left({T}_i,{C}_i\right)={t}_{K_i} \), and for (b) we estimate that the event occurred at the midpoint of the interval, i.e., \( {\widehat{T}}_i=\widehat{\min}\left({T}_i,{C}_i\right)=\frac{t_{K_i-1}+{t}_{K_i}}{2} \). The actual person-time of follow-up by a specified time, say t_{K}, is then estimated by summing the observed follow-up times across subjects, i.e.,
\( {\widehat{PT}}_{observed}=\sum_{i=1}^N\min \left\{\widehat{\min}\left({T}_i,{C}_i\right),{t}_K\right\} \)
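For irregular visit schedules, the per-subject rule described above (last visit for event-free subjects, midpoint of the last two visits for diagnosed subjects) can be sketched as follows. The three visit histories are invented for illustration, and truncating each contribution at the horizon is our assumption.

```python
def estimated_followup_time(visit_times, had_event):
    """Estimated follow-up time for one subject.

    visit_times: increasing visit times, starting with the baseline visit 0.0.
    had_event: True if the event was diagnosed at the last listed visit.
    """
    if had_event:
        # The event is placed midway between the last two visits.
        return (visit_times[-2] + visit_times[-1]) / 2.0
    # Event-free subjects are followed up to their last visit.
    return visit_times[-1]

def observed_person_time(subjects, horizon):
    """Sum of estimated follow-up times, each truncated at the horizon."""
    return sum(min(estimated_followup_time(visits, had_event), horizon)
               for visits, had_event in subjects)

# Hypothetical visit histories (years since enrollment).
subjects = [
    ([0.0, 0.4, 1.1, 2.0], False),  # last seen at 2.0, no event
    ([0.0, 0.5, 1.5], True),        # event placed at (0.5 + 1.5) / 2 = 1.0
    ([0.0, 1.0, 2.2, 3.5], False),  # seen beyond the 3-year horizon
]
pt_observed = observed_person_time(subjects, horizon=3.0)
print(pt_observed)
```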
To estimate PT_{no-dropout}, if the ith person developed the event at his/her last visit, the interval for the event time is \( \left({t}_{K_i-1},{t}_{K_i}\right) \), and if the person had not developed the event at his/her last visit, the interval is \( \left({t}_{K_i},E\right) \), where again E represents some large number. The NPMLE method can then be applied to estimate PT_{no-dropout}.
As mentioned above, the use of observed data to estimate the event rate relies on the assumption that the loss to follow-up is not informative, i.e., that the event rate among those who remained in the study is the same as among those who dropped out, so that the event rate estimates obtained from the observed data apply to the unobserved. However, if the subjects who were lost to follow-up are at a different risk of recurrence than those who remained in the study, the estimates of event rates are biased. For example, if the subjects who were lost to follow-up had a higher risk of the event, then the event risk is underestimated using the observed data and the follow-up rate will be underestimated using the person-time approach because PT_{no-dropout} is overestimated. Conversely, if the subjects who were lost to follow-up had a lower risk of the event, then the event risk is overestimated and the follow-up rate will consequently be overestimated using the person-time approach. We therefore propose calculating a lower bound for the person-time follow-up rate by assuming that none of those who dropped out developed the event of interest during the time interval examined. In this case, PT_{no-dropout} reaches its highest possible value, leading to a lower bound for the follow-up rate. Note that in this case PT_{no-dropout} = PT_{potential}, so that min η_{PTFR} = η_{CCI}. The lower bound of the follow-up rate is important because it provides a conservative estimate: an overestimated follow-up rate can lead to over-optimism about the quality of the follow-up.
A simplified method to estimate the person-time follow-up rate (SPT)
The need to estimate the event rate in order to calculate the PTFR can be an obstacle, especially for a non-statistician. Therefore, we also explore a simplified alternative method that allows quick estimation of η_{PTFR} without having to estimate the event rate. Our proposed Simplified Person-Time method is a hybrid that includes aspects of the Percentage Method and the Person-Time Method. Specifically, as in the Percentage Method, individuals who developed the event of interest during the study are treated the same as individuals who were followed until the end of the study, i.e., they are treated as having contributed complete follow-up, since they have already provided complete data regarding the factors associated with becoming a case. Furthermore, as in a person-time method, dropouts contribute partial follow-up time to the numerator.
A simple alternative method to calculate the follow-up rate is therefore
\( {\eta}_{SPT}=\frac{\sum_{i\in D}{C}_i+\left(N-{N}_D\right)\tau }{N\tau } \)
where D denotes the set of dropouts and N_{D} their number. In Fig. 1, η_{SPT} = 66.7% for scenario (A) and η_{SPT} = 93.3% for scenario (B), remarkably close to, but slightly higher than, η_{PTFR}; the slight overestimation arises because events are given the full length of follow-up in this method. It can be shown that
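The SPT values quoted for Fig. 1 follow from a one-line calculation. As before, this sketch assumes dropouts occur mid-year, which is our assumption, and the function name is hypothetical.

```python
def spt(dropout_times, n_total, tau):
    """Simplified Person-Time rate: events and completers are credited with
    the full study length tau, dropouts with their observed time."""
    n_full = n_total - len(dropout_times)
    return (sum(dropout_times) + n_full * tau) / (n_total * tau)

TAU = 3.0
spt_a = spt([0.5] * 40, n_total=100, tau=TAU)  # 40 dropouts in the 1st year
spt_b = spt([2.5] * 40, n_total=100, tau=TAU)  # 40 dropouts in the 3rd year
print(f"{spt_a:.1%} {spt_b:.1%}")
```

Only the dropout times and the cohort size are needed, which is what makes the method usable without any event-rate estimation.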
Figure 1 also indicates that η_{CCI} and η_{SPT} together provide close bounds for η_{PTFR}. In fact, the outcome events can be viewed as a competing risk for loss to follow-up, and we can therefore use methods from the competing-risk framework to compute the cumulative loss-to-follow-up rate [29, 30] and then obtain the subdistribution reverse KM curve.
To revisit the reverse KM survival curve, we instead assign full follow-up time to the events, so that the rate of follow-up over time is no longer affected by the number and timing of the events. In Fig. 2, scenarios (A) and (B) then share the same curve of follow-up rate over time after the competing risk of events is addressed. It can be shown mathematically that the area under this new curve of follow-up rate over time, divided by τ, is η_{SPT}.
An R program for computing each method is provided in Additional file 1.
Simulation studies
Simulation studies were used to compare follow-up rates computed using the standard Percentage Method, the CCI, the FPT and the SPT with the true follow-up rate η_{PTFR}. To conduct these comparisons, we assumed a range of outcome event rates and dropout rates. Specifically, the simulations involved N = 1000 subjects, and time-to-event and time-to-dropout were generated for each subject using exponential distributions. The event rate was varied from 5% to 50% and the dropout rate from 10% to 50%, which covers a wide range of plausible values for these two parameters. In the first scenario of the simulation, the length of the study was five years with annual clinical visits; the second scenario incorporated random variation in the time between clinic visits (from 0.5 to 1.5 years). The results were then averaged across 1000 simulated datasets.
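A stripped-down version of such a simulation can be sketched with the Python standard library. This is not the paper's simulation: it observes event and dropout times exactly rather than at clinic visits, it treats the two rates as annual exponential hazards (the paper's exact parameterisation is not stated here), it omits the FPT, and the seed and names are arbitrary.

```python
import random

def simulate_once(n, event_hazard, dropout_hazard, tau, rng):
    """Simulate one cohort and return the true PTFR and three estimates."""
    n_pct = 0
    pt_obs = pt_potential = pt_no_dropout = spt_num = 0.0
    for _ in range(n):
        t = rng.expovariate(event_hazard)    # time to event
        c = rng.expovariate(dropout_hazard)  # time to dropout
        pt_obs += min(t, c, tau)
        pt_no_dropout += min(t, tau)         # known only in a simulation
        if t <= min(c, tau):                 # event observed
            n_pct += 1
            pt_potential += t
            spt_num += tau
        elif c < tau:                        # dropout before the study ends
            pt_potential += tau
            spt_num += c
        else:                                # followed to the end, event-free
            n_pct += 1
            pt_potential += tau
            spt_num += tau
    return {
        "PTFR": pt_obs / pt_no_dropout,
        "percentage": n_pct / n,
        "CCI": pt_obs / pt_potential,
        "SPT": spt_num / (n * tau),
    }

rng = random.Random(2017)
result = simulate_once(n=1000, event_hazard=0.1, dropout_hazard=0.2,
                       tau=5.0, rng=rng)
print({name: round(value, 3) for name, value in result.items()})
```

Averaging `simulate_once` over many replicates and over a grid of hazards would give a table in the spirit of Table 1, although the numbers need not match the published ones given the simplifications above.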
Application to the prostate cancer clinical cohort study
A retrospective clinical cohort study of time to recurrence of prostate cancer (PrCa) was conducted using EMRs among patients who underwent robotic-assisted laparoscopic prostatectomy (RALP) by a single surgeon at Montefiore Medical Center in the Bronx from October 2005 through December 2012 [23]. We used this dataset as a real-world example with staggered study entry and ad hoc follow-up. The dataset included N = 610 PrCa patients. Clinical guidelines held that PrCa patients should have PSA levels measured every 3 to 4 months in the first year following RALP, every 6 months in the second and third years, and then annually. However, PSA measurements were to be conducted more frequently if the postoperative serum PSA value exceeded 0.1 ng/dl. The median number of follow-up serum PSA measurements was 7 (range 1–28). PrCa recurrence was defined as a rise in serum PSA of 0.2 ng/ml or higher. There were 87 (14.3%) recurrence events following RALP. Three-year and five-year recurrence rates were of primary interest.
Note that although there were no observed deaths in the study, death is a potential competing risk here. For the purpose of assessing the completeness of follow-up, death should be included as an event when calculating the follow-up rate.
Results
Simulation studies
Table 1 shows that across a wide range of dropout and event rates, η_{percentage} systematically underestimated the follow-up rate: the larger the dropout rate, the greater the underestimation. For example, when the event rate was fixed at 10%, the averaged η_{percentage} varied from 91.0% to 46.4%, whereas the true η_{PTFR} varied from 95.3% to 68.4%. In contrast, η_{FPT} consistently provided an accurate estimate of η_{PTFR}, with bias of less than 2%. The downward bias arises because Turnbull’s NPMLE [26] tends to slightly underestimate the event rate and, consequently, the follow-up rate. This underestimation of the cumulative incidence function by the NPMLE method for interval-censored data has been recognized [31, 32], and more research on alternative estimators is needed.
In general, η_{CCI} provided a good but slightly low estimate of η_{PTFR}, except when both the event and dropout rates were high, because it fails to take into account events occurring among dropouts. For example, when the event rate was 50% and the dropout rate was 70%, the true η_{PTFR} = 46.6% while η_{CCI} = 40.0%, a 14% downward bias. The η_{SPT} is also in close agreement with the true person-time follow-up rate η_{PTFR} but is slightly higher because the events are given the full length of follow-up. The overestimation is also more apparent when the event and dropout rates are high. In the same example, η_{SPT} = 51.3%, a 10% upward bias. Careful examination of Table 1 shows that in most scenarios the easily estimable SPT and CCI were as likely to be closest to the “True Person-Time” follow-up rate as the more complex and laborious FPT. When η_{SPT} is used in tandem with η_{CCI}, they provide a tight range for the true follow-up rate, so that the use of η_{FPT} is not necessary.
Similar results for each of the methods of estimating follow-up rates were obtained when visits were irregular, i.e., when the time intervals between visits were allowed to vary within a person and between persons (results not shown).
Example dataset
Table 2 shows the follow-up rates as estimated by the Percentage Method, the CCI, the FPT and the SPT. Because event rates and dropout rates were low, the FPT, the CCI and the SPT provided similar results, as expected. These results indicate much higher follow-up than that calculated using the naïve Percentage Method. In fact, had the Percentage Method been used, the investigator might have falsely concluded that the dataset had inadequate 5-year follow-up to be suitable for research purposes, when the other methods showed follow-up to be >90% after 5 years.
In the case of informative censoring, as mentioned in the Methods section, the CCI estimate provides a lower bound for the person-time follow-up rate. Table 2 shows that the lower bounds were very close to the person-time estimates, suggesting that even in the extreme case in which none of the dropouts is at risk of developing the event during the study, we do not expect the true follow-up rate to be much lower.
Discussion and Conclusion
The completeness of follow-up and the length of follow-up are important measures for determining the adequacy of a cohort dataset for research purposes. The longer the follow-up, the less the concern regarding statistical power; the more complete the follow-up, the less the concern regarding the validity of a study. This paper focused on measures to assess the completeness of follow-up. A commonly used follow-up rate, the naïve Percentage Method, fails to consider the person-time contributed to a study by subjects who drop out prior to study completion; other existing measures of completeness of follow-up, including the reverse Kaplan-Meier survival curve and Clark’s completeness index (CCI), each have their own limitations. Therefore, we defined a new follow-up rate as the total observed person-time of follow-up divided by the total person-time of follow-up that could have been observed if there had been no dropouts. This definition corrects the inherent biases in the existing methods.
We next proposed two methods to estimate the proposed person-time follow-up rate. In the formal person-time method, we proposed estimating the event rate from the observed data and then using it to estimate the expected number of events if there were no dropouts. Note that non-informative censoring is assumed for the validity of the FPT, that is, the event rate among the dropouts is assumed to be the same as among those who did not drop out. Although this assumption is not verifiable, sensitivity analyses can be conducted to examine the robustness of the estimate of the follow-up rate, for example by assuming that the dropouts have either a higher or a lower event rate than those who did not drop out. The second, simplified method (SPT) assigns full follow-up time to events; it therefore does not require estimation of the event rate and is consequently much easier to use.
Our simulations showed that the Percentage Method often underestimates the follow-up rate quite extensively when the dropouts occur later in the study. The FPT performed well, and the CCI and SPT also performed well in most scenarios, although the CCI tends to slightly underestimate and the SPT to slightly overestimate the follow-up rate. The bias becomes moderate only when both the event rate and the dropout rate are high; otherwise, the SPT used in tandem with the CCI provides an accurate and tight interval estimate of the true person-time follow-up rate. In these cases, the use of the FPT, which involves more computation, is not necessary. However, the FPT is recommended when event rates and dropout rates are high.
Application of the methods to an example dataset, based on a study of prostate cancer recurrence, helped demonstrate the critical importance of considering person-time prior to dropout when estimating follow-up rates. Briefly, using the standard Percentage Method the 5-year follow-up rate was estimated to be approximately 68%, whereas the CCI, the FPT and the SPT all showed the follow-up to be greater than 90%.
Although the CCI method was proposed over a decade ago, the use of this person-time method to determine follow-up rates has not been widely adopted, likely because the performance of the CCI has not been fully examined and/or because of the misconception that the median follow-up time and the reverse KM survival curve are sufficient. Thus, the presentation of this work is timely. The availability and ease of calculation of the proposed person-time follow-up rate can represent an important advance in assessing the completeness of follow-up.
Guidelines on how much loss to follow-up is problematic have been based primarily on the Percentage Method. New guidelines based on the person-time follow-up rate should be developed to suggest “acceptable” and “alarming” follow-up rates. Recent work by von Allmen [33] examined the bias in estimating the mortality rate under various levels of CCI. However, that work did not distinguish between missing-data mechanisms, including missing completely at random, missing at random and missing not at random; further, research studies are often interested in obtaining an unbiased estimate of the exposure-disease association or of the relative risk associated with the exposure, rather than the absolute risk of death or disease. Therefore, further studies are needed, including series of simulation studies to examine the bias and efficiency loss in relative risk estimates under various levels of loss to follow-up, as measured by our proposed person-time follow-up rate, and under various missing-data mechanisms; these will be the primary focus of our future research.
Abbreviations
CCI: Clark’s completeness index
EMR: Electronic medical records
FPT: Formal person-time method
KM: Kaplan-Meier
NPMLE: Nonparametric maximum likelihood approach
PTFR: Person-Time Follow-up Rate
SPT: Simplified person-time method
References
1. Choi BC, Noseworthy AL. Classification, direction, and prevention of bias in epidemiologic research. J Occup Med. 1992;34(3):265–71.
2. Greenland S. Response and follow-up bias in cohort studies. Am J Epidemiol. 1977;106(3):184–7.
3. Johnson ES. Treatment of subjects lost to follow-up in the analysis of mortality studies. J Occup Med. 1988;30(1):60–2.
4. Johnson ES. Bias on withdrawing lost subjects from the analysis at the time of loss, in cohort mortality studies, and in follow-up methods. J Occup Med. 1990;32(3):250–4.
5. Kristman V, Manno M, Cote P. Loss to follow-up in cohort studies: how much is too much? Eur J Epidemiol. 2004;19(8):751–60.
6. Deeg DJ, van Tilburg T, Smit JH, de Leeuw ED. Attrition in the longitudinal aging study Amsterdam. The effect of differential inclusion in side studies. J Clin Epidemiol. 2002;55(4):319–28.
7. Dettori JR. Loss to follow-up. Evid Based Spine Care J. 2011;2(1):7–10.
8. Kempen GI, van Sonderen E. Psychological attributes and changes in disability among low-functioning older persons: does attrition affect the outcomes? J Clin Epidemiol. 2002;55(3):224–9.
9. Sackett DL. Evidence-based medicine. Semin Perinatol. 1997;21(1):3–5.
10. Touloumi G, Babiker AG, Pocock SJ, Darbyshire JH. Impact of missing data due to dropouts on estimators for rates of change in longitudinal studies: a simulation study. Stat Med. 2001;20(24):3715–28.
11. Twisk J, de Vente W. Attrition in longitudinal studies. How to deal with missing data. J Clin Epidemiol. 2002;55(4):329–37.
12. Van Beijsterveldt CE, van Boxtel MP, Bosma H, Houx PJ, Buntinx F, Jolles J. Predictors of attrition in a longitudinal cognitive aging study: the Maastricht aging study (MAAS). J Clin Epidemiol. 2002;55(3):216–23.
13. Higgins JPT, Green S, Cochrane Collaboration. Cochrane handbook for systematic reviews of interventions. Chichester, West Sussex; Hoboken NJ: Wiley-Blackwell; 2008.
14. Moher D, Schulz KF, Altman DG. The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomised trials. Lancet. 2001;357(9263):1191–4.
15. Chen R, Wei L, Huang H. Methods for calculation of follow-up rate in a cohort study. Int J Epidemiol. 1993;22(5):950–2.
16. Renquist K, Jeng G, Mason EE. Calculating follow-up rates. Obes Surg. 1992;2(4):361–7.
17. Schemper M, Smith TL. A note on quantifying follow-up in studies of failure time. Control Clin Trials. 1996;17(4):343–6.
18. Shuster JJ. Median follow-up in clinical trials. J Clin Oncol. 1991;9(1):191–2.
19. Altman DG, De Stavola BL, Love SB, Stepniewska KA. Review of survival analyses published in cancer journals. Br J Cancer. 1995;72(2):511–8.
20. Korn EL. Censoring distributions as a measure of follow-up in survival analysis. Stat Med. 1986;5(3):255–60.
21. Clark TG, Altman DG, De Stavola BL. Quantification of the completeness of follow-up. Lancet. 2002;359(9314):1309–10.
22. Clark TG, Bradburn MJ, Love SB, Altman DG. Survival analysis part I: basic concepts and first analyses. Br J Cancer. 2003;89(2):232–8.
23. Agalliu I, Williams S, Adler B, Androga L, Siev M, Lin J, Xue X, Huang G, Strickler HD, Ghavamian R. The impact of obesity on prostate cancer recurrence observed after exclusion of diabetics. Cancer Causes Control. 2015;26(6):821–30.
24. Lawless JF. Some nonparametric and graphical procedures. In: Statistical Models and Methods for Lifetime Data. New York: Wiley; 2002. p. 79–145.
25. Wacholder S, McLaughlin JK, Silverman DT, Mandel JS. Selection of controls in case-control studies. I. Principles. Am J Epidemiol. 1992;135(9):1019–28.
26. Turnbull BW. The empirical distribution function with arbitrarily grouped, censored and truncated data. J R Stat Soc Ser B Methodol. 1976;38(3):290–5.
27. Fay MP, Shaw PA. Exact and asymptotic weighted logrank tests for interval censored data: the interval R package. J Stat Softw. 2010;36(2):i02.
28. Gentleman R, Geyer CJ. Maximum likelihood for interval censored data: consistency and computation. Biometrika. 1994;81(3):618–23.
29. Lau B, Cole SR, Gange SJ. Competing risk regression models for epidemiologic data. Am J Epidemiol. 2009;170(2):244–56.
30. Satagopan JM, Ben-Porat L, Berwick M, Robson M, Kutler D, Auerbach AD. A note on competing risks in survival data analysis. Br J Cancer. 2004;91(7):1229–35.
31. Pan W, Chappell R. Estimating survival curves with left-truncated and interval-censored data under monotone hazards. Biometrics. 1998;54(3):1053–60.
32. Pan W, Chappell R. A nonparametric estimator of survival functions for arbitrarily truncated and censored data. Lifetime Data Anal. 1998;4(2):187–202.
33. von Allmen RS, Weiss S, Tevaearai HT, Kuemmerli C, Tinner C, Carrel TP, Schmidli J, Dick F. Completeness of follow-up determines validity of study findings: results of a prospective repeated measures cohort study. PLoS One. 2015;10(10):e0140817.
Acknowledgements
Not applicable.
Funding
This work was supported by Albert Einstein Cancer Center Support Grant 5P30CA013330–40.
Availability of data and materials
The data that support the findings of this study are available from Montefiore Medical Center (MMC) electronic medical records but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are however available from the authors upon reasonable request and with permission of MMC IRB.
Author information
Contributions
XX contributed to every aspect of the study, including method development, design of simulations, data analysis, and drafting and reviewing the manuscript; IA contributed to the conception, method development and data interpretation; MK contributed to method development and simulation design; TW contributed to method development; JL contributed to the data analysis; RG contributed to the acquisition of data and interpretation of the data analysis results; HS made substantial contributions to the conception and method development and to the interpretation and presentation of the simulation and data analysis results, helped to draft the manuscript, and critically reviewed the manuscript in great detail. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
This paper involves a secondary analysis of a dataset obtained from hospital electronic medical records. The original study was approved by the Institutional Review Board of Albert Einstein College of Medicine and Montefiore Medical Center and has been published elsewhere [23].
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional file
Additional file 1:
R program for computation of the traditional Percentage follow-up rate, the formally estimated Person-Time follow-up rate, Clark’s Completeness Index and the Simplified Person-Time follow-up rate. (DOCX 14 kb)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Xue, X., Agalliu, I., Kim, M.Y. et al. New methods for estimating follow-up rates in cohort studies. BMC Med Res Methodol 17, 155 (2017). https://doi.org/10.1186/s12874-017-0436-z
DOI: https://doi.org/10.1186/s12874-017-0436-z