
A non-parametric approach to predict the recruitment for randomized clinical trials: an example in elderly inpatient settings

A Correction to this article was published on 20 September 2024


Abstract

Background

Accurate prediction of subject recruitment, which is critical to the success of a study, remains an ongoing challenge. Previous prediction models often rely on parametric assumptions which are not always met or may be difficult to implement. We aim to develop a novel method that is less sensitive to model assumptions and relatively easy to implement.

Methods

We created a weighted resampling-based approach to predict enrollment in year two based on recruitment data from year one of the completed GRIPS and PACE clinical trials. Different weight functions accounted for a range of potential enrollment trajectory patterns. Prediction accuracy was measured against the actual year two enrollment data by the Euclidean distance for the enrollment sequence in year two, the total enrollment over time, and the total weeks to enroll a fixed number of subjects. We compared the performance of the proposed method with that of an existing Bayesian method.

Results

Weighted resampling using GRIPS data resulted in closer prediction, evidenced by better coverage of observed enrollment by the prediction intervals and smaller Euclidean distance from actual enrollment in year 2, especially when enrollment gaps were filled prior to the weighted resampling. These scenarios also produced more accurate predictions for total enrollment and the number of weeks to enroll 50 participants, and they outperformed an existing Bayesian method for all 3 accuracy measures. In the PACE data, using a reduced year 1 enrollment resulted in closer prediction, evidenced by better coverage of observed enrollment by the prediction intervals and smaller Euclidean distance from actual enrollment in year 2, with the weighted resampling scenarios better reflecting the seasonal variation seen in year 1. The reduced enrollment scenarios resulted in closer prediction for total enrollment over 6 and 12 months into year 2. These same scenarios also outperformed an existing Bayesian method for the relevant accuracy measures.

Conclusion

The results demonstrate the feasibility and flexibility of a resampling-based, non-parametric approach to predicting clinical trial recruitment with limited early enrollment data. Application to wider settings and long-term prediction accuracy require further investigation.


Introduction

To maximize the likelihood of success in clinical trials, one must take into consideration the study design, patient recruitment and retention, study setup, conduct, and analysis. Successfully recruiting a prespecified number of trial participants is critical to the success of a clinical trial and remains challenging: it has been reported that 86% of trials fail to finish on time and 60% are delayed or terminated due to low enrollment [1]. Even though clinical trial participant satisfaction is high [2, 3], recruitment remains highly challenging. The recruitment rate can be impacted by both external and internal factors [4], such as the COVID-19 pandemic in 2020, which impacted a substantial number of clinical trials for many different reasons [5]. The topic of recruitment prediction in clinical trials has been investigated by researchers in the U.S. and Europe, including an online resource created by the British and Irish health systems [6]. Recruitment research has prioritized several questions for clinical trial stakeholders: Given the recruitment data we have so far, can we enroll the targeted number of subjects within the study period? How long will it take to recruit the remaining target number of patients, and do we need to adjust our predetermined recruitment strategy [7, 8]?

Various types of prediction models for recruitment have been previously developed, including both stochastic and deterministic methods [9, 10]. Despite the existence of these models, simpler prediction approaches, whose accuracy is largely undocumented, are still preferred in practice because of the added complexity of implementing formal models [11]. Previous prediction models have relied on specific parametric assumptions: Poisson models for the number of patients enrolled over a fixed interval [12, 13], Gamma distributions assuming constant or varying recruitment rates among recruitment centers [14], or Bayesian methods that rely on prior information that is not always available [15, 16]. Though it is challenging to accurately predict the recruitment rate, understanding the pattern of recruitment may provide valuable information for modeling final recruitment and adjusting the recruitment strategy as needed. Because actual recruitment patterns are frequently affected by factors such as seasonal variation in disease [13, 17], unavailability during holidays [17], reduced staff availability due to exposure risk [4, 5], logistical issues [5], and other internal or external factors, previous prediction models may fail to meet their assumptions and yield suboptimal predictions. Inaccurate or overly optimistic estimated recruitment rates may lead to missed recruitment targets, high dropout rates, and premature discontinuation, and may negatively affect data integrity through delayed assessment and monitoring; such trial disruption has resource, ethical, and care implications [5, 7, 11].

The objective of this project is to develop and test a novel and flexible recruitment model using a weighted resampling-based non-parametric approach for clinical trials in inpatient settings. We also aim to compare the performance of the proposed method to that of an existing model in the literature.

Methods

We used recruitment logs from the Geriatric Recovery Using Inpatient and Post-hospitalization Supplementation (GRIPS, NCT03904615) [18] and the Feasibility Study of Post-hospitalization Interventions to Improve Physical Function in Older Adults (PACE, NCT02203656) [19] trials, conducted in Acute Care for the Elderly (ACE) units [20] at the University of Texas Medical Branch (UTMB) (IRB# 18–0247 and 13–038). GRIPS originally planned to enroll 160 subjects and began recruiting patients in June of 2019. PACE originally planned to enroll 113 subjects and began recruiting patients in January of 2014. Inclusion criteria included participants aged 65 years or older and admitted to UTMB hospital for any acute medical condition, with PACE recruitment being expanded to include non-ACE unit patients in the second year. GRIPS was suspended due to difficulty in sourcing supplements and stopped recruitment in March of 2022 with a total enrollment of 70. Both recruitment logs included the dates on which patients were admitted to the hospital, all variables related to inclusion and exclusion criteria, overall eligibility, and final enrollment indicators. We summarized the enrollment data as weekly counts of finally enrolled patients, given that daily recruitment plots were too sparse to interpret and monthly sample sizes were too small to run resampling confidently. We then split the first two years of data into training and testing sets. The first-year data served as the empirical distribution from which the predicted enrollment for the second year was simulated using a weighted resampling approach; the weight functions govern how the resampling is conducted over time (see Figure S1 in the supplementary digital content). Year one of GRIPS contained weeks with zero enrollment, due to systemic events such as the holiday season, clinic closure, and staff shortages, or to other reasons such as patient unavailability, which implies a higher anticipated recruitment in year two. Year two of the PACE study was anticipated to have lower enrollment because the availability of the coordinator was reduced from 5 days a week to 3 due to patient obligations. The actual second-year enrollment served as the validation set. Resampling was conducted with replacement under six scenarios that fit different circumstances for each study heading into year two of recruitment. Each scenario corresponds to one (non-parametric) prediction model.
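As an illustration of this preparation step, the sketch below (in R, with hypothetical column names and simulated dates; not the study code) collapses a recruitment log into weekly counts and splits the first two years into training and validation sets:

```r
## Hypothetical recruitment log: admission dates and a final-enrollment flag
set.seed(2019)
rec_log <- data.frame(
  admit_date = as.Date("2019-06-03") + sample(0:727, 120, replace = TRUE),
  enrolled   = rbinom(120, 1, 0.5)
)

## Weekly counts of finally enrolled patients over the first 104 weeks
start  <- min(rec_log$admit_date)
week   <- as.integer(floor(difftime(rec_log$admit_date, start, units = "weeks"))) + 1
weekly <- tabulate(week[rec_log$enrolled == 1], nbins = 104)

x_train <- weekly[1:52]    # year 1: empirical distribution for resampling
x_test  <- weekly[53:104]  # year 2: validation set
```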

GRIPS

Scenario 1: Random sampling with replacement (Bootstrapping). This scenario assumes an equal weight for each enrollment week, producing a constant enrollment rate. The constant enrollment rate reflects the assumption that time of year does not influence recruitment (no seasonal effects).

Scenario 2: Weighted sampling was performed using the probability mass function of a Binomial (51, 0.5) distribution. The probability mass function supplies the weights applied across calendar time: the peak of the curve, i.e., the highest weight, was anchored at the same calendar time as the week being predicted, and the weight decreased gradually moving away from that time. This gives higher sampling probability to enrollment weeks that likely follow the same pattern due to seasonal effects. Weighting therefore incorporates an assumed seasonality in recruitment, oversampling weeks that are close to the predicted time of year relative to weeks that are farther away.

Scenario 3: Sampling with the same weights used in scenario 2 but using augmented data. The weeks in year one where zero enrollment took place were filled by resampling using data from weeks with active enrollment within the same year. Gaps in enrollment were filled before the simulation was run. This strategy assumes enhanced enrollment, avoiding gaps in the second year.

Scenario 4: Sampling with the same weights used in scenario 2 except assuming zero weight during the weeks of U.S. federal holidays, effectively removing the eligibility of those weeks from the simulation.

Scenario 5: Sampling with weights, filling gap weeks, and weighting holiday weeks zero. This strategy assumes enhanced enrollment during non-holiday weeks only.

Scenario 6: To assess the impact of different weight functions, we calculated the weights using a Cauchy (0, 1) distribution instead of the Binomial. Gaps were filled and holiday weights were set to 0. The Cauchy (0, 1) has a heavier tail than the Binomial (51, 0.5), which results in increased weights for weeks further from the target calendar week.
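The sketch below (R; our reading of the formulas in the Simulation subsection, not the study code) shows how the scenario weights could be computed. The circular treatment of week distance follows the \(\left|26-\left|i-j\right|\right|\) term and the truncated Cauchy description given there:

```r
## Binomial (51, 0.5) weights for target week j over the 52 source weeks
binom_weights <- function(j, n_weeks = 52) {
  i <- seq_len(n_weeks)
  k <- abs(26 - abs(i - j))       # ~26 for nearby weeks, ~0 half a year away
  w <- dbinom(k, size = 51, prob = 0.5)
  w / sum(w)
}

## Cauchy (0, 1) weights: circular week distance mapped onto [0, 3]
cauchy_weights <- function(j, n_weeks = 52) {
  i  <- seq_len(n_weeks)
  dc <- pmin(abs(i - j), n_weeks - abs(i - j))  # circular distance, 0..26
  w  <- dcauchy(3 * dc / 26)                    # standard Cauchy pdf on [0, 3]
  w / sum(w)
}

## Scenario modifiers: fill zero-enrollment weeks by resampling active weeks
## (scenarios 3, 5, 6) and zero out holiday weeks with renormalization (4-6)
fill_gaps <- function(x) {
  x[x == 0] <- sample(x[x > 0], sum(x == 0), replace = TRUE)
  x
}
zero_holidays <- function(w, holiday_weeks) {
  w[holiday_weeks] <- 0
  w / sum(w)
}
```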

PACE

Scenarios 1 and 2 are the same as in the GRIPS study above.

Scenario 3: A Cauchy (0, 1) distribution was used to calculate weights to assess the impact of different weight functions.

Scenarios 4, 5, and 6 repeated the first 3 scenarios but used the first year's weekly enrollment data reduced to 60%, to simulate the anticipated 2 days per week on which enrollment could not occur because the coordinator was out of office.

These scenarios were implemented based on the change of circumstances from year one to year two and could be implemented both in other datasets and prospectively in actively recruiting trials where settings are expected to change.
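One simple way to realize the 60% reduction, assuming it is applied stochastically week by week (the text states only that the data were "reduced to 60%"; deterministic rounding of 0.6 times the counts is an equally plausible reading), is binomial thinning of the year-1 weekly counts:

```r
## Thin year-1 weekly counts to ~60% to mimic 2 of 5 enrollment days lost
set.seed(2014)
x_reduced <- rbinom(length(x_train), size = x_train, prob = 0.6)
```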

We generated 10,000 simulations for each of six different scenarios. The following three indexes were calculated to measure the performance of the model in each scenario:

  1. Median and 95% prediction interval (2.5 and 97.5 percentiles) for the Euclidean distance (ED) between observed and simulated accumulation sequences.

  2. Median and 95% projection band (2.5 and 97.5 percentiles) for total accumulated enrollment over time.

  3. Median number of weeks required to enroll 50 subjects, with 95% prediction interval.

Simulation

For each of the scenarios, ten thousand simulations were generated. Each simulation consists of a one-year accumulated weekly recruitment curve. The first year of recruitment data corresponds to the cumulative enrollment curve \(\mathbf{X}=(X_{1},X_{2},\dots,X_{52})\), where \(X_{i}=\sum_{j=1}^{i}x_{j}\) and \(x_{j}\) corresponds to enrollment in week \(j\). Simulations to predict the second year are drawn from the first-year sample. The \(l\)th simulation corresponds to \(\mathbf{Y}_{l}=(Y_{l,1},Y_{l,2},\dots,Y_{l,52})\), where \(Y_{l,i}=\sum_{j=1}^{i}y_{l,j}\) and \(y_{l,i}\in\{x_{1},x_{2},\dots,x_{52}\}\) with \(P(y_{l,i}=x_{j})=w_{i,j}\). In the case of bootstrapping, \(w_{i,j}=\frac{1}{52}\). In the binomial simulations, \(w_{i,j}=\binom{51}{\left|26-\left|i-j\right|\right|}0.5^{\left|26-\left|i-j\right|\right|}0.5^{51-\left|26-\left|i-j\right|\right|}\). Finally, in the Cauchy simulations \(w_{i,j}\) is obtained by normalizing \(f(x_{i,j})\), i.e., \(w_{i,j}=\frac{f(x_{i,j})}{\sum_{k=1}^{52}f(x_{k,j})}\), where \(f(x_{i,j})=\frac{1}{\pi}\left[\frac{1}{x_{i,j}^{2}+1}\right]\) is the standard Cauchy pdf and \(x_{i,j}\) takes values between 0 and 3, proportional to the absolute difference between \(i\) and \(j\). The Cauchy weights correspond to the normalized density of a standard Cauchy truncated between −3 and 3. In simulations where holiday-week weights were set to 0, a normalization analogous to the Cauchy one was used.
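A minimal sketch of this resampling loop (R; variable names are ours, building on the sketches above) is:

```r
## Week i of the predicted year is drawn from the year-1 weekly counts x
## with probabilities W[i, ], then accumulated into a cumulative curve.
## Rows of 1/52 in W reproduce plain bootstrapping (scenario 1).
simulate_year <- function(x, W, n_sim = 10000) {
  n_weeks <- ncol(W)
  Y <- matrix(0, nrow = n_sim, ncol = n_weeks)
  for (l in seq_len(n_sim)) {
    y <- vapply(seq_len(n_weeks),
                function(i) sample(x, 1, prob = W[i, ]), numeric(1))
    Y[l, ] <- cumsum(y)   # cumulative enrollment curve Y_l
  }
  Y
}

W_boot  <- matrix(1 / 52, 52, 52)                       # scenario 1
W_binom <- t(vapply(1:52, binom_weights, numeric(52)))  # scenario 2
set.seed(2024)
Y_sims <- simulate_year(fill_gaps(x_train), W_binom)    # scenario 3: gaps filled
```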

For the measures of accuracy, we calculate the Euclidean distance as \(d_{l}=\sqrt{\sum_{i=1}^{52}\left(Y_{l,i}-X_{i}\right)^{2}}\).
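Continuing the sketch above, the three indexes can then be computed from the simulated curves (names are ours; x_test holds the observed year-2 weekly counts from the first sketch):

```r
X_obs <- cumsum(x_test)   # observed cumulative year-2 enrollment

## 1. Euclidean distance between each simulated and the observed sequence
ed <- apply(Y_sims, 1, function(y) sqrt(sum((y - X_obs)^2)))
quantile(ed, c(0.025, 0.5, 0.975))

## 2. 95% projection band for total accumulated enrollment over time
band <- apply(Y_sims, 2, quantile, probs = c(0.025, 0.5, 0.975))

## 3. Weeks to enroll 50 subjects (NA when not reached within the year)
weeks_to_50 <- apply(Y_sims, 1, function(y) which(y >= 50)[1])
quantile(weeks_to_50, c(0.025, 0.5, 0.975), na.rm = TRUE)
```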

To compare our methodology to an existing approach, we also generated simulations based on the methodology and R package software developed by Jiang et al. [15, 16, 21]. The Bayesian model developed by Jiang et al. is the only model that comes with a freely available software package dedicated to predicting recruitment in clinical studies. The Bayesian approach incorporates subjective knowledge about subject accrual rates through an informative prior distribution. These methods assume that the accrual rate is constant and that the waiting time follows an exponential distribution. In their methods, the strength of the prior distribution is controlled by a parameter P between 0 and 1. If P = 1, the prior is given weight equivalent to the proposed sample size of the study. If P = 0.5, the prior is given weight equivalent to half the proposed sample size, meaning that halfway through the study the prior and the actual subject accrual data are given equal weight. If P = 0, the prior is effectively ignored. These 3 priors, which hold P constant, are called "informative priors" (inf_p_00, inf_p_50, inf_p_100). Jiang et al. [15, 16] add two extensions to the existing model to make the choice of prior more objective: the accelerated prior (accelerate_prior) and the hedging prior (hedging_prior). In the accelerated prior, P decreases in proportion to the number of subjects recruited. The hedging prior down-weights a prior that proves inconsistent with the accumulating accrual data (an off-target prior).
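To make the role of P concrete, here is a minimal gamma-Poisson sketch of the constant-accrual-rate idea (our illustrative analogue, not the exponential waiting-time model implemented in the accrual package; all names and the GRIPS-like values are assumptions):

```r
## Weekly counts ~ Poisson(lambda); conjugate prior lambda ~ Gamma(a0, b0).
## a0 = P * n_target pseudo-subjects: P = 1 gives the prior the weight of
## the full proposed sample size, P = 0 effectively ignores it.
predict_total <- function(x_obs, rate_prior, P, n_target, weeks_ahead,
                          n_sim = 10000) {
  a0 <- P * n_target                  # prior pseudo-subjects
  b0 <- P * n_target / rate_prior     # prior pseudo-weeks
  lambda <- rgamma(n_sim, shape = a0 + sum(x_obs), rate = b0 + length(x_obs))
  rpois(n_sim, lambda * weeks_ahead)  # posterior predictive of future total
}

## Hypothetical use: 160 planned subjects over ~104 weeks, year 1 observed
pred <- predict_total(x_train, rate_prior = 160 / 104, P = 0.5,
                      n_target = 160, weeks_ahead = 52)
quantile(pred, c(0.025, 0.5, 0.975))
```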

We used an R package created for these methods [16] to compare with our own. However, the functions of the existing package generate a prediction for a given time interval, not a monotone increasing accumulated sequence. We therefore modified the functions to run sequenced predictions for each of the 52 weeks of recruitment, generating a correlated sequence. This modification required each step to incorporate the previous simulation step as prior data and made the comparison more appropriate.
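The sequential modification can be sketched as follows (again using the gamma-Poisson analogue above rather than the package internals):

```r
## Predict year 2 one week at a time, folding each simulated week back in
## as prior data so every draw is a correlated, monotone cumulative curve.
simulate_bayes_seq <- function(x_obs, rate_prior, P, n_target, n_sim = 1000) {
  Y <- matrix(0, n_sim, 52)
  for (l in seq_len(n_sim)) {
    x <- x_obs
    for (i in 1:52) {
      a <- P * n_target + sum(x)                  # updated gamma shape
      b <- P * n_target / rate_prior + length(x)  # updated gamma rate
      y_new <- rpois(1, rgamma(1, shape = a, rate = b))
      x <- c(x, y_new)                  # previous step becomes prior data
      Y[l, i] <- if (i == 1) y_new else Y[l, i - 1] + y_new
    }
  }
  Y
}
```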

Results

Over the course of the GRIPS study, 70 total participants were enrolled, 60 of them by the end of year two; these were used in this analysis. The mean age of the GRIPS study population utilized was 73.1 years (standard deviation = 6.7), with 27 (38%) males and 44 (62%) females. Of this population, 61 (86%) were White and 10 (14%) were Black; 2 (2.8%) were Hispanic and 69 (97.2%) non-Hispanic. The weekly enrollment rate during the GRIPS study was not constant, and various gaps exist within the data due to events such as staffing issues and the COVID-19 lockdown (Fig. 1). If the underlying enrollment rate were constant, the time interval between consecutive recruitments would follow an exponential distribution, on which many previous methods rely. With these recruitment gaps, the data do not appear to follow an exponential distribution (p-value = 0.02 by the test of Epps and Pulley for the exponential distribution) [22].
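Such a check of the exponential waiting-time assumption can be sketched as follows (an illustrative Kolmogorov-Smirnov stand-in for the Epps-Pulley test used here, and only approximate because the rate is estimated from the same data; rec_log is the hypothetical log from the Methods sketch):

```r
## Waiting times between consecutive enrollments, in days
enroll_dates <- sort(rec_log$admit_date[rec_log$enrolled == 1])
gaps <- as.numeric(diff(enroll_dates))
gaps <- gaps[gaps > 0]   # drop same-day ties

## KS test against an exponential with the estimated rate
ks.test(gaps, "pexp", rate = 1 / mean(gaps))
```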

Fig. 1

Recruitment Data by Week. Weekly enrollment pattern using the GRIPS: Geriatric Recovery Using Inpatient and Post-hospitalization Supplementation study (left) and PACE: Post-hospitalization Interventions to Improve Physical Function in Older Adults (right). Blue bars represent participants who have been screened. Red bars represent participants who are eligible for enrollment. Green bars represent participants who were ultimately enrolled. It is important to note that blue bars use a different scale than the red and green bars, by a scale factor of 2:1. Red and green bars share the same scale. All bars start at 0 on the x-axis

In the PACE study, 113 total participants were enrolled, 94 of them by the end of year two; these were used in this analysis. The mean age of the PACE study population utilized was 78.1 years (standard deviation = 7.4), with 36 (32%) males and 77 (68%) females. Of this population, 98 (87%) were White and 13 (12%) were Black; 14 (12%) were Hispanic and 99 (88%) non-Hispanic. The weekly enrollment rate during the PACE study was not constant. While fewer gaps exist relative to the GRIPS study, PACE enrollment in year 2 was drastically lower than in year 1 due to the anticipated schedule under which the coordinator could not enroll 2 days per week.

Figure 2 shows the predictions along with the observed year one and two enrollment sequences for the GRIPS study under the 6 different scenarios described in the Methods section. Table S1 in the supplementary digital content shows the results of the 3 indexes calculated for each scenario relative to the actual enrollment of year two (median ED = 94.2). Results for actual year one enrollment are presented to serve as a benchmark for heterogeneity between years. The bootstrapping simulations resulted in a constant rate of enrollment and a simulated sequence that appears as a constant slope (median ED = 87.7 (45.6, 125.6)). All weighted samplings (scenarios 2–6) resulted in closer prediction, evidenced by better coverage of the observed enrollment (red) and smaller ED relative to actual enrollment in year two (Table S1). Scenario 4 has a slightly reduced ED relative to scenario 2 (median ED = 79.6 (45.6, 117.8) vs. 83.4 (48.2, 122.1)). Simulation after filling gaps (scenarios 3, 5, 6) further increases the accuracy of prediction, with ED ranging between 49.2 and 54.0. Actual year two enrollment displayed a deviation in the last 3–6 months of recruitment; however, scenarios 3, 5, and 6 have smaller deviations.

Fig. 2

Median (95% prediction band) Enrollment of Year 2 vs. Enrollment across 6 different scenarios (GRIPS). Predicted median Year 2 enrollment (95% prediction band) vs. actual enrollment of Years 1 and 2 across 6 different scenarios. Blue lines represent the empirical data from year 1 enrollment. Red lines represent the observed enrollment data from year 2. Black lines represent the simulated median enrollment data from the 6 scenarios along with their respective prediction bands in the shaded regions

The total enrollment observed in the actual year two GRIPS data was 41 participants, and it took a total of 92 weeks to enroll 50 participants in the actual trial. Scenarios 1, 2, and 4 performed similarly, with a total median recruitment in year two of 18 participants and 147–148 weeks to reach 50 subjects. Scenarios 3, 5, and 6 displayed improved performance, with a total median recruitment in year two of 25–27 participants and 98–104 weeks to reach 50 subjects.

Figure 3 shows the predictions along with the observed year one and two enrollment sequences for the PACE study under the 6 different scenarios described in the Methods section. Table S2 in the supplementary digital content shows the results of the 3 indexes calculated for each scenario relative to the actual enrollment of year two (median ED = 104.5). Bootstrapping (median ED = 87.7 (45.6, 125.6)) and Cauchy (0, 1) (median ED = 91.7 (35.7, 161.8)) scenarios improved upon the observed ED. The Binomial (51, 0.5) weighting scenario resulted in a median ED closer to the empirical ED (101.6 (44.4, 171.5)). All scenarios using 60% enrollment data performed similarly and resulted in better prediction and smaller ED relative to actual enrollment in year two (Table S2). The Binomial (51, 0.5) prediction band flattens sooner and may result in better prediction beyond 12 months as seen by the plateau at the end of 2nd year enrollment (Fig. 3).

Fig. 3

Median (95% prediction band) Enrollment of Year 2 vs. Enrollment across 6 different scenarios (PACE). Predicted median Year 2 enrollment (95% prediction band) vs. actual enrollment of Years 1 and 2 across 6 different scenarios. Blue lines represent the empirical data from year 1 enrollment. Red lines represent the observed enrollment data from year 2. Black lines represent the simulated median enrollment data from the 6 scenarios along with their respective prediction bands in the shaded regions

The total enrollment observed in the actual year two PACE data was 35 participants, and it took a total of 40 weeks to enroll 50 participants in the actual trial. Scenarios 1, 2, and 3, which used 100% of year 1 as empirical data, performed similarly, with a total median recruitment in year two of 59 participants and 40–45 weeks to reach 50 subjects. Scenarios 4, 5, and 6 also performed similarly and displayed improved performance, with a total median recruitment in year two of 41 participants and 64–68 weeks to reach 50 subjects.

Figure 4 shows the comparison of our proposed method against the existing Bayesian method [16] using the GRIPS study. The graph on the top left shows the ED between simulated sequences and the actual enrolled sequence in year two; the red reference line is the distance between actual year one and year two enrollment. The Bayesian method [16] shows its best ED performance with inf_p_00 and hedging_prior. In our method, scenarios 3, 5, and 6, which consider seasonal fluctuation and fill enrollment gaps, produced a smaller median ED for the predicted sequence. The Bayesian accrual method performs similarly when predicting the time to a fixed number of subjects, with none of the Bayesian scenarios covering the observed number of weeks to reach 50 subjects in the actual recruitment data. In our method, scenarios 3, 5, and 6 showed improved performance over the existing method, covering the actual enrollment of year two. When predicting the total number of subjects enrolled within a fixed time interval of year two, our scenarios 3, 5, and 6 showed improved performance in the first 6 months. Neither the Bayesian methods nor ours are optimal when predicting 12 months ahead.

Fig. 4

Comparison of Accrual package vs. Our Method across 3 indexes (GRIPS). Comparison between Bayesian accrual models and our proposed methods across median Euclidean Distance, median time to enroll 50 participants, and median total accumulated enrollment over time (split into 6-month and 12-month periods in the right plots). The solid horizontal red reference line represents the observed Euclidean Distance between actual year one and year two (top left), the observed number of weeks to enroll 50 participants beginning in actual year one and ending in year two (bottom left), and the observed number of participants enrolled in a given timeframe in actual year two (top right and bottom right). The blue lines represent Accrual results across 5 types of priors. The red lines represent our method across 6 scenarios. Scenarios below the reference line in Euclidean Distance and time to enroll 50 participants (left plots) are improvements upon actual year measurements. Scenarios above the reference line in median total enrolled (right plots) are improvements upon actual year 2 measurements

Figure 5 shows the comparison of our proposed method against the existing Bayesian method [16] using the PACE study. The Bayesian method [16] shows its best ED performance with inf_p_00 and hedging_prior. In our method, scenarios 4, 5, and 6, which consider a reduced weekly enrollment based on a known schedule change, produced a smaller median ED for the predicted sequence. The Bayesian accrual method performs similarly when predicting the time to a fixed number of subjects, with all Bayesian methods covering the observed weeks to reach 50 subjects in the actual recruitment data. In our method, scenarios 1, 2, and 3 produced smaller confidence intervals around the observed time to 50 subjects. Scenarios 4, 5, and 6 did not cover the observed time to 50 subjects, as they predicted lower enrollment than that seen during year one. When predicting the total number of subjects enrolled within a fixed time interval of year two, our scenarios 4, 5, and 6 showed similarly improved performance in the first 6 months as well as at 12 months. Neither the Bayesian methods nor our scenarios 1, 2, and 3 are optimal when predicting 12 months ahead.

Fig. 5

Comparison of Accrual package vs. Our Method across 3 indexes (PACE). Comparison between Bayesian accrual models and our proposed methods across median Euclidean Distance, median time to enroll 50 participants, and median total accumulated enrollment over time (split into 6-month and 12-month periods in the right plots). The solid horizontal red reference line represents the observed Euclidean Distance between actual year one and year two (top left), the observed number of weeks to enroll 50 participants beginning in actual year one and ending in year two (bottom left), and the observed number of participants enrolled in a given timeframe in actual year two (top right and bottom right). The blue lines represent Accrual results across 5 types of priors. The red lines represent our method across 6 scenarios. Scenarios below the reference line in Euclidean Distance and time to enroll 50 participants (left plots) are improvements upon actual year measurements. Scenarios above the reference line in median total enrolled (right plots) are improvements upon actual year 2 measurements

Discussion

We developed a simulation-based non-parametric approach for predicting recruitment for randomized clinical trials in elderly inpatient settings, utilizing recruitment data from the GRIPS and PACE studies. The GRIPS study began just before the COVID-19 lockdown, and it could be anticipated that enrollment would increase as the lockdown was lifted. The PACE study took place prior to the COVID-19 pandemic and had an anticipated scheduling issue that would lower enrollment in the following year. These two studies represent different real-world enrollment outcomes. Using the first year of recruitment data, 6 different scenarios were simulated and compared to the actual enrollment data from year two as well as to an existing Bayesian accrual method [15, 16]. Of the 6 scenarios using the GRIPS data, those that used weighted sampling and filled enrollment gaps (scenarios 3, 5, and 6) consistently outperformed or matched all other models, except for predicting total enrollment at 12 months, where all tested methods performed less than optimally. Of the 6 scenarios using the PACE data, those that used a reduced weekly enrollment outperformed all other models, apart from predicting the number of weeks to 50 subjects, where these scenarios predicted, as anticipated, lower enrollment than the 59 observed during year one.

When comparing our scenarios to the actual observed enrollment in year two of the GRIPS study, scenario 1 (bootstrapping) resulted in a similar ED and appears visually as a constant slope, which was expected as it reflects the assumption of an equal random sampling weight for each week. Scenarios 2–6 all utilized a weighted sampling technique, which improved the accuracy of prediction. The similar performance of scenarios 2 and 4 suggests that adjusting the sampling weight for holidays, while it still improved prediction accuracy in this case, may not be as effective as other weighting strategies. The weighted sampling scenarios that also filled enrollment gaps (scenarios 3, 5, and 6) produced the most accurate predictions and the lowest ED relative to the actual year two enrollment. Scenarios 3 and 5 yielded similar EDs; however, scenario 5, which accounted for holidays, is deemed a more reasonable recruitment strategy. Scenario 6 utilized a Cauchy (0, 1) weight distribution while filling enrollment gaps and adjusting weights for holiday weeks, distributing the sampling weights over a wider range to simulate a lower frequency of enrollment fluctuations. Scenarios 5 and 6 performed similarly; however, scenario 6 appears visually like the bootstrapping scenario, with a near constant slope until the last 3 months, due to its wider sampling weights.

All 6 scenarios using the GRIPS data began to deviate from the observed year two enrollment during the last 3–6 months. The deviation in late year two reflects the dramatic increase in enrollment in the actual year two data, whereas the simulated predictions were based on the rate observed in year one. This dramatic difference between actual year one and year two enrollment may be associated with several factors, such as the COVID-19 shutdowns during year one, competing studies during year one, and the training of an additional study coordinator during the study; these issues were resolved during year two. Because of this important difference between predicted and observed enrollment, all scenarios underperformed when compared to the observed total enrollment in year two. Scenarios 1, 2, and 4 performed similarly in this regard, while scenarios 3, 5, and 6 again provided a more accurate prediction and covered the observed year two enrollment until beyond week 42. After week 42, the actual year two enrollment rises sharply, resulting in a much higher total enrollment than in any of the simulated scenarios.

When looking at the number of weeks to enroll a fixed number of 50 participants, a similar pattern began to emerge. Scenarios 1, 2, and 4, which did not account for enrollment gaps, all performed similarly and required many more weeks to reach 50 total enrollments. Scenarios 3, 5, and 6, which filled any gaps in enrollment, again showed an improvement in prediction accuracy when compared to the actual year two enrollment.

Using the PACE study data, it was anticipated that enrollment in year 2 would be lower than in year 1 due to the coordinator being out of office 2 days a week. Scenario 1 (bootstrapping) again resulted in an ED similar to actual year 1, as expected. However, this time scenario 2, which used a Binomial (51, 0.5) weight function, produced the ED most similar to actual year 1 of our 6 scenarios. Scenario 3, which utilized a Cauchy (0, 1) weight function, performed similarly to scenarios 1 and 2. Scenarios 4, 5, and 6 all used a version of the year 1 weekly enrollment reduced to 60% to simulate the 3 days a week on which enrollment could occur in year 2. Because of this, scenarios 4, 5, and 6 had the lowest EDs and the tightest confidence bands.

Scenarios 1, 2, and 3, which used the entirety of the PACE year 1 data, overestimated the total number of patients to be enrolled in year 2. This was expected, as they did not account for the reduced availability of the coordinator. Scenarios 4, 5, and 6, which used a reduced weekly enrollment, all produced similarly more accurate estimates of the total enrolled in year 2. However, scenarios 1, 2, and 3 did accurately estimate the total weeks needed to reach 50 enrolled patients, with scenario 2 (Binomial (51, 0.5)) producing the most accurate result. Scenarios 4, 5, and 6 overestimated the number of weeks needed to enroll 50 patients. In the actual data, more than 50 patients were already enrolled by the end of year 1, so this specific question may be less relevant for this study, which used the first year to predict the second. While scenarios 4, 5, and 6 all performed similarly, the scenarios with weight functions (scenarios 5 and 6) better captured the seasonal variation across the recruitment period.

Many Bayesian methods assume that the accrual rate is constant and that the waiting time follows an exponential distribution. When using the existing Bayesian accrual method [16] with the GRIPS data, the inf_p_00 and hedging_prior approaches resulted in the smallest predicted ED, but all approaches performed similarly. All our scenarios outperformed both the existing Bayesian method and the actual ED between year one and year two enrollment in terms of containing a lower possible ED. The inf_p_00 approach is similar to our scenario 1 (bootstrapping), which assumes an equally weighted random sampling. Scenarios 3, 5, and 6, which involved weighted sampling and filled enrollment gaps, produced the lowest possible EDs. These results remained consistent for the time to enroll 50 participants and the total enrollment at 6 months into year two. Scenarios 3, 5, and 6 each require their own assumptions, such as determining the sampling weight, filling enrollment gaps, or accounting for holidays. However, these assumptions are based on recruitment strategy, such as enhanced enrollment during periods of zero enrollment, and on the utilization of existing data, rather than assuming a constant rate or specific distributions from previous years. The type of input required by our scenarios is easy for the investigator to obtain.

When using the Bayesian accrual methods [16] on the PACE data, inf_p_100 and hedging_prior resulted in the smallest ED, but again all methods performed similarly. All Bayesian methods performed similarly to our scenarios 1, 2, and 3 when predicting year 2 in terms of containing the smallest possible ED, with our scenarios having smaller confidence bands and performing closer to the observed year 1 ED. Our scenarios using a reduced weekly enrollment (scenarios 4, 5, and 6) all outperformed the Bayesian methods using 100% enrollment data and resulted in the tightest confidence bands. All methods and scenarios that used 100% enrollment data from year 1 contained the observed time to enroll 50 participants, with our scenarios 1, 2, and 3 again resulting in tighter confidence bands. All methods and scenarios using 100% enrollment data performed similarly when predicting total participants enrolled at 6 months. At 12 months, the larger confidence bands of the Bayesian methods resulted in closer coverage, but neither those methods nor our scenarios using 100% enrollment data performed optimally. Scenarios 4, 5, and 6, which used 60% enrollment data, were the only scenarios to cover the observed total participants enrolled at 6 and 12 months while also producing the tightest confidence bands.

Our approach is applicable to the monitoring phase with early enrollment data available. This analysis used 1 year of empirical data, but it is feasible to use much less. However, our method would assume that the future months would follow the same pattern as the empirical distribution, which may not accurately reflect seasonal fluctuations. If a smaller initial enrollment sample is all that is available, the scope of prediction would need to be narrow or require more input from the investigator at the time of prediction. The current empirical models include 18 participants (GRIPS) and 59 participants (PACE) in year one. We anticipate that projection model performance improves with larger enrollment sample size.

Our method has several strengths. Reliance on a constant accrual rate or a specific distribution is common to many previous methods; our proposed method relaxes these parametric assumptions by using a non-parametric approach based on the empirical recruitment pattern. Our approach is data-driven, more flexible in accounting for unknown underlying impacting factors, and requires limited information, primarily a recruitment log. Using our method, arbitrary enrollment patterns caused by seasonal effects, staffing issues, or interruptions such as pandemics can be modeled directly by the clinical investigator who does not anticipate these gaps in enrollment recurring in future recruitment periods. This methodological innovation can inform clinical trial stakeholders in the decision-making process on whether to stop the trial or introduce amendments to the experimental design or ultimate trial goal.

However, there are several limitations to consider. Long-term prediction using our method does not seem to provide an advantage over previous methods: predicted enrollment at 12 months was not optimal with either our method or the existing Bayesian method, unless enrollment could be anticipated to change by a quantifiable amount, as with the PACE data. Unforeseen dramatic changes in enrollment patterns occurring in the predicted year cannot be accounted for, such as the elevated enrollment seen in the actual recruitment after week 42 of year two in the GRIPS data. Finally, our method was applied to randomized clinical trials in elderly inpatient settings; its prediction accuracy in other settings remains to be determined.

We would like to extend our method to other settings, such as outpatient enrollment or community samples, other types of interventions, and multi-site trials. Our method is easily adaptable to predicting recruitment for subgroups based on patient characteristics. Interim updates to the empirical distribution throughout the enrollment process, providing updated predictions, need to be assessed to better understand how additional enrollment data affect our projections. We would also like to apply our methods to real trial monitoring to test the validity of our prediction method, which has not been established for many other prediction methods [11]. Finally, we plan to integrate recruitment prediction into a Shiny app for use during the monitoring phase of trial protocols.

Conclusions

It is feasible to use a simulation-based non-parametric approach to predict clinical trial recruitment in an elderly inpatient setting when early enrollment data are available but other information is limited. The proposed weight function in the resampling accommodates arbitrary enrollment patterns and anticipated changes in future enrollment periods, and it outperforms bootstrapping and an existing Bayesian method. Our method is easily adaptable under various scenarios with minimal input from the clinical investigator. Further refinement is needed to improve long-term prediction accuracy and to extend the method to wider settings.

Data availability

The anonymized dataset supporting the conclusions of this article is available upon request from the corresponding author, Alejandro Villasante-Tezanos, at alvillas@utmb.edu.


Abbreviations

GRIPS:

Geriatric Recovery Using Inpatient and Post-hospitalization Supplementation

PACE:

Post-hospitalization Interventions to Improve Physical Function in Older Adults

ACE:

Acute Care for the Elderly

UTMB:

University of Texas Medical Branch

ED:

Euclidean Distance

References

  1. Huang GD, Bull J, Johnston McKee K, Mahon E, Harper B, Roberts JN. Clinical trials recruitment planning: a proposed framework from the clinical trials Transformation Initiative. Contemp Clin Trials. 2018;66:74–9.


  2. Verheggen F, Nieman F, Reerink E, Kok G. Patient satisfaction with clinical trial participation. Int J Qual Health Care. 1998;10(4):319–30.


  3. Adler P, Otado J, Kwagyan J. Satisfaction and perceptions of research participants in clinical and translational studies: an urban multi-institution with CTSA. J Clin Transl Sci. 2020;4(4):317–22.


  4. Medidata. COVID19-Response9.0_Clinical-Trials_2020921_v2-1.pdf [Internet]. [cited 2023 Nov 22]. https://www.medidata.com/wp-content/uploads/2021/06/COVID19-Response9.0_Clinical-Trials_2020921_v2-1.pdf

  5. Sathian B, Asim M, Banerjee I, Pizarro AB, Roy B, van Teijlingen ER, et al. Impact of COVID-19 on clinical trials and clinical research: a systematic review. Nepal J Epidemiol. 2020;10(3):878–87.


  6. Online Resource for Research in Clinical trials [Internet]. https://www.orrca.org.uk/

  7. Kasenda B, Liu J, Jiang Y, Gajewski B, Wu C, von Elm E, et al. Prediction of RECRUITment In randomized clinical Trials (RECRUIT-IT)--rationale and design for an international collaborative study. Trials [Internet]. 2020 Aug 21 [cited 2023 Nov 28];21(1). https://go.gale.com/ps/i.do?p=HRCA&sw=w&issn=17456215&v=2.1&it=r&id=GALE%7CA634935732&sid=googleScholar&linkaccess=abs

  8. Healy P, Galvin S, Williamson PR, Treweek S, Whiting C, Maeso B, et al. Identifying trial recruitment uncertainties using a James Lind Alliance Priority Setting Partnership - the PRioRiTy (Prioritising Recruitment in Randomised Trials) study. Trials [Internet]. 2018 Mar 1 [cited 2023 Nov 28];19(1). https://go.gale.com/ps/i.do?p=HRCA&sw=w&issn=17456215&v=2.1&it=r&id=GALE%7CA546089118&sid=googleScholar&linkaccess=abs

  9. Barnard KD, Dent L, Cook A. A systematic review of models to predict recruitment to multicentre clinical trials. BMC Med Res Methodol. 2010;10:63–63.


  10. Gkioni E, Rius R, Dodd S, Gamble C. A systematic review describes models for recruitment prediction at the design stage of a clinical trial. J Clin Epidemiol. 2019;115:141–9.


  11. Gkioni E, Dodd S, Rius R, Gamble C. Statistical models to predict recruitment in clinical trials were rarely used by statisticians in UK and European networks. J Clin Epidemiol. 2020;124:58–68.


  12. Carter RE, Sonne SC, Brady KT. Practical considerations for estimating clinical trial accrual periods: application to a multi-center effectiveness study. BMC Med Res Methodol. 2005;5(1):11.


  13. Lee YJ. Interim recruitment goals in clinical trials. J Chronic Dis. 1983;36(5):379–89.


  14. Anisimov VV. Predictive event modelling in multicenter clinical trials with waiting time to response. Pharm Stat. 2011;10(6):517–22.


  15. Jiang Y, Simon S, Mayo MS, Gajewski BJ. Modeling and validating bayesian accrual models on clinical data and simulations using adaptive priors. Stat Med. 2015;34(4):613–29.


  16. Jiang Y, Guarino P, Ma S, Simon S, Mayo MS, Raghavan R, et al. Bayesian accrual prediction for interim review of clinical studies: open source R package and smartphone application. Trials [Internet]. 2016 Jul 22 [cited 2024 Jan 21];17(1). https://go.gale.com/ps/i.do?p=HRCA&sw=w&issn=17456215&v=2.1&it=r&id=GALE%7CA468888792&sid=googleScholar&linkaccess=abs

  17. Moffat KR, Shi W, Cannon P, Sullivan F. Factors associated with recruitment to randomised controlled trials in general practice: a systematic mixed studies review. Trials. 2023;24(1).


  18. Geriatric Recovery Using Inpatient and Post-hospitalization Supplementation (GRIPS). https://clinicaltrials.gov/study/NCT03904615#participation-criteria

  19. Feasibility Study of Post-hospitalization Interventions to Improve Physical Function in Older Adults (PACE). https://clinicaltrials.gov/study/NCT02203656?cond=Post-hospitalization%20Interventions%20to%20Improve%20Physical%20Function&rank=1

  20. Geriatric Services [Internet]. [cited 2024 Jan 21]. https://www.utmb.edu/utmbhealth/services/geriatric-care/ace-unit#:~:text=The%20ACE%20Unit%20is%20for,need%20end%20of%20life%20care

  21. Gajewski BJ, Simon SD, Carlson SE. Predicting accrual in clinical trials with bayesian posterior predictive distributions. Stat Med. 2008;27(13):2328–40.


  22. Epps TW, Pulley LB. A test for normality based on the empirical characteristic function. Biometrika. 1983;70(3):723–6.


Acknowledgements

We thank Shawn Goodlett, coordinator for clinical research senior at the Sealy Center for Aging, for providing the enrollment data logs and explaining the recruitment process.

Funding

Drs Villasante-Tezanos and Yu are supported by the Claude D. Pepper OAIC Developmental Project award from the UTMB Claude D. Pepper Older Americans Independence Center (5P30AG024832-19: Goodwin, Volpi, and Wong) from NIH/NIA.

Author information

Authors and Affiliations

Authors

Contributions

AV, YK, and XY conceptualized and designed the study. Data curation and analysis was performed by AV, XY and CKK. Drafting of the manuscript was performed by AV, XY, and CKK. Revision and editing of the manuscript were performed by AV, YK, CKK, YL, and XY. Funding was obtained by AV and XY. All authors read and approved the final version for publication.

Corresponding author

Correspondence to Alejandro Villasante-Tezanos.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original online version of this article was revised: the authors noticed that three author names (Given and Family names) were switched. The incorrect names are: Villasante-Tezanos Alejandro, Kurinec Christopher and Li Yisheng. The correct names should be: Alejandro Villasante-Tezanos, Christopher Kurinec and Yisheng Li.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article


Cite this article

Villasante-Tezanos, A., Kuo, YF., Kurinec, C. et al. A non-parametric approach to predict the recruitment for randomized clinical trials: an example in elderly inpatient settings. BMC Med Res Methodol 24, 189 (2024). https://doi.org/10.1186/s12874-024-02314-2

