### Actual example

Details of the trial design, interventions, resident background information, and efficacy results of the INSPIRED trial have been published previously [6]. The trial included 1700 residents from 12 care homes in Australia, of whom 1089 (64.1%) were residents at the start of the trial and the remaining 611 (35.9%) became residents after the start of the trial. There were 1149 hospitalisations during the trial, of which the 943 hospitalisations lasting more than 24 h (> 24 h) were used for the primary outcome, length of stay in hospital. Of the residents, 377 had only one hospitalisation of > 24 h, while 211 had multiple hospitalisations of > 24 h (137 had two, 45 had three, 11 had four, and 18 had five or more). The number of residents who died during the trial period was 534 (31.4%). The secondary outcome, number of hospitalisations > 24 h per facility-month, was 5.6 in the control condition and 4.3 in the intervention condition, a decrease of approximately 23% (this comparison involved no covariate adjustment, estimation, or statistical testing).

### Basic notation

The timing of the unidirectional switch (henceforth, switch) in each cluster of the SWCRT is called a step. Here, we consider an SWCRT with \(m\) clusters and \(s\) steps. For simplicity, we assume that one cluster switches from the control condition to the intervention condition at each step (\(s=m\)). In the \(i\) th cluster (\(i=1,\:\dots ,\:m\)), \({n}_{i}\) is the number of subjects observed during the entire trial duration.

Assuming that the start of the trial is \({t}_{S}\) and the end of the last step period is \({t}_{E}\), the timing of the switch in each cluster is calculated as follows: \({W}_{i}={t}_{S}+i*({t}_{E}-{t}_{S}) / (m+1)\), and the distance between switches is calculated as follows: \({W}_{d}={W}_{i+1}-{W}_{i}={W}_{i}-{W}_{i-1}=({t}_{E}-{t}_{S}) / (m+1)\). Let \({d}_{ij}\) be the time point at which the \(j\) th subject (\(j=1,\: \dots ,\: {n}_{i}\)) in the \(i\) th cluster entered the trial. The distance \({w}_{ij}\) to the switch for each subject from the trial entry is defined as follows:

$$\left\{\begin{array}{cc}{w}_{ij}={W}_{i}-{d}_{ij}& { W}_{i}\ge {d}_{ij}\\ {w}_{ij}=0& {W}_{i}<{d}_{ij}\end{array}\right.$$

Suppose the starting point of the second and subsequent TTREs is the time of the previous event, and the actual TTRE used in the analysis based on the kth recurrence is \({T}_{ijk}\). Considering the starting point of each recurrence, the distance \({w}_{ijk}\) to the switch for each subject is defined as follows:

$$\left\{\begin{array}{cc}{w}_{ijk}={W}_{i}-{d}_{ij} & { W}_{i}\ge {d}_{ij} \:(k=1)\\ {w}_{ijk}={W}_{i}-{T}_{ijk-1}& { W}_{i}\ge {T}_{ijk-1} \:(k\ge 2)\\ {w}_{ijk}=0& {W}_{i}<{d}_{ij} \: \left(k=1\right), { \:W}_{i}<{T}_{ijk-1} \:(k\ge 2)\end{array}\right.$$

In the models below, \({h}_{ijk}\left(t\right)\) is the hazard function of the \(k\) th recurrence of the \(j\) th subject in the \(i\) th cluster at time \(t\), and \({h}_{0ik}\left(t\right)\) is the baseline hazard function of the \(k\) th recurrence of the \(i\) th cluster at time \(t\). No specific distribution is assumed for the baseline hazard function. \({Y}_{ijk}\left(t\right)\) is the at-risk indicator for the \(k\) th recurrence of the \(j\) th subject in the \(i\) th cluster at time \(t\); it is 1 if the subject is at risk of recurrence and under observation, and 0 otherwise. \({X}_{ijk}\) is a vector of time-independent covariates for the \(k\) th recurrence of the \(j\) th subject in the \(i\) th cluster, and \({\beta }_{ik}\) is a vector of fixed parameters for the time-independent covariates of the \(k\) th recurrence of the \(i\) th cluster. \({Z}_{ijk}\left(t\right)\) is the intervention indicator as a time-dependent covariate for the \(k\) th recurrence of the \(j\) th subject in the \(i\) th cluster, which is 0 for \(t<{w}_{ij}\) (or \(t<{w}_{ijk}\)) and 1 for \(t\ge {w}_{ij}\) (or \(t\ge {w}_{ijk}\)), so that it changes at the switch. \({\beta }_{tik}\) is the parameter for the intervention effect for the \(k\) th recurrence of the \(i\) th cluster. The subscript \(i\) is omitted if it is assumed that each cluster has a common effect. The subscript \(k\) is omitted if it is assumed that each recurrence has a common effect.
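The switch times and distances defined in this section can be sketched in a few lines of Python. This is only an illustration of the notation, not the simulation code used in the study; the function names `switch_times` and `distance_to_switch` are hypothetical, and the distance is taken to be zero whenever the switch precedes the starting point of the recurrence.

```python
def switch_times(t_s, t_e, m):
    """Return [W_1, ..., W_m], where W_i = t_S + i * (t_E - t_S) / (m + 1)."""
    w_d = (t_e - t_s) / (m + 1)  # distance between consecutive switches
    return [t_s + i * w_d for i in range(1, m + 1)]


def distance_to_switch(w_i, d_ij, prev_event_time=None):
    """Distance from the starting point of a recurrence to the switch.

    The starting point is trial entry d_ij for k = 1 and the previous event
    time for k >= 2; the distance is 0 once the switch has already happened.
    """
    start = d_ij if prev_event_time is None else prev_event_time
    return max(w_i - start, 0.0)


W = switch_times(0.0, 360.0, 5)
print(W)                                                      # [60.0, 120.0, 180.0, 240.0, 300.0]
print(distance_to_switch(W[1], 45.0))                         # 75.0
print(distance_to_switch(W[1], 45.0, prev_event_time=100.0))  # 20.0
print(distance_to_switch(W[1], 200.0))                        # 0.0
```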

### Statistical models

The first model considered was the CoxPH model [8, 14]. The hazard of the \(j\) th subject in the \(i\) th cluster at time \(t\) is expressed as follows:

$${h}_{ij}\left(t\right)={h}_{0i}\left(t\right) \mathrm{exp}({\beta }_{ti}{Z}_{ij}\left(t\right)+{\beta }_{i}^{^{\prime}}{X}_{ij})$$

As was previously mentioned, applying the CoxPH model to recurrent events would result in a loss of information because only the TTFE of each subject can be included in the analysis, and the second and subsequent events are ignored. Taking recurrent events into account should theoretically improve the efficiency of estimating the effects of interventions [19]. Since the purpose of this study is to evaluate the performance of the statistical model in estimating the intervention effect using TTRE, no performance evaluation on the CoxPH model will be conducted. In the following, we present an extended CoxPH model that allows for the inclusion of TTRE in the analysis.

The Andersen and Gill (AG) model assumes a common baseline hazard function for all events, independent of the number of previous recurrences, and it is considered beneficial when investigating the overall intervention effect on the occurrence of recurrent events [9]. The hazard for the \(j\) th subject in the \(i\) th cluster at time \(t\) is expressed as follows:

$${h}_{ij}\left(t\right)={Y}_{ij}\left(t\right) {h}_{0i}\left(t\right) \mathrm{exp}({\beta }_{ti}{Z}_{ij}\left(t\right)+{\beta }_{i}^{^{\prime}}{X}_{ij})$$

In the usual CoxPH model, a subject who has experienced one event is no longer at risk for that event. In contrast, the AG model assumes that subjects who have experienced at least one event remain at risk unless they drop out of the trial. In the AG model, multiple events that occur within the same subject are considered to be independent. However, because they may not be independent in reality, it is advised that robust variance is used to handle the correlation within the subject when inferring the parameter vector [20, 21].

The Prentice-Williams-Peterson (PWP) model assumes a different baseline hazard function for each recurrence and accounts for correlation by stratifying by the number of prior recurrences. Therefore, it is considered beneficial when the risk of repeat events differs between recurrences [17]. The hazard \({h}_{ijk}\left(t\right)\) for the \(k\) th recurrence is defined by the history of the covariates and the number of recurrences up to time \(t\). Conditionally, it is assumed that the (\(k-1\))th recurrence is independent of the \(k\) th recurrence. Furthermore, it assumes that the subject is not at risk for the \(k\) th recurrence until the (\(k-1\))th recurrence, so that \({Y}_{ijk}\left(t\right)\) is 0 until the (\(k-1\))th recurrence and 1 after that.

The PWP model can be broadly divided into two models depending on the treatment of the time points. First, the PWP total-time (PWP-TT) model uses the time from the start of the follow-up to each recurrence. The hazard of the \(k\) th recurrence of the \(j\) th subject in the \(i\) th cluster at time \(t\) is expressed as follows:

$${h}_{ijk}\left(t\right)={Y}_{ijk}\left(t\right) {h}_{0ik}\left(t\right) \mathrm{exp}({\beta }_{tik}{Z}_{ijk}\left(t\right)+{\beta }_{ik}^{^{\prime}}{X}_{ijk})$$

The second is the PWP gap-time (PWP-GT) model, which uses the time from the occurrence of the previous recurrence to each recurrence. The hazard of the \(k\) th recurrence for the \(j\) th subject in the \(i\) th cluster at time \(t\) is expressed as:

$${h}_{ijk}\left(t\right)={Y}_{ijk}\left(t\right) {h}_{0ik}\left(t-{t}_{k-1}\right) \mathrm{exp}({\beta }_{tik}{Z}_{ijk}\left(t\right)+{\beta }_{ik}^{^{\prime}}{X}_{ijk})$$

As the number of recurrences increases in the PWP model, the number of subjects at risk becomes relatively small. This would make the estimates unstable, so limiting the data to a specific number of recurrences is usually necessary [22]. Due to these characteristics, the PWP model is helpful in situations where the number of recurrences per subject is small [17]. Our study assumes that each recurrence has a common effect when estimating parameters using the PWP model.
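The three recurrent-event models differ mainly in how each subject's follow-up is cut into at-risk intervals and how those intervals are stratified. The following sketch shows one way to build such rows; it is an assumption-laden illustration (hypothetical function names, time measured from trial entry, the time-dependent intervention indicator omitted), not the data step used in the study. The AG model would use the total-time rows with a single baseline hazard, the PWP-TT model the same rows stratified by the event number `k`, and the PWP-GT model the gap-time rows stratified by `k`.

```python
def counting_process_rows(entry, event_times, end_of_followup, max_events=3):
    """Build (start, stop, status, k) rows on the total-time scale.

    Time is measured from trial entry. Each row is one at-risk interval for
    the k-th event; status is 1 if the interval ends in an event and 0 if it
    is censored at the end of follow-up.
    """
    rows, start = [], 0.0
    for k, t in enumerate(sorted(event_times)[:max_events], start=1):
        if t > end_of_followup:
            break
        rows.append((start, t - entry, 1, k))
        start = t - entry
    # Add a censored interval if fewer than max_events events were observed.
    if len(rows) < max_events and start < end_of_followup - entry:
        rows.append((start, end_of_followup - entry, 0, len(rows) + 1))
    return rows


def to_gap_time(rows):
    """Reset the clock at each event: (0, stop - start, status, k)."""
    return [(0.0, stop - start, status, k) for start, stop, status, k in rows]


rows = counting_process_rows(30.0, [100.0, 250.0], 360.0)
print(rows)               # [(0.0, 70.0, 1, 1), (70.0, 220.0, 1, 2), (220.0, 330.0, 0, 3)]
print(to_gap_time(rows))  # [(0.0, 70.0, 1, 1), (0.0, 150.0, 1, 2), (0.0, 110.0, 0, 3)]
```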

For each of the statistical models described so far, there are two analysis policies: (i) with stratification by clusters, which assumes that the baseline hazard function is different for each cluster, and (ii) without stratification by clusters, which assumes that the baseline hazard function is the same for each cluster.

The performance of each statistical model in the simulation was evaluated in terms of bias, mean square error (MSE), and coverage probability (CP). Bias is the mean difference, across simulated replicates, between the estimated intervention effect from each statistical model and the true intervention effect \({\beta }_{t}\), where a positive value indicates underestimation and a negative value indicates overestimation. MSE is the sum of the squared bias and the variance of the estimated intervention effect from each statistical model, with smaller values indicating better performance. CP is the proportion of the 95% confidence intervals (CIs) for the HR obtained by each statistical model that include the HR based on the true intervention effect \({\beta }_{t}\). The closer the CP is to 0.95, the better the performance.
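As an illustration of these three measures, the sketch below computes them from a set of replicate estimates. Several choices here are assumptions rather than details taken from the study: bias is computed as the mean estimate minus the true value, the variance is the population variance across replicates, and the CI is a Wald interval on the log-HR scale (coverage on that scale is equivalent to coverage of the HR, because the exponential is monotone). The function name `performance` is hypothetical.

```python
import math
from statistics import fmean, pvariance


def performance(estimates, std_errors, beta_true, z=1.96):
    """Return (bias, MSE, CP) for replicate estimates of the log hazard ratio."""
    bias = fmean(estimates) - beta_true        # assumed sign convention
    mse = bias ** 2 + pvariance(estimates)     # squared bias + variance
    covered = [
        est - z * se <= beta_true <= est + z * se   # Wald 95% CI (assumption)
        for est, se in zip(estimates, std_errors)
    ]
    cp = sum(covered) / len(covered)
    return bias, mse, cp


bias, mse, cp = performance([-0.30, -0.20, -0.25], [0.05, 0.02, 0.05], -0.264)
print(round(bias, 3), round(cp, 3))
```

With the population variance, the MSE computed this way equals the average squared distance between the estimates and the true value.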

### Data generation process

The time point \({d}_{ij}\) at which the \(j\) th subject in the \(i\) th cluster enters the trial is generated randomly as \({t}_{S}+(\left({t}_{E}-{t}_{S}\right)*e)/E\) or \({t}_{S}+(\left({t}_{F}-{t}_{S}\right)*e)/E\), where \({t}_{S}\) is the start of the trial, \({t}_{E}\) is the end of the last step period, and \({t}_{F}\) is the end of the trial. At least the TTFE always starts from \({d}_{ij}\). Here, \(e\) is a pseudo-random number generated from a uniform distribution, \(e\sim U(0, 1)\).

\({t}_{F}\) indicates the end of the trial and is expressed as \({t}_{F}={t}_{E}+({W}_{d}*F)\), where \({t}_{E}\) is the end of the last step period and \({W}_{d}\) is the distance between switches, as described above. \(F\) is a coefficient that specifies the length of the follow-up period that may be set after the end of the last step period. When \(F=0\), there is no follow-up period, and \({t}_{F}={t}_{E}\). If \(F=X(>0)\), there is a follow-up period of \(X\) steps after the end of the last step period. In the actual example, as shown in Fig. 1, a step occurs every two months, and there is a follow-up period of 5 months (= 2.5 steps) after the end of the last step period. Depending on the purpose and setting of the trial, other SWCRTs have adopted a similar design [23,24,25].

In the actual simulation, three policies are considered: (i) no follow-up period and \({d}_{ij}={t}_{S}+(\left({t}_{E}-{t}_{S}\right)*e)/E\); (ii) there is a follow-up period and \({d}_{ij}={t}_{S}+(\left({t}_{F}-{t}_{S}\right)*e)/E\) (allow trial entry until the end of the follow-up period; illustrated in Fig. 2a); (iii) there is a follow-up period but \({d}_{ij}={t}_{S}+(\left({t}_{E}-{t}_{S}\right)*e)/E\) (terminate trial entry at the end of the last step period; illustrated in Fig. 2b).

In addition, \(E\) is a coefficient that specifies the timing of trial entry. If \(E=1\), the subject enters the trial randomly between \({t}_{S}\) and \({t}_{E}\) or \({t}_{F}\), which reflects the open cohort design in that a subject may enter the trial at any time. If \(E\) is greater than 1, trial entry is concentrated at an earlier stage of the trial (illustrated in Fig. 2c). In the actual example, 64.1% of the residents entered at the start of the trial. Depending on the purpose and setting of the trial, other SWCRTs show similar situations [26, 27].

In the actual simulation, policies (i) to (iii) above regarding the follow-up period and the time of trial entry can be taken for \(E=1\) and \(E>1\), respectively. Our study adopts only policy (iii) instead of (ii) at \(E>1\) (illustrated in Fig. 2d).
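The entry-time rule and the three policies can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, and Python's `random.Random` stands in for the pseudo-random number generator.

```python
import random


def end_of_trial(t_e, w_d, f):
    """t_F = t_E + W_d * F; F = 0 means no follow-up period."""
    return t_e + w_d * f


def entry_time(t_s, t_end, e_coef, rng):
    """d_ij = t_S + (t_end - t_S) * e / E with e ~ U(0, 1).

    t_end is t_E under policies (i) and (iii) and t_F under policy (ii).
    E = 1 spreads entry over the whole window; E > 1 compresses it
    towards the start of the trial.
    """
    e = rng.random()
    return t_s + (t_end - t_s) * e / e_coef


rng = random.Random(2024)
t_f = end_of_trial(360.0, 60.0, 2.5)                 # 510.0, as in the actual example
d_policy_ii = entry_time(0.0, t_f, 1.0, rng)         # entry allowed until t_F
d_policy_iii = entry_time(0.0, 360.0, 2.0, rng)      # entry stops at t_E, E = 2
print(t_f, d_policy_ii, d_policy_iii)
```

With \(E=2\) and entry terminated at \({t}_{E}=360\), every generated entry time falls in the first 180 days.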

To compare our results with the secondary outcome of the actual example, number of hospitalisations > 24 h per facility-month, we decided to treat only hospitalizations > 24 h as a TTE in this study. It was previously published [6] that the number of residents repeatedly hospitalised more than four times was very small. Therefore, in our study, the maximum number of recurrent events generated in the simulation was three.

The relative performance of the statistical models used for TTRE, in terms of bias and variability, depends on the event generation model used in the simulation, and it is thus recommended that simulations based on multiple event generation models be considered [28]. Therefore, in this study, four types of event generation model were used.

The first is the Poisson process, which generates TTEs based on exponential distributions independent of each other, not only between subjects but also within subjects. The exponential distribution has only a scale parameter. The starting point of all TTEs is \({d}_{ij}\) at the time of trial entry, and the hazard of a TTE is always constant, regardless of the time and number of recurrences (illustrated in Fig. 3a).

The second model uses the same Poisson process as the first one, but adopts the exponential distribution with different scale parameters between the subjects using random effect (i.e., inter-individual variability exists). It is referred to as the Mixed-Poisson process.

The third is the Weibull model, where the starting point of the first TTE is \({d}_{ij}\), as in the Poisson process, but the starting point of the second and subsequent TTEs is the time of the previous event (illustrated in Fig. 3b). A Weibull distribution is assumed for the time between events within each subject. In addition to a scale parameter similar to that of an exponential distribution, the Weibull distribution has a shape parameter, which allows the hazard to vary with time. As this model adopts a Weibull distribution with common parameters from the first to the third TTE (i.e. the way the hazard changes is common from the first to the third TTE), we refer to it as the Weibull model (constant).

The fourth model uses the same Weibull model as the third one, but adopts Weibull distributions with different parameters for the “first TTE” and the “second and third TTEs” (i.e., the way the hazard changes differs between the first TTE and the second and third TTEs), and so it is referred to as the Weibull model (change).

In a simple RCT situation where an intervention effect exists, previous studies with time-independent covariates have shown that both the AG and PWP-TT models perform well for the Poisson process. On the other hand, it has been shown that only the PWP-TT model performs well for the Weibull model (constant), and only the AG model performs well for the Mixed-Poisson process [28].

To generate TTREs that account for unidirectional switching, on the assumption that intervention effects are estimated using the CoxPH model and several extended CoxPH models, we use a data generation process for the CoxPH model with time-dependent covariates [29], applied to the event generation models previously described. If the generated TTRE exceeds \({t}_{E}\) or \({t}_{F}\), it is treated as right-censored at \({t}_{E}\) or \({t}_{F}\).

In the generation of TTRE in the Poisson process and the Mixed-Poisson process, three pseudo-random numbers were generated independently from the uniform distribution \(U(0, 1)\) and sorted in decreasing order, \({u}_{1},\: {u}_{2},\: {u}_{3}\) in turn (\({u}_{k},\: k=1,\: 2,\: 3\)), so that \(-\mathrm{log}({u}_{k})\), and hence the generated time, increases with \(k\). If the scale parameter of the exponential distribution is \(\lambda\), the baseline hazard function is \(\lambda\), which is always constant regardless of the time or number of recurrences. The \(k\) th TTRE of the \(j\) th subject in the \(i\) th cluster, when the starting point is not considered, is as follows:

$$\begin{aligned}{T}_{ijk}^{*}=\left\{\begin{array}{l}\frac{-\mathrm{log}({u}_{k})} {\lambda \mathrm{exp} ({\beta}^{\prime}x+{\tau }_{i}+{\tau}_{j})}\\ \qquad\qquad\mathrm{if}\:-\mathrm{log}({u}_{k})\:<\:\lambda \mathrm{exp} \left({\beta}^{\prime}x+{\tau}_{i}+{\tau}_{j}\right){w}_{ij},\\ \frac{[-\mathrm{log}\left({u}_{k}\right)-\lambda \mathrm{exp} \left({\beta }^{\prime}x+{\tau }_{i}+{\tau }_{j}\right){w}_{ij}+\lambda \mathrm{exp} ({\beta }^{\prime}x+{\beta }_{tik}+{\tau }_{i}+{\tau }_{j}){w}_{ij}]}{\lambda \mathrm{exp} ({\beta }^{\prime}x+{\beta }_{tik}+{\tau }_{i}+{\tau }_{j})} \\ \qquad\qquad\mathrm{if}\:-\mathrm{log}({u}_{k})\:\ge\: \lambda \mathrm{exp} \left({\beta }^{\prime}x+{\tau }_{i}+{\tau }_{j}\right){w}_{ij}\end{array}\right.,\end{aligned}$$

where \({\tau }_{i}\) and \({\tau }_{j}\) are the random effects for the variation between clusters and between subjects, respectively, with \({\tau }_{i}\sim N\left(0, {\sigma }^{2}\right)\) and \({\tau }_{j}\sim N(0, {\sigma }_{s}^{2})\). \({\sigma }_{s}^{2}\) is 0 for the Poisson process and > 0 for the Mixed-Poisson process.

As already mentioned, \({\beta }_{tik}\) is the parameter of the intervention effect on the \(k\) th recurrence of the \(i\) th cluster, and \({w}_{ij}\) is the distance to switch for each subject from the trial entry. For simplicity, we omitted the \({\beta }^{^{\prime}}x\) for the time-independent covariates in the simulation. The TTRE, which is used in the analysis considering the starting point, is represented by \({T}_{ijk}={d}_{ij}+{T}_{ijk}^{*}\).
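The piecewise formula for \({T}_{ijk}^{*}\) inverts the cumulative hazard of a constant hazard that changes from the control rate to the intervention rate at \({w}_{ij}\). A minimal sketch follows, with the covariate term omitted as in the simulation. The function names are hypothetical, `tau` stands for the sum \({\tau }_{i}+{\tau }_{j}\), and sorting the uniforms so that the generated times increase with \(k\) is an assumption of this sketch.

```python
import math
import random


def ttre_exponential(u, lam, beta_t, w, tau=0.0):
    """Time from trial entry to an event under a hazard that switches at w.

    Solves H(T) = -log(u), where the hazard is lam*exp(tau) before w and
    lam*exp(beta_t + tau) from w onwards.
    """
    target = -math.log(u)
    rate_control = lam * math.exp(tau)
    rate_interv = lam * math.exp(beta_t + tau)
    if target < rate_control * w:          # event occurs before the switch
        return target / rate_control
    return (target - rate_control * w + rate_interv * w) / rate_interv


def subject_ttres(d_ij, w_ij, lam, beta_t, rng, tau=0.0, n_events=3):
    """Generate up to n_events event times, all measured from entry d_ij."""
    # 1 - rng.random() lies in (0, 1], which avoids log(0).
    us = sorted((1.0 - rng.random() for _ in range(n_events)), reverse=True)
    return [d_ij + ttre_exponential(u, lam, beta_t, w_ij, tau) for u in us]


rng = random.Random(7)
print(subject_ttres(d_ij=20.0, w_ij=40.0, lam=0.003281, beta_t=-0.264, rng=rng))
```

Times beyond the end of the trial would then be right-censored, as described above.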

In the generation of TTRE in the Weibull model, three pseudorandom numbers were generated independently from the uniform distribution \(U(0, 1)\), \({u}_{1},\: {u}_{2},\: {u}_{3}\) in the order in which they are generated (\({u}_{k},\: k=1,\: 2,\: 3\)). Let the scale parameter of the Weibull distribution for each recurrence be \({\lambda }_{k}\), and the shape parameter be \({\nu }_{k}\). The baseline hazard function is \({\lambda }_{k}{\nu }_{k}{t}^{{\nu }_{k}-1}\) and it is allowed to vary with time. The \(k\) th TTRE of the \(j\) th subject in the \(i\) th cluster, when the starting point is not considered, is as follows:

$$\begin{aligned}{T}_{ijk}^{*}=\left\{\begin{array}{l}{\left(\frac{-\mathrm{log}({u}_{k})}{{\lambda }_{k}\mathrm{exp} ({\beta }^{\prime}x+{\tau }_{i})}\right)}^{1/{\nu }_{k}} \\ \qquad\qquad \mathrm{if}\:-\mathrm{log}({u}_{k})\:<\:{\lambda }_{k}\mathrm{exp} ({\beta }^{\prime}x+{\tau }_{i}){w}_{ijk}^{{\nu }_{k}},\\ {\left(\frac{-\mathrm{log}({u}_{k})-{\lambda }_{k}\mathrm{exp} ({\beta }^{\prime}x+{\tau }_{i}){w}_{ijk}^{{\nu }_{k}}+{\lambda }_{k}\mathrm{exp} ({\beta }^{\prime}x+{\beta }_{tik}+{\tau }_{i}){w}_{ijk}^{{\nu }_{k}}}{{\lambda }_{k}\mathrm{exp} ({\beta }^{\prime}x+{\beta }_{tik}+{\tau }_{i})}\right)}^{1/{\nu }_{k}} \\ \qquad\qquad \mathrm{if}\:-\mathrm{log}({u}_{k})\:\ge\:{\lambda }_{k}\mathrm{exp} ({\beta }^{\prime}x+{\tau }_{i}){w}_{ijk}^{{\nu }_{k}}\end{array}\right..\end{aligned}$$

\({\tau }_{i}\), \({\beta }_{tik}\), \({w}_{ijk}\), and \({\beta }^{^{\prime}}x\) were explained above. The TTRE that is actually used for the analysis considering the starting point is:

$$\left\{\begin{array}{cc}{T}_{ijk}={d}_{ij}+{T}_{ijk}^{*}& k=1\\ {T}_{ijk}={T}_{ijk-1}+{T}_{ijk}^{*}& k=2,\: 3\end{array}\right.$$

The parameters are \({\lambda }_{1}={\lambda }_{2}={\lambda }_{3}, {\nu }_{1}={\nu }_{2}={\nu }_{3}\) for the Weibull model (constant), and \({\lambda }_{1}\ne {\lambda }_{2}={\lambda }_{3}, {\nu }_{1}\ne {\nu }_{2}={\nu }_{3}\) for the Weibull model (change).
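The Weibull generation step, including the chaining of gap times from one event to the next, can be sketched as below. As before, this is an illustration under assumptions rather than the study code: function names are hypothetical, the covariate term is omitted, and `tau` is the cluster random effect \({\tau }_{i}\).

```python
import math
import random


def ttre_weibull(u, lam, nu, beta_t, w, tau=0.0):
    """Gap time to the next event when the hazard multiplier changes at w.

    Baseline cumulative hazard is lam * t**nu; the solution of
    H(T) = -log(u) is taken before or after the switch as appropriate.
    """
    target = -math.log(u)
    scale_control = lam * math.exp(tau)
    scale_interv = lam * math.exp(beta_t + tau)
    if target < scale_control * w ** nu:   # event occurs before the switch
        return (target / scale_control) ** (1.0 / nu)
    inner = (target - scale_control * w ** nu + scale_interv * w ** nu) / scale_interv
    return inner ** (1.0 / nu)


def subject_event_times(d_ij, w_i, lams, nus, beta_t, rng, tau=0.0):
    """Chain gap times: T_ij1 = d_ij + T*_ij1, T_ijk = T_ij(k-1) + T*_ijk."""
    times, start = [], d_ij
    for lam, nu in zip(lams, nus):
        w_ijk = max(w_i - start, 0.0)      # distance from this start to the switch
        u = 1.0 - rng.random()             # in (0, 1], avoids log(0)
        start = start + ttre_weibull(u, lam, nu, beta_t, w_ijk, tau)
        times.append(start)
    return times


rng = random.Random(11)
# Weibull model (change): different parameters for the first TTE.
print(subject_event_times(20.0, 60.0, [0.003599, 0.009910, 0.009910],
                          [1.5122, 0.9108, 0.9108], -0.264, rng))
```

Passing three identical scale and shape values instead gives the Weibull model (constant).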

In the actual example, 31.4% of the residents died during the trial period. Therefore, in our simulation, we considered the time-to-terminal-event (TTTE) as independent of the distance to switch and TTRE. If the generated TTTE does not exceed \({t}_{E}\) or \({t}_{F}\) and it is before the third TTRE, it is treated as mid-trial right-side censoring at the occurrence of the terminal event. The scale parameter of the Weibull distribution for the terminal event is \({\lambda }_{c}\), and the shape parameter is \({\nu }_{c}\). Without considering the starting point, the TTTE of the \(j\) th subject in the \(i\) th cluster, \({C}_{ij}^{*}\), can be expressed using the probability density function as follows:

$$f\left(x\right)=\frac{{\nu }_{c}}{{\lambda }_{c}^{{\nu }_{c}}}{x}^{{\nu }_{c}-1} \mathrm{exp}\left\{-{\left(\frac{x}{{\lambda }_{c}}\right)}^{{\nu }_{c}}\right\},\quad x>0$$

The TTTE used in the actual analysis considering the starting point is expressed as \({C}_{ij}={d}_{ij}+{C}_{ij}^{*}\).
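One way to draw from this density is by inverse transform: its survival function is \(\mathrm{exp}\{-{(x/{\lambda }_{c})}^{{\nu }_{c}}\}\), so setting it equal to a uniform draw \(u\) gives \(x={\lambda }_{c}{(-\mathrm{log}\,u)}^{1/{\nu }_{c}}\). The sketch below applies that draw and the censoring rule. The function names are hypothetical, and the scale value 300 in the usage lines is purely illustrative and is not a parameter from this study.

```python
import math


def terminal_time(u, lam_c, nu_c):
    """Inverse-transform draw from the Weibull density above (scale lam_c, shape nu_c)."""
    return lam_c * (-math.log(u)) ** (1.0 / nu_c)


def apply_censoring(event_times, c_ij, t_end):
    """Keep events up to the earlier of the terminal event and the end of the trial.

    Returns the retained event times and the time at which follow-up stops.
    """
    cutoff = min(c_ij, t_end)
    observed = [t for t in event_times if t <= cutoff]
    return observed, cutoff


c_star = terminal_time(0.3, 300.0, 1.7191)    # illustrative scale, not from the study
print(apply_censoring([50.0, 120.0, 400.0], 200.0, 360.0))  # ([50.0, 120.0], 200.0)
print(apply_censoring([50.0, 120.0, 400.0], 500.0, 360.0))  # ([50.0, 120.0], 360.0)
```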

### Parameter settings

The scale parameter for the exponential distribution in the generation of the TTRE by the Poisson process was set to \(\lambda =0.003281\). This parameter was estimated based on the TTHA up to the third of the actual example, with all starting points set to zero. In addition, the inter-individual variability of the scale parameter in the generation of the TTRE by the Mixed-Poisson process was set to \({\sigma }_{s}^{2}=0.3455\). For this parameter, we used an estimate of the standard deviation of the normal distribution for the scale parameter based on the TTHA.

The scale and shape parameters of the Weibull distribution in the generation of TTRE using the Weibull model (constant) were set to \({\lambda }_{1}={\lambda }_{2}={\lambda }_{3}=0.004703,\: {\nu }_{1}={\nu }_{2}={\nu }_{3}=1.1219\). These parameters were estimated based on the TTHA, up to the third of the actual example. The starting point of the second and subsequent TTHA was the time of the previous hospitalisation.

The scale and shape parameters of the Weibull distribution in the generation of the TTRE using the Weibull model (change) were set to \({\lambda }_{1}=0.003599,\: {\lambda }_{2}={\lambda }_{3}=0.009910,\: {\nu }_{1}=1.5122,\: {\nu }_{2}={\nu }_{3}=0.9108\). These parameters were estimated based on the “first TTHA” and the “second and third TTHA” of the actual example, respectively. The starting point of the second and subsequent TTHAs was the time of the occurrence of the previous hospitalisation.

The scale and shape parameters of the Weibull distribution in the generation of TTTE as mid-trial right-side censoring were set to \({\lambda }_{c}=0.003674\) and \({\nu }_{c}=1.7191\). These parameters were estimated based on the time to death in the actual example.

Two parameters were set for the true intervention effect. The first is \({\beta }_{tik}={\beta }_{t}=-0.264\), which was calculated as \(\mathrm{ln}(4.3/5.6)\) based on the secondary outcome of the actual example, number of hospitalisations > 24 h per facility-month. The second is \({\beta }_{tik}={\beta }_{t}=0\), a setting used in previous studies on event generation models: HR = 1, which indicates that there is no difference in the risk of event occurrence between the control and intervention conditions. In a simple RCT situation where there is no intervention effect, both the AG and PWP-TT models have been shown to perform well, regardless of the type of event generation model [28].
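As a quick check of the first setting, the arithmetic can be written out (values rounded to three decimal places):

$${\beta }_{t}=\mathrm{ln}\left(\frac{4.3}{5.6}\right)\approx \mathrm{ln}(0.768)\approx -0.264,\qquad \mathrm{HR}=\mathrm{exp}({\beta }_{t})\approx 0.768$$

An HR of about 0.768 corresponds to the decrease of approximately 23% in hospitalisations per facility-month reported for the actual example.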

### Simulation set-up

For all simulations, we fixed \({t}_{S}=0\) at the beginning of the trial, \({t}_{E}=360\) at the end of the last step period, and the total sample size per simulation (total number of subjects per trial) \(N=2000\). These settings were based on the fact that the actual example lasts for 12 months from the start of the trial to the end of the final step period; if one month is considered to be approximately 30 days, the trial period can be calculated as 12 × 30 = approximately 360 days, and the total number of subjects was 1700. Unless otherwise noted, the basic settings for each simulation scenario are as follows: the number of simulations is 1000, the event generation model consists of three types (Poisson process, Weibull model (constant), Weibull model (change)), the parameters of the true intervention effect are two ways (\(-0.264,\: 0\)), and \(s\left(=m\right)=5,\: {n}_{i}=n=N/m=400,\: {W}_{d}=({t}_{E}-{t}_{S}) / (m+1)=60,\: {\sigma }^{2}=0,\: E=1,\: F=0\). The setting of \(s=m=5\) is in reference to the fact that the number of steps in the actual example is five (Fig. 1).

Each simulation scenario is listed below. Scenario II applied two policies for each statistical model: (i) with stratification by clusters and (ii) without stratification by clusters. In all scenarios, except for scenario II, only (i) was applied.

In Scenario I, the number of steps (clusters) varied as \(s\left(=m\right)=2, 4, 5, 8, 10, 20\) to investigate how the performance of each statistical model changed as the number of steps (clusters) increased. As the number of steps changes, the cluster size and the distance between switches become \(n=N/m=1000, 500, 400, 250, 200, 100\) and \({W}_{d}=120, 72, 60, 40, 33, 17\) (rounded to the nearest day), respectively. The results based on \(s\left(=m\right)=5,\: n=400,\: {W}_{d}=60\) in this scenario were used as a reference throughout the simulations in our study.
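The Scenario I settings follow directly from \(N=2000\), \({t}_{S}=0\), and \({t}_{E}=360\). The short sketch below reproduces them; the function name is hypothetical, and rounding \({W}_{d}\) to the nearest whole day is an assumption that matches the listed values.

```python
def scenario_settings(n_total, t_s, t_e, steps):
    """Return (m, n per cluster, W_d rounded to the nearest day) for each m."""
    return [(m, n_total // m, round((t_e - t_s) / (m + 1))) for m in steps]


print(scenario_settings(2000, 0, 360, [2, 4, 5, 8, 10, 20]))
# [(2, 1000, 120), (4, 500, 72), (5, 400, 60), (8, 250, 40), (10, 200, 33), (20, 100, 17)]
```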

In Scenario II, we varied the variance with respect to the random effect \({\tau }_{i}\), which represents the variation among clusters, as \({\sigma }^{2}=0.25,\: 0.5,\: 1\), and investigated how the performance of each statistical model changed as the variation between clusters increased.

In Scenario III, the follow-up period varied as \(F=1, 2, 3, 4\) to investigate how the performance of each statistical model changed as the follow-up period increased. The setting of \(F\) is based on the follow-up period of 2.5 steps in the actual example (Fig. 1). In this scenario, the time point of trial entry was \({d}_{ij}={t}_{S}+(\left({t}_{F}-{t}_{S}\right)*e)/E\), and subjects were allowed to enter until the end of the follow-up period.

In Scenario IV, the follow-up period varied as \(F=1, 2, 3, 4\) to investigate how the performance of each statistical model changed as the follow-up period increased. In this scenario, the time point of trial entry was \({d}_{ij}={t}_{S}+(\left({t}_{E}-{t}_{S}\right)*e)/E\), and entry was terminated at the end of the final step period.

In Scenario V, we varied the timing of the trial entry as follows, \(E=1.5,\: 2,\: 4,\: 6\) to investigate how the performance of each statistical model changed as trial entry was concentrated at an earlier stage of the trial.

In Scenario VI, the time of trial entry varied as follows, \(E=1.5,\: 2,\: 4,\: 6\), and the follow-up period was changed to \(F=1, 2, 3, 4\), to investigate how the performance of each statistical model changed in a situation where trial entry was concentrated in an earlier stage of the trial, and there was a follow-up period. In this scenario, for convenience, we used \({d}_{ij}={t}_{S}+(\left({t}_{E}-{t}_{S}\right)*e)/E\) as the time point for trial entry.

### Analysis of an actual example

When the repeated hospitalisations > 24 h in the actual example were analysed as TTRE using each statistical model, adjustment was made for the time-independent covariates employed in the model analysis for the primary outcome in the actual example (age, sex, medical power of attorney, health directive, advance care plan/statement of choices, primary diagnosis, age-adjusted Charlson comorbidity index, and fidelity).

Two policies were applied to each statistical model: (i) with stratification by clusters and (ii) without stratification by clusters. Fidelity is a per-cluster variable and was employed only with policy (ii), as it is not available for adjustment in (i). The unidirectional switch from the control condition to the intervention condition in each cluster was expressed using the intervention indicator as a time-dependent covariate.

In the usual TTRE analysis, continuous risk intervals are employed. However, in reality, residents are not exposed to the risk of a further hospitalisation during a hospital stay. Therefore, in this study, we adopted a discrete risk interval [30]. For example, if a resident was hospitalised, exposure to the risk of a new hospitalisation resumed on the day of discharge.
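The idea of a discrete risk interval can be illustrated as follows: each at-risk interval ends at an admission, and the next one starts at the corresponding discharge, so days spent in hospital are excluded from the risk set. The function name and the example dates are hypothetical, and the sketch assumes that the stays are sorted, non-overlapping, and all within follow-up.

```python
def risk_intervals(entry, stays, end_of_followup):
    """Build (start, stop, status) rows that skip time spent in hospital.

    stays is a list of (admission, discharge) pairs sorted by admission.
    status is 1 if the interval ends in an admission and 0 if censored.
    """
    rows, start = [], entry
    for admission, discharge in stays:
        rows.append((start, admission, 1))
        start = discharge                  # risk resumes on the day of discharge
    if start < end_of_followup:
        rows.append((start, end_of_followup, 0))
    return rows


print(risk_intervals(0, [(40, 47), (120, 123)], 360))
# [(0, 40, 1), (47, 120, 1), (123, 360, 0)]
```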

The results of the analysis were evaluated using HR and its 95% CI and *p*-value. In addition, parameter estimates and standard error (SE) were evaluated for the intervention effects.

### Software and code

All statistical analyses, including simulations, were performed using SAS, version 9.4 (SAS Institute, Cary, NC, USA). The PROC PHREG of SAS was used to analyse the TTRE. For the generation of pseudo-random numbers by SAS, the RANUNI function was used to generate the time point of trial entry and TTRE, the RANNOR function was used to generate the cluster effect, and the RAND function was used to generate the TTTE. For information on simulation codes, see Availability of data and materials.