The impact of varying cluster size in cross-sectional stepped-wedge cluster randomised trials
BMC Medical Research Methodology, volume 19, Article number: 123 (2019)
Abstract
Background
Cluster randomised trials with unequal sized clusters often have lower precision than trials with clusters of equal size. To allow for this, sample sizes are inflated by a modified version of the design effect for clustering. These inflation factors are valid under the assumption that randomisation is stratified by cluster size. We investigate the impact of unequal cluster size when that constraint is relaxed, with particular focus on the stepped-wedge cluster randomised trial, where stratification is more difficult to achieve.
Methods
Assuming a multilevel mixed effect model with an exchangeable correlation structure for a cross-sectional design, we use simulation methods to compare the precision of a trial with clusters of unequal size to that of a trial with clusters of equal size (the relative efficiency). For a range of scenarios we illustrate the impact of various design features (the cluster-mean correlation – a function of the intra-cluster correlation and the cluster size – the number of clusters, and the number of randomisation sequences) on the average and distribution of the relative efficiency.
Results
Simulations confirm that the average reduction in precision due to varying cluster sizes is smaller in a stepped-wedge trial than in a parallel trial. However, the variance of the distribution of the relative efficiency is large, and is larger under the stepped-wedge design than under the parallel design. This can result in large variations in actual power, depending on the allocation of clusters to sequences. Designs with larger variations in cluster sizes, a smaller number of clusters, or smaller cluster-mean correlations (smaller cluster sizes or smaller intra-cluster correlation) are particularly at risk.
Conclusion
The actual realised power in a stepped-wedge trial might be substantially higher or lower than that estimated. This is particularly important when there are a small number of clusters or the variability in cluster sizes is large. Constraining the randomisation on cluster size, where feasible, might mitigate this effect.
Background
Cluster randomised trials (CRTs) often contain clusters of unequal size [4]. In the context of a parallel CRT (PCRT), it has been established that an increase in the variability of cluster sizes leads to a decrease in precision [3, 18]. There have been many suggested modifications to the conventional cluster design effect (DE) to allow for unequal cluster sizes in a PCRT. In such modifications, the DE is a function of the cluster sizes and the intra-cluster correlation (ICC), using either the actual (varying) cluster sizes when these are known pre-trial [12, 17], or an estimate of the average cluster size and a measure of dispersion of cluster sizes [3, 19].
The stepped-wedge CRT (SWCRT) is an alternative form of the CRT. Under this design, clusters are typically randomly allocated to one of a number of sequences which dictate how many time-periods the cluster will spend in the control condition, followed by periods under the intervention condition (Fig. 1) [7, 9]. Outcomes can be assessed on the same cohort of participants who are followed up for the study duration, on a new cross-section of participants at each time-period, or a combination of the two [2, 7]. In this paper, the focus is solely on the cross-sectional design.
Under the assumption of a cross-sectional design, Hussey and Hughes [9] proposed a mixed effect model, with a fixed effect for time and a random effect for cluster, as a framework for the design and analysis of a SWCRT. They derived methods to estimate the power of a SWCRT based on this model set-up. Whilst this approach does not itself require the cluster sizes to be equal, subsequent design effects derived from this model assume that there is no variation in cluster sizes [10, 20]. Although an adjustment of these DEs to allow for unequal cluster sizes exists, it is based on a stratification scheme in which the distribution of cluster sizes is the same within each sequence [6]. If the number of clusters allocated to each sequence is small, then stratification by cluster size may not be possible. Furthermore, because clusters in a SWCRT transition sequentially to the intervention condition, in a design with clusters of unequal size the order in which the (different sized) clusters are randomised to cross over to the intervention has implications for the power of the study, because it can result in a large imbalance in cluster sizes across treatment conditions. This leads to the notion of a conditional power – the power for a fixed randomisation order – and an average and distribution of power over all possible randomisation orders. At the design stage, the natural focus is on the average and distribution of power, since these reflect the expected power across all randomisation orders. We therefore explore the influence of varying cluster sizes in SWCRTs in the absence of stratification by cluster size, and importantly consider the distribution of possible realisations of power across all randomisation orders.
Aims and objectives
We present methods to estimate the power in a SWCRT in which the cluster sizes are unequal but known. We then extend this method to estimate power when the cluster sizes are unequal and not known, and only the expected average cluster size and a measure of dispersion of cluster sizes, such as the coefficient of variation (CV), are available. We then explore the extent to which the power of a SWCRT is affected by varying cluster size and highlight the design features (i.e. number of sequences, cluster size, etc.) which are most influential. We illustrate how much variation in power may exist across different randomisation orders. We explore whether a SWCRT or a PCRT is more affected by varying cluster size; and, to a limited extent, explore whether randomisation schemes which constrain the randomisation so that total sample sizes under intervention and control conditions are balanced might help minimise any loss in power due to varying cluster sizes.
Motivating example
Changing clinical communications: a stepped-wedge cluster randomised trial
A SWCRT is to be used to evaluate the effectiveness of a training program aimed at improving patients' satisfaction with the doctor-patient relationship in a general practice environment. The intervention includes a training package in communication skills which will be delivered to all doctors at each of six included general practices. The intervention will be rolled out to the practices over six sequences, and the evaluation will consist of data from seven time-periods (Fig. 1). It is unlikely that any conventional stratification method for constraining the randomisation by cluster size could be implemented in this design set-up. The primary outcome is the patient satisfaction score, measured via a series of questions on a Likert scale. It is hoped that the intervention will lead to a 0.2 increase in patient satisfaction from a mean (SD) patient satisfaction of 3.2 (0.8). For illustration we assume the ICC is in the region of 0.05.
Each time-period will be one month in duration, and different patients will be included at each time-period, so that the design is cross-sectional. It is expected that each cluster will contribute an average of 50 patients per time-period, so that an estimated 2100 observations (= 50 × 6 × 7) will be available. However, it is known that the cluster sizes will vary. We outline two proposed approaches for accommodating this variation in cluster sizes, by way of example, to illustrate the concept of a distribution of power across randomisation orders, and then proceed to describe the technical details of implementation.
Estimating power with known cluster sizes
Let us first assume that the average cluster sizes per time-period for clusters 1 to 6 are: 15, 25, 35, 45, 80, and 100. This corresponds to an average of 50 observations per cluster-period, and a coefficient of variation of 0.66 – a value not dissimilar to that reported in UK general practice [3]. With six clusters and six sequences, there are 720 (= 6 × 5 × 4 × 3 × 2 × 1) possible permutations of the randomisation order. The power can be calculated for each randomisation order using the methods described by Hussey and Hughes (described in detail below), and is illustrated in Fig. 2a.
If equal cluster sizes had been assumed, then the power would be 80.75%. Allowing for the variation in cluster sizes and the associated different randomisations, the average (median) power across all randomisation orders is 80.2% (IQR: 78.4 to 81.6%, range: 75.0 to 82.5%). The minimum power is found when the randomisation order is: 25, 45, 100, 80, 15, and 35, and the maximum power when the order is: 100, 15, 45, 35, 25, and 80. Therefore, whilst on average the design may obtain 80% power, the randomisation order could produce a design in which the power is less than this, and it could be as low as 75%.
Estimating power with unknown cluster sizes
Let us now assume that the cluster sizes are not known, but that the average cluster size across clusters (per period) will be 50, and the coefficient of variation of cluster sizes will be 0.66. To acknowledge varying cluster sizes, potential (unequal) cluster sizes could be simulated, and an estimate of the power calculated using the Hussey and Hughes formula ([9], full details below). Because the average and distribution of the possible power are of interest, the simulation of cluster sizes can be repeated a large number of times to create a distribution of the power. The distribution of power with 4000 simulated sets of cluster-period sizes is given in Fig. 2b. The median power is 80.9% (IQR: 78.8 to 82.1%, range: 63.9 to 83.7%). Therefore, whilst on average the design may obtain 80% power, the randomisation order could produce a design in which the power is less than this, and it is estimated it could be as low as 64%.
Methods
Firstly, for completeness, we present the method to estimate power in a SWCRT, as described by Hussey and Hughes [9]. This includes an illustration of how the power can be estimated for a fixed set of known, but varying, cluster sizes. We then present a simulation method to estimate the average and distribution of power across a simulated set of randomisation orders when only the mean and variance of the cluster sizes are known. We illustrate that randomisation to a particular sequence induces a conditional power, and highlight the need to consider the average and distribution of power at the design stage. We then describe a simulation study that investigates the influence of various design features (number of sequences, cluster-mean correlation, number of clusters, and coefficient of variation of cluster sizes) on the power of a SWCRT with varying cluster size for continuous outcomes. Finally, for a limited set of scenarios, we explore the correlation between the power and the balance of total sample size observed under the two treatment conditions.
Estimating power in a SWCRT with known cluster sizes (equal or unequal)
The power can be estimated in a SWCRT using analytical methods described by Hussey and Hughes [9]. For this, a multilevel mixed effect model is assumed:

\( y_{ijk} = \mu + \alpha_i + \beta_j + x_{ij}\,\delta + \varepsilon_{ijk} \qquad (1) \)

where y_{ijk} is the outcome for participant k in cluster i at time j, μ is the mean outcome under the control condition in the first time-period, β_{j} is a fixed time effect for time-periods j = 2, … , T (with β_{1} = 0 for identifiability), δ is the treatment effect, α_{i} ~ N(0, σ_{b}^{2}) is a random effect for cluster i, ε_{ijk} ~ N(0, σ_{w}^{2}) is the residual error, and x_{ij} is an indicator of the treatment exposure of cluster i at time j (1 = intervention, 0 = control). Under this model, the ICC can be defined as \( \rho = \frac{{\sigma_b}^2}{{\sigma_b}^2+{\sigma_w}^2} \).
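As a concrete illustration, the model above can be simulated directly. The sketch below (Python with numpy; the variance components are illustrative values chosen to be roughly consistent with the motivating example's ICC of 0.05 and total SD of 0.8, and are not taken from the paper) generates one cross-sectional data set under the standard stepped-wedge layout:

```python
import numpy as np

rng = np.random.default_rng(1)

C, T, m = 6, 7, 50             # clusters, time-periods, participants per cluster-period
mu, delta = 3.2, 0.2           # intercept and treatment effect (from the example)
beta = np.concatenate(([0.0], rng.normal(0, 0.05, T - 1)))  # time effects, beta_1 = 0
sigma_b, sigma_w = 0.18, 0.78  # illustrative SDs: ICC = 0.18^2/(0.18^2+0.78^2) ~ 0.05

# Standard stepped-wedge layout: cluster i crosses over at period i + 1
x = np.array([[1 if j > i else 0 for j in range(T)] for i in range(C)])

alpha = rng.normal(0, sigma_b, C)  # cluster random effects
y = np.array([
    mu + beta[j] + delta * x[i, j] + alpha[i] + rng.normal(0, sigma_w, m)
    for i in range(C)
    for j in range(T)
])  # shape (C*T, m): one row of outcomes per cluster-period
```

Each row of `y` holds the m individual outcomes for one cluster-period cell; averaging within rows gives the cell means whose covariance matrix V is used in the power calculation below.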
The power in a SWCRT to detect a specified difference (δ) can be estimated using a Wald test, if the variance components are known, as:

\( \mathrm{Power} = \Phi\left(\frac{\left|\delta\right|}{\sqrt{\operatorname{Var}(\hat{\delta})}} - Z_{1-\alpha/2}\right), \quad \operatorname{Var}(\hat{\delta}) = \left[\left(X^{\mathrm{T}} V^{-1} X\right)^{-1}\right]_{\delta\delta} \qquad (2) \)

Here, Φ is the standard Normal distribution function, Z_{1 − α/2} is its 1 − α/2 quantile, X is a design matrix that describes the cell means in terms of the linear parameters (the intervention effect δ, the time parameters β_{2}, … , β_{T}, and the intercept μ), and V is the variance-covariance matrix of the cell means, made up of CT × CT blocks, where C is the number of clusters and T the number of time-periods. Each T × T block of V refers to a particular cluster, describes the correlation between the cluster means over time, and has the form:

\( V_i = {\sigma_b}^2 J_T + \frac{{\sigma_w}^2}{m_i} I_T \)

where J_{T} is the T × T matrix of ones and I_{T} the T × T identity matrix, so that each diagonal entry is \( {\sigma_b}^2 + {\sigma_w}^2/m_i \) and each off-diagonal entry is \( {\sigma_b}^2 \). Here m_{i} refers to the cluster-period size for cluster i. The m_{i}'s are known but, in general, unequal.
If the cluster sizes are unequal, then the power is dependent on the randomisation order, since the randomisation will affect the matrix V. A distribution of power can be calculated by considering all possible permutations of the randomisation order (or a large enough sample of unique randomisation orders) and then determining the power under each of these randomisation orders.
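This calculation can be sketched in code (Python with numpy/scipy; `sw_power` is a hypothetical helper name, and the variance components below are illustrative values, not the paper's exact set-up). The function builds V for one ordering of the cluster-period sizes, extracts Var(δ̂) from (XᵀV⁻¹X)⁻¹, and applies the Wald power formula; looping it over permutations of the sizes gives a distribution of power across randomisation orders:

```python
import itertools
import numpy as np
from scipy.stats import norm

def sw_power(sizes, delta, sigma_b2, sigma_w2, alpha=0.05):
    """Power of a cross-sectional SW-CRT for one randomisation order.

    sizes: cluster-period sizes m_i, in the order clusters are allocated
    to sequences (cluster i crosses over to intervention at period i + 1).
    """
    C = len(sizes)
    T = C + 1  # one sequence per cluster, one more period than sequences
    # Design matrix over cluster-period cell means:
    # columns = intercept, T-1 time dummies, treatment indicator
    X = np.zeros((C * T, T + 1))
    for i in range(C):
        for j in range(T):
            r = i * T + j
            X[r, 0] = 1.0
            if j > 0:
                X[r, j] = 1.0                 # time-period dummy
            X[r, -1] = 1.0 if j > i else 0.0  # treatment exposure x_ij
    # Accumulate X' V^-1 X block by block; each T x T block of V is
    # sigma_b^2 everywhere, plus sigma_w^2 / m_i added on the diagonal
    XtVinvX = np.zeros((T + 1, T + 1))
    for i, m in enumerate(sizes):
        Vi = sigma_b2 * np.ones((T, T)) + (sigma_w2 / m) * np.eye(T)
        Xi = X[i * T:(i + 1) * T]
        XtVinvX += Xi.T @ np.linalg.inv(Vi) @ Xi
    var_delta = np.linalg.inv(XtVinvX)[-1, -1]
    return norm.cdf(abs(delta) / np.sqrt(var_delta) - norm.ppf(1 - alpha / 2))

# Distribution of power over all 720 randomisation orders of six clusters
sizes = [15, 25, 35, 45, 80, 100]
powers = [sw_power(p, delta=0.2, sigma_b2=0.032, sigma_w2=0.608)
          for p in itertools.permutations(sizes)]
```

The spread of `powers` across permutations illustrates the conditional-power idea: with equal sizes every order gives the same value, whereas with unequal sizes the order matters.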
Estimating power in a SWCRT with unknown (but varying) cluster sizes by simulation
In a SWCRT in which the exact cluster sizes are not known in advance, the mean cluster-period size (φ) and the CV can be used to simulate potential cluster-period sizes (m_{i}). Since cluster sizes are expected to exhibit a positive skew, and a non-negative distribution is required, we assume that the cluster-period sizes follow a Gamma distribution, such that:

\( m_i \sim \mathrm{Gamma}\left(\mathrm{shape} = \tfrac{1}{CV^2},\ \mathrm{scale} = \phi\, CV^2\right) \)

so that the m_{i} have mean φ and coefficient of variation CV.
The simulated values can be used in the above framework (Eq. 2) by replacing the m_{i} values in the matrix V in order to estimate the power. Following this, the mean cluster-period size and the CV are used to simulate a new set of cluster-period sizes. The new m_{i} values are used to calculate matrix V, which is used in Eq. 2 to calculate a new estimate of the power. This process is repeated to generate a set of estimates of power, which provides an average (and distribution) of power. The number of repetitions will influence the degree of precision surrounding the mean (and SD) of the distribution of power.
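Under the Gamma assumption, cluster-period sizes with a given mean φ and coefficient of variation can be drawn as follows (a Python sketch; `simulate_cluster_sizes` is a hypothetical helper name, and the shape/scale parametrisation is chosen so that the mean is φ and the CV is as specified):

```python
import numpy as np

def simulate_cluster_sizes(n_clusters, phi, cv, rng):
    """Draw cluster-period sizes from a Gamma distribution with
    mean phi and coefficient of variation cv."""
    shape = 1.0 / cv ** 2  # CV of a Gamma is 1 / sqrt(shape)
    scale = phi * cv ** 2  # mean = shape * scale = phi
    return rng.gamma(shape, scale, size=n_clusters)

rng = np.random.default_rng(42)
# Large draw to check the parametrisation empirically
sizes = simulate_cluster_sizes(100_000, phi=50, cv=0.66, rng=rng)
```

In practice one would draw only C sizes per replicate (e.g. six for the motivating example) and feed them into the power calculation; the large draw above simply verifies the mean and CV of the parametrisation.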
A simulation study to assess the impact of varying cluster size in a SWCRT
Now, we consider the impact of various design features (such as the number of sequences and cluster-period sizes) on the power in a SWCRT where the cluster sizes vary. This is shown through a simulation study, which we describe below. We present estimates of the relative efficiency, which compares the precision of a SWCRT with unequal cluster sizes to the precision of a SWCRT with equal cluster sizes, under the prerequisite that both designs have the same total sample size. The precision is used as it is invariant to the target effect size.
We consider five key design features: the number of sequences; the number of clusters; the cluster-period size; the ICC (ρ); and the coefficient of variation of cluster sizes. The cluster-mean correlation (CMC) is a function of the average total cluster size (M) and the ICC [5] (see Fig. 3). It represents the correlation between the cluster means of two repeated sets of observations taken from the same cluster and is defined as:

\( \mathrm{CMC} = \frac{M\rho}{1 + (M-1)\rho} \)
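Writing the definition out, CMC = Mρ / (1 + (M − 1)ρ), which translates directly into a one-line sketch (Python; `cmc` is a hypothetical helper name):

```python
def cmc(M, rho):
    """Cluster-mean correlation for average total cluster size M and ICC rho."""
    return (M * rho) / (1 + (M - 1) * rho)
```

For example, for the motivating trial (M = 50 × 7 = 350 observations per cluster, ρ = 0.05), the CMC is about 0.95, illustrating how quickly the CMC approaches 1 as the total cluster size grows.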
It has previously been established that the efficiency of a SWCRT with equal sized clusters hinges on the value of the CMC [5]. Moreover, in the scenarios described below – i.e. Gamma-distributed cluster sizes – the distribution of precision/power depends on M and ρ only through the CMC. This means that the number of dimensions in the simulation study is conveniently reduced by presenting results in terms of the CMC, rather than M and ρ separately, which has substantial presentational advantages. In what follows, any result that describes the qualitative effect of an increased CMC can be reinterpreted in terms of an increased ρ, or of an increased M. The full spectrum of potential values of the CMC was used (0 to 1). The majority of SWCRTs contain four or fewer sequences [14], but we included two larger values to capture the full effect of the number of sequences on the design, and crucially because we are interested in the situation where the randomisation cannot be stratified on cluster size, which is more likely to occur with a larger number of sequences. The number of clusters is based upon multiples of the number of sequences to ensure an equal number of clusters randomised per sequence. The degree of cluster size variation ranged from small (CV = 0.25) to large (CV = 1.5). A full list of the values chosen is given in Table 1. A full factorial design was used, giving 1320 possible scenarios. To maintain a Monte Carlo error around the precision smaller than 1%, 4000 simulations were used for each scenario [1].
In every simulation, a cluster-period size (m_{i}) is generated for each cluster (i) by sampling from a Gamma distribution with shape parameter α. The m_{i}'s are then scaled to ensure that the total sample size in the simulated design is equal to the total sample size in the corresponding equal-cluster design with cluster-period size φ. (The scaling ensures that the variation in simulated precision is a consequence of cluster size inequality rather than differences in study size.) The scaled m_{i}'s are used to calculate matrix V, which in turn is used to estimate the precision using Eq. 2; this is then compared to the precision of a SWCRT with equal sized clusters to give the relative efficiency. This process is repeated 4000 times, with a new set of cluster-period sizes simulated each time, to produce 4000 estimates of the relative efficiency. When referring to the distribution (or variation) of the relative efficiency, we focus on the IQR rather than the actual range, since the range is affected by the number of simulations.
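The rescaling step within each simulation replicate can be sketched as follows (Python; `scale_to_total` is a hypothetical helper name, and the draws shown are illustrative numbers, not simulated output – the full precision calculation is omitted here):

```python
import numpy as np

def scale_to_total(sizes, phi):
    """Rescale simulated cluster-period sizes so that the total sample size
    matches the equal-cluster design with cluster-period size phi."""
    sizes = np.asarray(sizes, dtype=float)
    return sizes * (len(sizes) * phi) / sizes.sum()

raw = np.array([15.0, 25.0, 35.0, 45.0, 80.0, 110.0])  # illustrative Gamma draws
scaled = scale_to_total(raw, phi=50)  # total is now exactly 6 * 50 = 300
```

Because the scaling is multiplicative, the relative inequality between clusters (and hence the CV) is preserved while the total study size is held fixed, which is exactly the property the simulation study needs.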
A simulation study to assess the impact of varying cluster size in a PCRT
The notion of the randomisation of clusters impacting the precision in a SWCRT led us to consider whether the precision in a PCRT with varying cluster sizes should also be represented as a distribution of values, rather than as the single value which is usually assumed. The methods described above for a SWCRT can also be used to evaluate the precision in a PCRT [8]. This allows us to simulate potential cluster-period sizes for a PCRT for a variety of different scenarios, and to examine the impact of unequal cluster sizes in a PCRT. The scenarios chosen were identical to those used to assess the impact of unequal cluster size in a SWCRT (see Table 1), with the exception of the number of sequences, which is replaced by the two arms of a PCRT (the total number of clusters is therefore assumed to be randomised evenly across the two arms). A full factorial design was used, giving 264 possible scenarios.
Results
The impact of varying cluster sizes on both the average and distribution of power (or precision) of a SWCRT depends on the design features of the study, such as the number of randomisation sequences, the CMC, and the number of clusters. We discuss the impact of each design feature in turn. The results are presented using the relative efficiency (RE), which compares the precision of a CRT with unequal cluster sizes to the precision in a CRT with equal cluster sizes; with a prerequisite that both CRTs have identical designs and sample sizes. We also discuss what impact the imbalance of observations between control and intervention conditions may have on the precision and power of a study.
Key results
On average, the precision is lower when the cluster sizes are unequal compared to the case with equal sized clusters, for both the PCRT and the SWCRT (Fig. 4). Under most scenarios considered, the average effect of varying cluster sizes on precision was smaller in a SWCRT than in a PCRT (Fig. 4, Table 2). However, the true impact of varying cluster sizes in any given SWCRT will depend on the randomisation of clusters to sequences. In an illustrative example, a SWCRT with clusters of unequal size could have up to 80% less precision than a SWCRT with equal sized clusters (Fig. 5). In the same illustrative example, somewhat surprisingly, it could transpire that a SWCRT with clusters of unequal size has up to 30% more precision than a SWCRT with equal sized clusters (Fig. 5). Therefore, the anticipated precision in a SWCRT with unequal cluster sizes might differ from that of a SWCRT with equal cluster sizes, and the actual realised loss or gain in efficiency might be large, depending crucially on the actual randomisation (i.e. there is a range of possible relative efficiencies, and these are not necessarily below 1).
The magnitude of the loss or gain in efficiency and its possible range across randomisation orders is impacted by the design features of the SWCRT, which are discussed in more detail below. We focus on the interquartile range so as not to put undue emphasis on extremes.
Steppedwedge CRTs
Coefficient of variation of cluster sizes
Any increase in the amount of variation in cluster sizes leads to a greater average precision loss in a SWCRT (i.e. the RE is less than one). Figure 4a illustrates that a small amount of variability in cluster sizes (CV = 0.25) has negligible impact on the average RE, but that larger amounts of cluster size variability can produce substantial losses in efficiency compared to a design with equal sized clusters. In addition, the range of the distribution of RE values widens as the CV increases. For example, in a 12-cluster SWCRT with 12 sequences and a CMC of 0.2, the RE has an IQR of 0.98 to 1.02 when the CV is small (CV = 0.25) (Table 2), but a much wider IQR of 0.76 to 0.93 when the CV is large (CV = 1.25).
Cluster mean correlation
The average loss in precision in a SWCRT due to the presence of unequal sized clusters is relatively unaffected by the CMC (Fig. 4b). However, the actual value of the RE can vary substantially from the average depending on the randomisation of clusters to sequences. Figure 4b illustrates how the range (or distribution) of the RE is widest when the CMC is small; and at its narrowest when the CMC is large. This is emphasised by an example from Table 2, in which for a SWCRT with 3 sequences and 12 clusters the average RE is 0.91 [IQR: 0.79–1.00] for a CMC of 0; and 0.90 [IQR: 0.81–0.95] for a CMC of 0.8 (illustrative CV = 1.25).
Number of clusters
The average loss in efficiency due to unequal sized clusters does depend on the number of clusters and is greater when the number of clusters is small (Fig. 4c). The range of the RE also depends on the number of clusters. Figure 4c illustrates that the range of the RE is widest when the number of clusters is smaller; and at its narrowest when the number of clusters is large. For example, the average RE for the 12 sequence design is 0.88 [IQR: 0.83–0.92] for a study with 12 clusters (Table 2); and 0.95 [IQR: 0.94–0.96] for a study with 96 clusters (illustrative CV = 1.25 and CMC = 0.8).
Number of sequences
The average loss in precision due to the presence of unequal sized clusters is relatively unaffected by the number of sequences. The range (or distribution) of the RE is not greatly impacted by the number of sequences when the SWCRT has more than two sequences (Fig. 4d). For example, from Table 2, in a SWCRT with 12 clusters the average RE for a design with 3 sequences is 0.90 [IQR: 0.81–0.95], and 0.88 [IQR: 0.83–0.92] for a design with 12 sequences (illustrative CV = 1.25 and CMC = 0.8).
SWCRT vs PCRT
The effect of varying cluster sizes on the average loss (or gain) in efficiency is smaller in a SWCRT than in a PCRT (Fig. 4a, c, Table 2). However, as is the case for the SWCRT, the actual realised precision (or power) in a PCRT might differ from the expected (or average) precision. The relationship between the average and distribution of precision and the number of clusters and amount of variation in cluster sizes (CV) is similar to that of the SWCRT. Any increase in the CV leads to a decrease in the average RE, and a widening of the range of RE values. PCRTs with fewer clusters may have a lower RE and a wider range of RE values than designs with a greater number of clusters. However, the relationship between the relative efficiency and the cluster-mean correlation in a PCRT is somewhat different from that in a SWCRT (Fig. 4b). As previously discussed, the impact of the CMC in a SWCRT is small. In a PCRT, however, increases in the CMC between 0 and 0.5 lead to decreases in the RE, whereas increases in the CMC between 0.5 and 1.0 increase the RE, so that the RE follows a parabolic pattern as a function of the CMC. Furthermore, in a SWCRT it is possible for designs with unequal cluster sizes to obtain more precision – and hence greater power – than an identical SWCRT with equal sized clusters. A PCRT with unequal cluster sizes, however, can never have greater precision than a PCRT with equal sized clusters (Fig. 4a, c, Table 2).
Imbalance of observations between control and intervention condition (sample size imbalance)
In a SWCRT with clusters of unequal sizes, the randomisation process could lead to an imbalance in the number of observations contributing to the control and intervention conditions (sample size imbalance). However, an equal number of observations under the control and intervention conditions does not guarantee that a SWCRT will have optimal precision or power. A comparison of precision and sample size imbalance is illustrated for four scenarios in Fig. 6 (a 12-cluster SWCRT with 4 or 12 sequences and a CMC of 0.2 or 0.5). Generally, the lowest precision was found when there was a small degree of sample size imbalance. The greatest precision is not necessarily achieved when the number of observations is equal in the control and intervention conditions. The results are consistent across changes to the number of sequences and changes to the CMC. However, despite these scenarios showing a positive correlation between sample size imbalance and precision, in several other examples (a SWCRT with 4 clusters and 4 sequences, and a SWCRT with 5 clusters and 5 sequences) we observed the opposite relationship (see Additional file 1: Figure S1).
Discussion
It is well known that the precision or power of a cluster randomised trial is lower when the cluster sizes are unequal than when they are equal. This is the case for both the PCRT and the SWCRT. More recently, it has also been established that the average reduction in relative precision in a SWCRT is lower than in a PCRT [6]. However, we have shown that whilst the expected or average impact of varying cluster sizes is relatively small, the actual impact might be much larger. This is because, conditional on the randomisation order, a SWCRT with clusters of unequal size could have more or less precision than a SWCRT with equal sized clusters. In some designs with unequal cluster sizes, some randomisations could lead to as much as a 30% increase in precision compared to a design with equal sized clusters; other randomisations could lead to an 80% decrease. These potentially large reductions (or sometimes increases) in precision are of particular concern in SWCRTs with large variation between cluster sizes, a small number of clusters, or a small cluster-mean correlation (i.e. smaller cluster sizes or smaller intra-cluster correlation).
We also demonstrated similar, although less marked, properties in the PCRT. This is something that has not been noted in the literature to date. In the PCRT it has been established that the loss of (average) efficiency due to variation in cluster sizes rarely exceeds 10% [6, 19]. However, this average or expected loss in efficiency holds under the assumption of a size-stratified randomisation scheme [6]. When the randomisation is not stratified on cluster size, the loss of efficiency can greatly exceed 10%, depending on the randomisation order. It is fairly typical for a PCRT to stratify or constrain the randomisation on cluster size [11]. A constrained randomisation approach has been recommended to minimise loss in power in a SWCRT [16]. We observed (for a limited set of scenarios) that, on average, the smaller the sample size imbalance, the greater the precision. However, for a few limited scenarios, we observed an inverse correlation between sample size imbalance and precision. Further work is therefore needed to determine when a constrained randomisation in SWCRTs, where the constraint minimises any sample size imbalance, will achieve the desired aim of increasing power, and where it might decrease power.
Limitations
Although the methods and results described here are for continuous outcomes, we suggest that, until further research is conducted, these results are also assumed to hold for binary outcomes. In this work we have assumed an exchangeable correlation structure with only random cluster effects. Further work is needed to consider more general autocorrelation structures, for example the inclusion of a random cluster-by-time interaction [6, 10, 15], or an exponential correlation model [13]. We also assumed observations were sampled uniformly across time-periods, which is consistent with standard approaches for longitudinal cluster randomised trials, but may not always hold in practice.
Conclusions
The actual realised power in a stepped-wedge trial with unequal cluster sizes depends on the order of randomisation of clusters to sequences. Design inflation factors allowing for varying cluster sizes all assume a size-stratified randomisation scheme, and only under this assumption is the impact of varying cluster size known to be minimal. Where a size-stratified randomisation scheme is not used, or is infeasible to implement, the realised power could be substantially higher or lower than the expected power, even after allowing for variation in cluster sizes. This is particularly important when there are a small number of clusters or the variability in cluster sizes is large. Constraining the randomisation on cluster size, where feasible, might mitigate this effect.
Availability of data and materials
The datasets used and analysed during the current study are available from the corresponding author on reasonable request.
Abbreviations
CMC: Cluster mean correlation
CRT: Cluster randomised trial
CV: Coefficient of variation
DE: Design effect
ICC: Intra-cluster correlation
PCRT: Parallel cluster randomised trial
SWCRT: Stepped-wedge cluster randomised trial
References
1. Baio G, Copas A, Ambler G, Hargreaves J, Beard E, Omar RZ. Sample size calculation for a stepped wedge trial. Trials. 2015;16:354.
2. Copas AJ, Lewis JJ, Thompson JA, Davey C, Baio G, Hargreaves JR. Designing a stepped wedge trial: three main designs, carry-over effects and randomisation approaches. Trials. 2015;16:352.
3. Eldridge SM, Ashby D, Kerry S. Sample size for cluster randomized trials: effect of coefficient of variation of cluster size and analysis method. Int J Epidemiol. 2006;35(5):1292–300.
4. Eldridge SM, Ashby D, Feder GS, Rudnicka AR, Ukoumunne OC. Lessons for cluster randomized trials in the twenty-first century: a systematic review of trials in primary care. Clin Trials. 2004;1(1):80–90.
5. Girling AJ, Hemming K. Statistical efficiency and optimal design for stepped cluster studies under linear mixed effects models. Stat Med. 2016;35(13):2149–66.
6. Girling AJ. Relative efficiency of unequal cluster sizes in stepped wedge and other trial designs under longitudinal or cross-sectional sampling. Stat Med. 2018:1–13.
7. Hemming K, Haines TP, Chilton PJ, Girling AJ, Lilford RJ. The stepped wedge cluster randomised trial: rationale, design, analysis, and reporting. BMJ. 2015;350:h391.
8. Hemming K, Lilford R, Girling AJ. Stepped-wedge cluster randomised controlled trials: a generic framework including parallel and multiple-level designs. Stat Med. 2015;34(2):181–96.
9. Hussey MA, Hughes JP. Design and analysis of stepped wedge cluster randomized trials. Contemp Clin Trials. 2007;28(2):182–91.
10. Hooper R, Teerenstra S, de Hoop E, Eldridge S. Sample size calculations for stepped wedge and other longitudinal cluster randomised trials; 2016.
11. Ivers NM, Halperin IJ, Barnsley J, Grimshaw JM, Shah BR, Tu K, et al. Allocation techniques for balance at baseline in cluster randomized trials: a methodological review. Trials. 2012;13(1):120.
12. Kerry SM, Bland JM. Unequal cluster sizes for trials in English and Welsh general practice: implications for sample size calculations. Stat Med. 2001;20(3):377–90.
13. Kasza J, Hemming K, Hooper R, Matthews J, Forbes AB; ANZICS Centre for Outcomes & Resource Evaluation (CORE) Committee. Impact of non-uniform correlation structure on sample size and power in multiple-period cluster randomised trials. Stat Methods Med Res. 2017:962280217734981.
14. Martin J, Taljaard M, Girling AJ, Hemming K. Systematic review finds major deficiencies in sample size methodology and reporting for stepped-wedge cluster randomised trials. BMJ Open. 2016;6:e010166. https://doi.org/10.1136/bmjopen-2015-010166.
15. Martin J, Girling A, Nirantharakumar K, Ryan R, Marshall T, Hemming K. Intra-cluster and inter-period correlation coefficients for cross-sectional cluster randomised controlled trials for type-2 diabetes in UK primary care. Trials. 2016;17:402.
16. Moulton LH, Golub JE, Durovni B, Cavalcante SC, Pacheco AG, Saraceni V, King B, Chaisson RE. Statistical design of THRio: a phased implementation clinic-randomized study of a tuberculosis preventive therapy intervention. Clin Trials. 2007;4(2):190–9.
17. Pan W. Sample size and power calculations for correlated binary data. Control Clin Trials. 2001;22(2):211–27.
18. Rutterford C, Copas A, Eldridge S. Methods for sample size determination in cluster randomized trials. Int J Epidemiol. 2015;44(3):1051–67.
19. van Breukelen GJ, Candel MJ. Comments on 'Efficiency loss because of varying cluster size in cluster randomized trials is smaller than literature suggests'. Stat Med. 2012;31(4):397–400.
20. Woertman W, et al. Stepped wedge designs could reduce the required sample size in cluster randomized trials. J Clin Epidemiol. 2013;66(7):752–8.
Acknowledgements
Not applicable.
Author information
Affiliations
Contributions
This work forms a chapter of JM's PhD (awarded 2017). JM undertook all the simulations and wrote the first draft of the paper, under the supervision of KH and AG. KH and AG made a substantial contribution to all stages of the project, including writing significant parts of the manuscript. All authors read and approved the final manuscript.
Funding
This research was partly funded by the UK NIHR Collaborations for Leadership in Applied Health Research and Care West Midlands initiative. Karla Hemming is funded by an NIHR Senior Research Fellowship (SRF-2017-10-002). The funding body played no role in the design of the study, the analysis and interpretation of results, or in the writing of the manuscript.
Corresponding author
Correspondence to James Thomas Martin.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional file
Additional file 1:
Figure S1. The impact of imbalance of observations between control and intervention conditions on the precision of a stepped-wedge cluster randomised trial with few clusters. The balance statistic was calculated as: (number of observations in the intervention condition – number of observations in the control condition); a larger value indicates greater imbalance. Each point shows the balance statistic and precision for a particular randomisation order, calculated for all possible randomisation orders. The cluster sizes for the 4-cluster design (a) are 10, 50, 100, and 500. The cluster sizes for the 5-cluster design (b) are 15, 25, 50, 100, and 200. (PNG 40 kb)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Keywords
 Stepped-wedge
 Cluster randomised trials
 Varying cluster size