Simulation methods to estimate design power: an overview for applied research
 Benjamin F Arnold†^{1}, Daniel R Hogan†^{2}, John M Colford Jr^{1} and Alan E Hubbard^{3}
DOI: 10.1186/1471-2288-11-94
© Arnold et al; licensee BioMed Central Ltd. 2011
Received: 1 February 2011
Accepted: 20 June 2011
Published: 20 June 2011
Abstract
Background
Estimating the required sample size and statistical power for a study is an integral part of study design. For standard designs, power equations provide an efficient solution to the problem, but they are unavailable for many complex study designs that arise in practice. For such complex study designs, computer simulation is a useful alternative for estimating study power. Although this approach is well known among statisticians, in our experience many epidemiologists and social scientists are unfamiliar with the technique. This article aims to address this knowledge gap.
Methods
We review an approach to estimate study power for individual or cluster-randomized designs using computer simulation. This flexible approach arises naturally from the model used to derive conventional power equations, but extends those methods to accommodate arbitrarily complex designs. The method is universally applicable to a broad range of designs and outcomes, and we present the material in a way that is approachable for quantitative, applied researchers. We illustrate the method using two examples (one simple, one complex) based on sanitation and nutritional interventions to improve child growth.
Results
We first show how simulation reproduces conventional power estimates for simple randomized designs over a broad range of sample scenarios to familiarize the reader with the approach. We then demonstrate how to extend the simulation approach to more complex designs. Finally, we discuss extensions to the examples in the article, and provide computer code to efficiently run the example simulations in both R and Stata.
Conclusions
Simulation methods offer a flexible option to estimate statistical power for standard and non-traditional study designs and parameters of interest. The approach we have described is universally applicable for evaluating study designs used in epidemiologic and social science research.
Keywords
Computer simulation; Power; Research design; Sample size
Background
Estimating the sample size and statistical power for a study is an integral part of study design and has profound consequences for the cost and statistical precision of a study. There exist analytic (closed-form) power equations for simple designs such as parallel randomized trials with treatment assigned at the individual level or cluster (group) level [1]. Statisticians have also derived equations to estimate power for more complex designs, such as designs with two levels of correlation [2] or designs with two levels of correlation, multiple treatments and attrition [3]. The advantage of using an equation to estimate power for study designs is that the approach is fast and easy to implement using existing software. For this reason, power equations are used to inform most study designs. However, in our applied research we have routinely encountered study designs that do not conform to conventional power equations (e.g. multiple treatment interventions, where one treatment is deployed at the group level and a second at the individual level). In these situations, simulation techniques offer a flexible alternative that is easy to implement in modern statistical software.
Here, we provide an overview of a general method to estimate study power for randomized trials based on a simulation technique that arises naturally from the underlying data model typically assumed by power and sample size equations. The method is universally applicable to a broad range of outcomes and designs, and it easily accommodates complex design features such as different follow-up plans, multiple treatment interventions, or different site-specific cluster effects. Simulation can also estimate the expected impact of deviations from optimal study implementation, such as item nonresponse and participant dropout. Statisticians have estimated design power using computer simulation for decades to benchmark analytic sample size equations [4, 5], but most published articles on estimating power using simulation have been either highly specific in application or highly technical [6–15]. Feiveson [16] presents an applied, general overview of estimating power by simulation using Stata software, but the article is not indexed in major databases and has been cited only twice in applied research [17, 18]. To our knowledge, this is the first published application of this approach that outlines the method using the data-generating models that are both the foundation of traditional power calculations and familiar to quantitative epidemiologists. Our goal with this article is to motivate and demonstrate with concrete examples how to use simulation techniques to estimate design power in a way that quantitative, applied epidemiologists can use in practice. We believe this approach has the potential for widespread application because the setting in which we have applied it is similar to that found in many studies.
As a motivating example, we recently considered a study design to test the independent and combined effects of environmental interventions (sanitation, handwashing and water treatment) and nutrient supplementation on child growth, measured by length/height. Growth faltering in the first years of life can have profound, negative consequences on lifelong human capital [19]. Enteric infections can cause growth faltering through acute diarrhea and parasitic insults [20, 21]. There is abundant evidence that environmental interventions can reduce enteric infections [22–24] and some evidence that they improve growth [25, 26]. Interestingly, even the best nutritional interventions fail to eliminate the majority of linear growth faltering typically observed in low-income country populations [27]. Nutritionists have hypothesized that nutrient supplementation interventions could be enhanced by complementary household environmental interventions that reduce fecal bacteria ingestion during the first years of life and potentially improve gut health [28, 29].
This design makes power calculations difficult for two reasons: first, the two treatments are deployed at different levels (community and household); and second, there are two sources of correlation in the outcome, within-community and within-child. Considering the estimation approach that will be used, combined with the complex clustered nature of the data-generating distribution, no analytic solution exists to calculate the statistical power for our hypothesis of interest given this design. Below, we introduce the simulation approach to estimate power, benchmark it against the conventional approach for a simple design, and then return to this example to demonstrate a more complex application. We conclude with a practical discussion of extensions and limitations of the approach, and in supporting files we provide example code to run our simulations in both R and Stata (see additional files 1 and 2: Rprograms.pdf, Stataprograms.pdf).
Methods
The statistical power of a design is defined as the complement of the Type II error rate (1 − β): the probability of rejecting the null hypothesis when it is false. Estimating study power requires an investigator to specify a small number of non-design parameters that describe the outcome and expected treatment effect. These parameters may include the mean, variance and expected difference between treatment and control in the outcome variable (the effect size). For cluster-randomized trials or trials with repeated measures, an investigator using conventional power equations must also specify the intraclass correlation coefficient (ICC) or its variant, the coefficient of variation, which summarizes the correlation between repeated measures within an experimental unit [1, 35]. Power equations for designs with multiple levels of correlation require that investigators specify even more parameters [2, 3]. Typically, these parameters are estimated from existing data or extracted from prior published studies. The simulation approach we outline below estimates a related set of parameters and then uses them to simulate outcome data from a specified data-generating model under a null and alternative hypothesis.
For a continuous outcome such as the height-for-age Z-score (HAZ), the data-generating model for child j in cluster i is (equation 1): Y _{ ij } = μ + β _{1} A _{ i } + b _{ i } + ε _{ ij }. There are four parameters in the model: μ is the mean HAZ score in the control children; β _{1} is the estimated difference in HAZ comparing intervention children (A = 1) to control children (A = 0); b _{ i } is a cluster-level random effect; and ε _{ ij } is an error term that captures individual-level variability and measurement error. We assume that the random effect and error term are normally distributed with mean zero and known standard deviation, and are uncorrelated: b _{ i } ~ N(0, σ _{ g } ), ε _{ ij } ~ N(0, σ _{ e } ), cov(b _{ i } , ε _{ ij } ) = 0.
To estimate σ _{ g } and σ _{ e } from a training dataset, we fit a linear mixed model of the form (equation 3): Y _{ ij } = μ + b _{ i } + ε _{ ij }, where μ, b _{ i } and ε _{ ij } are defined above. This is implemented in Stata using the xtmixed command, in SAS with PROC MIXED, and in R with the nlme or lme4 packages (for examples in Stata, see additional file 2: Stataprograms.pdf). The linear mixed model will provide two estimates of variability: the cluster-level variability (σ̂ _{ g } ) and the residual variability (σ̂ _{ e } ). Importantly, the different levels of variability can only be estimated if the dataset includes repeated observations at each level: for this example, repeated measures of Y _{ ij } (i.e., HAZ for multiple children within each cluster) are required.
The simulation requires the following steps:
1. Estimate parameters from a training dataset (described above).
2. Create a population of 2,000 children (2 arms × 100 clusters/arm × 10 children/cluster), with a unique ID variable for each cluster, a unique ID variable for each child and an indicator for assigned treatment: treated (A = 1) and control (A = 0).
3. Generate a random effect for each cluster (200 total), b _{ i }, which is a draw from a normal distribution with mean 0 and SD σ̂ _{ g }.
4. Generate a residual error term for each child, ε _{ ij }, which is a draw from a normal distribution with mean zero and SD σ̂ _{ e }.
5. Simulate an outcome for each child, y _{ ij } , using equation 1.
6. Regress y _{ ij } on the treatment indicator A _{ i }, using robust sandwich standard errors [37] to account for clustering at the highest level of correlation, and store the one- or two-sided P value for the test that β _{1} = 0.
7. Repeat steps 3 through 6 a large number of times (at least 1,000).
8. The empirical power of the design is the fraction of P values that are smaller than 0.05.
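The eight steps above can be sketched in code. The article's own programs are in R and Stata (additional files 1 and 2); the following is an analogous, self-contained sketch in Python, with two simplifications to keep it dependency-free: the variance parameters (σ̂ _{ g } = 0.482, σ̂ _{ e } = 1.297, taken from the training-data example later in the article) and effect size are assumptions, and step 6 analyzes cluster-level means with a two-sample z-test, which for a balanced design is equivalent to a regression with cluster-robust standard errors.

```python
import math
import random

def simulate_power(n_clusters=100, n_children=10, effect=0.25,
                   sigma_g=0.482, sigma_e=1.297, n_sims=400,
                   alpha=0.05, seed=1):
    """Steps 2-8: simulate a cluster-randomized trial many times and
    return the fraction of simulations with a two-sided P value < alpha."""
    rng = random.Random(seed)
    phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # normal CDF
    rejections = 0
    for _ in range(n_sims):
        arm_means = []  # cluster-level mean outcomes, one list per arm
        for a in (0, 1):  # A = 0 control, A = 1 treated
            means = []
            for _cluster in range(n_clusters):
                b_i = rng.gauss(0.0, sigma_g)  # step 3: cluster random effect
                # steps 4-5: child-level errors and outcomes (mu = 0 WLOG)
                y = [a * effect + b_i + rng.gauss(0.0, sigma_e)
                     for _child in range(n_children)]
                means.append(sum(y) / n_children)
            arm_means.append(means)
        # step 6: two-sample z-test on cluster means (balanced design)
        m0 = sum(arm_means[0]) / n_clusters
        m1 = sum(arm_means[1]) / n_clusters
        v0 = sum((m - m0) ** 2 for m in arm_means[0]) / (n_clusters - 1)
        v1 = sum((m - m1) ** 2 for m in arm_means[1]) / (n_clusters - 1)
        z = (m1 - m0) / math.sqrt(v0 / n_clusters + v1 / n_clusters)
        if 2.0 * (1.0 - phi(abs(z))) < alpha:
            rejections += 1
    return rejections / n_sims  # step 8: empirical power

print(simulate_power())
```

With these illustrative parameters (100 clusters per arm, 10 children per cluster, a 0.25 HAZ effect) the empirical power lands near the conventional-formula value of roughly 80%; in practice one would substitute the planned regression analysis at step 6.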
Note that in this article's examples we use generalized linear models with robust sandwich standard errors to account for correlation; we could equivalently use a generalized estimating equation (GEE) approach [38] with robust sandwich standard errors. For our specific application, generalized linear models and GEE are useful because they naturally estimate marginal parameters and require investigators to make fewer assumptions about the data generating distribution during parameter estimation than mixed effects models [39]. On a practical level, marginal models are also computationally simpler than mixed models, which is relevant when simulating the analysis thousands of times. However, an advantage of using simulation to estimate power is that investigators can use whatever estimation approach they plan to use in their actual analysis.
For a binary outcome such as diarrhea, outcomes can be simulated from a mixed logistic model (equation 4): logit(p _{ ij } ) = μ + β _{1} A _{ i } + b _{ i }, where μ is the log-odds of the baseline probability of diarrhea; β _{1} is the log of the odds ratio of diarrhea comparing children in intervention communities (A = 1) to children in control communities (A = 0); and b _{ i } is a cluster-level random effect. As before, we assume that the random effect is normally distributed with mean zero and known standard deviation: b _{ i } ~ N(0, σ _{ g } ).
As with continuous outcomes, the mixed model will estimate the cluster-level variability (σ̂ _{ g } ). Given these parameters and an assumed effect size (β _{1}), power for the design with 100 clusters per arm and 10 children per cluster is estimated using a procedure similar to the continuous outcome example above. Steps 1-4 and 7-8 remain the same. Steps 5 and 6 now involve simulating outcomes y _{ ij } for each child as a Bernoulli random variable with probability p _{ ij } (equation 4), and y _{ ij } is regressed on the treatment indicator A _{ i } using logistic regression with robust sandwich standard errors to obtain P values for the test that are adjusted for clustering.
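The binary-outcome variant can be sketched the same way. This Python analogue uses illustrative parameter values that are our assumptions, not values from the article (baseline prevalence 20%, σ _{ g } = 0.4, odds ratio 0.5), and, instead of logistic regression with robust standard errors, it compares cluster-level proportions between arms, which likewise accounts for clustering in a balanced design:

```python
import math
import random

def simulate_binary_power(n_clusters=100, n_children=10,
                          log_or=math.log(0.5),           # assumed OR = 0.5
                          base_logodds=math.log(0.2 / 0.8),  # assumed 20% baseline
                          sigma_g=0.4, n_sims=300, alpha=0.05, seed=1):
    """Simulate Bernoulli outcomes from a mixed logistic model (equation 4)
    and test the treatment effect via cluster-level proportions."""
    rng = random.Random(seed)
    phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # normal CDF
    expit = lambda x: 1.0 / (1.0 + math.exp(-x))
    rejections = 0
    for _ in range(n_sims):
        props = []  # cluster-level diarrhea proportions, one list per arm
        for a in (0, 1):
            cluster_props = []
            for _c in range(n_clusters):
                # cluster-specific probability: logit(p) = mu + beta1*A + b_i
                p_ij = expit(base_logodds + a * log_or + rng.gauss(0.0, sigma_g))
                y = sum(1 for _k in range(n_children) if rng.random() < p_ij)
                cluster_props.append(y / n_children)
            props.append(cluster_props)
        # two-sample z-test on cluster proportions
        m0 = sum(props[0]) / n_clusters
        m1 = sum(props[1]) / n_clusters
        v0 = sum((p - m0) ** 2 for p in props[0]) / (n_clusters - 1)
        v1 = sum((p - m1) ** 2 for p in props[1]) / (n_clusters - 1)
        z = (m1 - m0) / math.sqrt(v0 / n_clusters + v1 / n_clusters)
        if 2.0 * (1.0 - phi(abs(z))) < alpha:
            rejections += 1
    return rejections / n_sims
```

Setting log_or = 0 recovers the Type I error rate, a useful diagnostic check that the simulated test rejects a true null at roughly the nominal 5% level.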
Results
Comparison of Simulation and Conventional Methods
For a simple cluster-randomized design, conventional power can be calculated as: 1 − β = Φ( d√(cn) / (σ√(2[1 + (n − 1)ρ])) − Z _{ α/2 } ), where β is the Type II error rate, Φ is the normal cumulative distribution function, c is the number of clusters per arm, n is the number of individuals per cluster, d is the mean difference between treatment groups, σ ^{2} is the variance of the outcome, Z _{ α/2 } is the quantile of the standard normal distribution associated with a Type I error rate of α, and ρ is the ICC (equation 2): ρ = σ _{ g } ^{2} / (σ _{ g } ^{2} + σ _{ e } ^{2}). To estimate parameters for the power calculation we use a training dataset from Indonesia that is part of an ongoing evaluation of the World Bank's Water and Sanitation Program's (WSP) Total Sanitation and Sanitation Marketing campaign [42]. The dataset includes length measurements from 2,090 children under age 24 months collected from 160 rural villages (clusters) in East Java at the baseline of the study. All length measurements (accurate to 0.1 cm) are standardized to HAZ using the WHO 2006 international standard [43]. The mean HAZ in the sample is −0.875 and its standard deviation is 1.384. To estimate the fraction of the variability explained at the village level, we estimate a mixed model regression of the form in equation 3. The estimated standard deviations of the village-level random effect for HAZ and the residual error are σ̂ _{ g } = 0.482 and σ̂ _{ e } = 1.297, respectively. This implies that the majority of the variability is at the child level, and the implied ICC is 0.482^{2}/(0.482^{2} + 1.297^{2}) = 0.12.
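The ICC and the conventional power formula for this design are straightforward to compute directly, which provides the analytic benchmark against which the simulation estimates can be compared. A minimal Python sketch using the variance estimates above:

```python
import math
from statistics import NormalDist

def icc(sigma_g, sigma_e):
    """Equation 2: fraction of total outcome variance at the cluster level."""
    return sigma_g ** 2 / (sigma_g ** 2 + sigma_e ** 2)

def analytic_power(d, c, n, sigma, rho, alpha=0.05):
    """Conventional cluster-randomized power:
    Phi( d*sqrt(c*n) / (sigma*sqrt(2*(1 + (n-1)*rho))) - Z_{alpha/2} )."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    return nd.cdf(d * math.sqrt(c * n)
                  / (sigma * math.sqrt(2.0 * (1.0 + (n - 1) * rho)))
                  - z_alpha)

rho = icc(0.482, 1.297)                 # ~0.12, as in the text
sigma = math.sqrt(0.482**2 + 1.297**2)  # total SD of HAZ (~1.384)
print(rho, analytic_power(0.25, 100, 10, sigma, rho))
```

With 100 clusters per arm, 10 children per cluster, and an assumed effect of 0.25 HAZ, the formula gives roughly 80% power, matching what the simulation procedure reproduces empirically.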
The Use of Simulation to Estimate Power for More Complex Designs
There are many study designs for which analytic equations are not available. An example of a non-standard design is the two-treatment factorial trial described in the introduction, in which sanitation mobilization is randomized at the community level, and LNS is provided to a random subsample of children in each village (Figure 1). We assume one child per household and that child length is measured at baseline (pre-treatment) and again two years later. This poses problems for a conventional sample size equation due to treatment at multiple levels (community and child) and correlation at multiple levels (within-community and within-child). The three hypotheses of interest include whether or not each individual intervention improves child HAZ scores on its own, and whether there is additional benefit to providing the interventions together (interaction or synergy).
Other parameters may be of interest beyond the treatment contrasts that the design implies, such as those estimated using population intervention models [45, 46], where the distribution of sanitation or nutrition supplementation reflects conditions of the study population at baseline (before intervention) or of a relevant, external population. Simulation could naturally accommodate such alternate parameters of interest for which no closed form power equations exist.
Discussion
We have demonstrated with practical examples how to use computer simulation to estimate study design power based on an assumed mixed model data-generating distribution for the outcomes, which is identical to the distribution assumed for conventional power equations [1]. Simulation naturally extends conventional power equations for simple parallel trial designs by substituting programming and computer time for the effort it would require a statistician to derive analytic solutions (which for many designs may be impossible). These methods are universally applicable and can accommodate arbitrarily complex designs. The general approach can be extended to any data-generating model and statistical test of interest (see [16] for practical examples that include the Poisson distribution, Cox proportional hazards estimation and rank sum tests). Although we have focused on power, the process can be iterated to identify the minimum detectable effect for a fixed design.
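The iteration for a minimum detectable effect can be sketched as a bisection over effect size. For brevity this Python sketch searches against the conventional cluster-randomized formula (using the illustrative parameter values from the Results); in a real application the inner function would be replaced by the simulation-based power estimate, with enough simulation replicates that Monte Carlo noise does not mislead the search:

```python
import math
from statistics import NormalDist

def power_for_effect(d, c=100, n=10, sigma=1.384, rho=0.12, alpha=0.05):
    """Power at effect size d for a cluster-randomized design; in practice
    this stand-in would be replaced by an empirical, simulated power."""
    nd = NormalDist()
    design_effect = 1.0 + (n - 1) * rho
    se = sigma * math.sqrt(2.0 * design_effect / (c * n))  # SE of the difference
    return nd.cdf(d / se - nd.inv_cdf(1.0 - alpha / 2.0))

def minimum_detectable_effect(target=0.80, lo=0.0, hi=2.0, tol=1e-4):
    """Bisection: smallest effect size whose power reaches the target.
    Power is monotone in d, so the bracket [lo, hi] shrinks to the answer."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if power_for_effect(mid) < target:
            lo = mid
        else:
            hi = mid
    return hi

print(minimum_detectable_effect())
```

For these inputs the search converges to an effect of about 0.25 HAZ for 80% power, consistent with the worked example earlier in the article.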
Beyond the flexibility of simulation, an additional benefit of this approach to power studies is that it requires investigators to be more explicit about their analysis plan. The process ensures that the investigators specify a parameter of interest and estimation approach in advance, which may reduce the temptation to explore alternative modeling approaches in the presence of negative findings and is consistent with CONSORT guidelines [47]. Despite these potential benefits, we caution against the overinterpretation of power simulation results. Like equation-based power calculations, the results are sensitive to the assumptions about outcome variability and the data-generating model (e.g., that random effects are drawn from a normal distribution), which are nearly always violated to some extent in practice. A simulation approach, like conventional power equations, will not inform investigators about optimal design choices under threats to validity such as non-random loss to follow-up or systematic measurement error. We recommend the use of the diagnostic checks outlined in this article and suggest that simulations be audited in similar fashion to a primary analysis. Burton et al. [48] provide a general overview of how to conduct simulation studies in medical research. We also recommend that the characteristics of training datasets reflect the planned study population as closely as possible (e.g., age, geographic distribution, and measurement frequency).
Extensions to the basic methods in this article are possible. For example, we have used simulation to make mid-study design corrections, assuming lower levels of variability at follow-up than those observed at baseline (to reflect lower error due to improved measurement techniques). We have also used the approach to design multi-country trials where each country's cluster sizes and variance parameters differ, but a common test across countries is desired. Other extensions that involve more assumptions include more complex patterns of attrition [3], optimization using cost functions [6], or inclusion of covariates for either stratification or variance reduction [7]. For situations in which existing data are available to inform the parameters of the data-generating model, one could consider adopting a Bayesian approach and simulating the posterior distribution for a design's power. This would provide a full description of estimated power, enabling the researcher to determine not just the expected power for a given design but also, for example, the probability that the power will be above an unacceptably low value.
Conclusions
The use of simulation to estimate study design power extends conventional power equations to accommodate non-standard designs that often arise in practice. Investigators can estimate power for virtually any design as long as training datasets are available to estimate the appropriate variance parameters. The approach we have described is universally applicable for estimating the power of study designs used in epidemiologic and social science research.
Authors' information
BA is an epidemiologist in the Colford Research Group, Division of Epidemiology at the University of California, Berkeley. DH is a Ph.D. candidate in Health Policy at Harvard University. JC is Professor of Epidemiology at the University of California, Berkeley. AH is Associate Professor of Biostatistics at the University of California, Berkeley. All authors are actively involved in the design and analysis of epidemiologic field studies.
Disclaimer
This manuscript is based on research funded in part by the Bill & Melinda Gates Foundation. The findings and conclusions contained within are those of the authors and do not necessarily reflect positions or policies of the Bill & Melinda Gates Foundation. Some of the data in this article were collected through research conducted by the Water and Sanitation Program (http://www.wsp.org). For more information, please visit http://www.wsp.org/scalingupsanitation, or send an email to wsp@worldbank.org. WSP is a multidonor partnership created in 1978 and administered by the World Bank to support poor people in obtaining affordable, safe, and sustainable access to water and sanitation services. WSP's donors include Australia, Austria, Canada, Denmark, Finland, France, the Bill & Melinda Gates Foundation, Ireland, Luxembourg, Netherlands, Norway, Sweden, Switzerland, United Kingdom, United States, and the World Bank. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the Water and Sanitation Program, the World Bank and its affiliated organizations or those of the Executive Directors of the World Bank or the governments they represent.
Notes
List of abbreviations used
HAZ: height-for-age Z-score; ICC: intraclass correlation coefficient; LNS: lipid-based nutrient supplementation; SD: standard deviation
Declarations
Acknowledgements
The authors thank the WSP program for providing the Indonesia training dataset, and Drs. Stephen Luby, Michael Kremer and Clair Null for early discussions of the twolevel design. This research was funded in part by a grant from the Bill & Melinda Gates Foundation (BA, JC, AH), by the Water and Sanitation Program at the World Bank (BA, JC), and by a T32 training grant (AI 007433) from the National Institute of Allergy and Infectious Diseases (DH).
References
 Murray DM: Design and Analysis of Group-Randomized Trials. 1998, Oxford University Press
 Heo M, Leon AC: Statistical power and sample size requirements for three level hierarchical cluster randomized trials. Biometrics. 2008, 64: 1256-1262. 10.1111/j.1541-0420.2008.00993.x
 Roy A, Bhaumik DK, Aryal S, Gibbons RD: Sample size determination for hierarchical longitudinal designs with differential attrition rates. Biometrics. 2007, 63: 699-707. 10.1111/j.1541-0420.2007.00769.x
 Pocock SJ, Simon R: Sequential treatment assignment with balancing for prognostic factors in the controlled clinical trial. Biometrics. 1975, 31: 103-115. 10.2307/2529712
 Pocock SJ: Group sequential methods in the design and analysis of clinical trials. Biometrika. 1977, 64: 191-199. 10.1093/biomet/64.2.191
 Abbas I, Rovira J, Casanovas J, Greenfield T: Optimal design of clinical trials with computer simulation based on results of earlier trials, illustrated with a lipodystrophy trial in HIV patients. J Biomed Inform. 2008, 41: 1053-1061. 10.1016/j.jbi.2008.04.008
 Gastanaga VM, McLaren CE, Delfino RJ: Power calculations for generalized linear models in observational longitudinal studies: a simulation approach in SAS. Comput Methods Programs Biomed. 2006, 84: 27-33. 10.1016/j.cmpb.2006.07.011
 Taylor DW, Bosch EG: CTS: A clinical trials simulator. Stat Med. 1990, 9: 787-801. 10.1002/sim.4780090708
 Eng J: Sample size estimation: a glimpse beyond simple formulas. Radiology. 2004, 230: 606-612. 10.1148/radiol.2303030297
 Guimaraes P, Palesch Y: Power and sample size simulations for Randomized Play-the-Winner rules. Contemp Clin Trials. 2007, 28: 487-499. 10.1016/j.cct.2007.01.006
 Sutton AJ, Cooper NJ, Jones DR, Lambert PC, Thompson JR, Abrams KR: Evidence-based sample size calculations based upon updated meta-analysis. Stat Med. 2007, 26: 2479-2500. 10.1002/sim.2704
 Reynolds R, Lambert PC, Burton PR, BSAC Extended Working Party on Resistance Surveillance: Analysis, power and design of antimicrobial resistance surveillance studies, taking account of inter-centre variation and turnover. J Antimicrob Chemother. 2008, 62 (Suppl 2): ii29-ii39
 Sutton AJ, Donegan S, Takwoingi Y, Garner P, Gamble C, Donald A: An encouraging assessment of methods to inform priorities for updating systematic reviews. J Clin Epidemiol. 2009, 62: 241-251. 10.1016/j.jclinepi.2008.04.005
 Orloff J, Douglas F, Pinheiro J, Levinson S, Branson M, Chaturvedi P, Ette E, Gallo P, Hirsch G, Mehta C, Patel N, Sabir S, Springs S, Stanski D, Evers MR, Fleming E, Singh N, Tramontin T, Golub H: The future of drug development: advancing clinical trial design. Nat Rev Drug Discov. 2009, 8: 949-957
 Moineddin R, Matheson FI, Glazier RH: A simulation study of sample size for multilevel logistic regression models. BMC Med Res Methodol. 2007, 7: 34. 10.1186/1471-2288-7-34
 Feiveson AH: Power by simulation. Stata Journal. 2002, 2: 107-124
 Shumway-Cook A, Silver IF, LeMier M, York S, Cummings P, Koepsell TD: Effectiveness of a community-based multifactorial intervention on falls and fall risk factors in community-living older adults: a randomized, controlled trial. J Gerontol A Biol Sci Med Sci. 2007, 62: 1420-1427
 Baqui AH, El-Arifeen S, Darmstadt GL, Ahmed S, Williams EK, Seraji HR, Mannan I, Rahman SM, Shah R, Saha SK, Syed U, Winch PJ, Lefevre A, Santosham M, Black RE, Projahnmo Study Group: Effect of community-based newborn-care intervention package implemented through two service-delivery strategies in Sylhet district, Bangladesh: a cluster-randomised controlled trial. Lancet. 2008, 371: 1936-1944. 10.1016/S0140-6736(08)60835-1
 Victora CG, Adair L, Fall C, Hallal PC, Martorell R, Richter L, Sachdev HS, Maternal and Child Undernutrition Study Group: Maternal and child undernutrition: consequences for adult health and human capital. Lancet. 2008, 371: 340-357. 10.1016/S0140-6736(07)61692-4
 Checkley W, Epstein LD, Gilman RH, Black RE, Cabrera L, Sterling CR: Effects of Cryptosporidium parvum infection in Peruvian children: growth faltering and subsequent catch-up growth. Am J Epidemiol. 1998, 148: 497-506
 Checkley W, Buckley G, Gilman RH, Assis AM, Guerrant RL, Morris SS, Mølbak K, Valentiner-Branth P, Lanata CF, Black RE, Childhood Malnutrition and Infection Network: Multi-country analysis of the effects of diarrhoea on childhood stunting. Int J Epidemiol. 2008, 37: 816-830
 Esrey SA, Potash JB, Roberts L, Shiff C: Effects of improved water supply and sanitation on ascariasis, diarrhoea, dracunculiasis, hookworm infection, schistosomiasis, and trachoma. Bull World Health Organ. 1991, 69: 609-621
 Waddington H, Snilstveit B: Effectiveness and sustainability of water, sanitation, and hygiene interventions in combating diarrhoea. J Dev Eff. 2009, 1: 295-335. 10.1080/19439340903141175
 Clasen TF, Bostoen K, Schmidt WP, Boisson S, Fung ICH, Jenkins MW, Scott B, Sugden S, Cairncross S: Interventions to improve disposal of human excreta for preventing diarrhoea. Cochrane Database Syst Rev. 2010, 6
 Esrey SA: Water, waste, and well-being: a multicountry study. Am J Epidemiol. 1996, 143: 608-623
 Checkley W, Gilman RH, Black RE, Epstein LD, Cabrera L, Sterling CR, Moulton LH: Effect of water and sanitation on childhood health in a poor Peruvian peri-urban community. Lancet. 2004, 363: 112-118. 10.1016/S0140-6736(03)15261-0
 Bhutta ZA, Ahmed T, Black RE, Cousens S, Dewey K, Giugliani E, Haider BA, Kirkwood B, Morris SS, Sachdev HPS, Shekar M, Maternal and Child Undernutrition Study Group: What works? Interventions for maternal and child undernutrition and survival. Lancet. 2008, 371: 417-440. 10.1016/S0140-6736(07)61693-6
 Lunn PG: The impact of infection and nutrition on gut function and growth in childhood. Proc Nutr Soc. 2000, 59: 147-154. 10.1017/S0029665100000173
 Humphrey JH: Child undernutrition, tropical enteropathy, toilets, and handwashing. Lancet. 2009, 374: 1032-1035. 10.1016/S0140-6736(09)60950-8
 Kar K: Subsidy or self-respect? Participatory total community sanitation in Bangladesh. IDS Working Paper 184. 2003
 Adu-Afarwuah S, Lartey A, Brown KH, Zlotkin S, Briend A, Dewey KG: Randomized comparison of 3 types of micronutrient supplements for home fortification of complementary foods in Ghana: effects on growth and motor development. Am J Clin Nutr. 2007, 86: 412-420
 Adu-Afarwuah S, Lartey A, Brown KH, Zlotkin S, Briend A, Dewey KG: Home fortification of complementary foods with micronutrient supplements is well accepted and has positive effects on infant iron status in Ghana. Am J Clin Nutr. 2008, 87: 929-938
 Phuka JC, Maleta K, Thakwalakwa C, Cheung YB, Briend A, Manary MJ, Ashorn P: Complementary feeding with fortified spread and incidence of severe stunting in 6- to 18-month-old rural Malawians. Arch Pediatr Adolesc Med. 2008, 162: 619-626. 10.1001/archpedi.162.7.619
 Phuka JC, Maleta K, Thakwalakwa C, Cheung YB, Briend A, Manary MJ, Ashorn P: Postintervention growth of Malawian children who received 12-mo dietary complementation with a lipid-based nutrient supplement or maize-soy flour. Am J Clin Nutr. 2009, 89: 382-390
 Hayes RJ, Bennett S: Simple sample size calculation for cluster-randomized trials. Int J Epidemiol. 1999, 28: 319-326. 10.1093/ije/28.2.319
 Laird NM, Ware JH: Random-effects models for longitudinal data. Biometrics. 1982, 38: 963-974. 10.2307/2529876
 Freedman DA: On the so-called "Huber sandwich estimator" and "robust standard errors". Am Stat. 2006, 60: 299-302. 10.1198/000313006X152207
 Zeger SL, Liang KY: Longitudinal data analysis for discrete and continuous outcomes. Biometrics. 1986, 42: 121-130. 10.2307/2531248
 Hubbard AE, Ahern J, Fleischer NL, van der Laan M, Lippman SA, Jewell N, Bruckner T, Satariano WA: To GEE or not to GEE: Comparing population average and mixed models for estimating the associations between neighborhood risk factors and health. Epidemiology. 2010, 21: 467-474. 10.1097/EDE.0b013e3181caeb90
 Abadie A: Bootstrap tests for distributional treatment effects in instrumental variable models. J Am Stat Assoc. 2002, 97: 284-292. 10.1198/016214502753479419
 Sekhon JS: Multivariate and propensity score matching software with automated balance optimization: The Matching package for R. J Stat Softw. 2010
 Cameron L, Shaw M: Scaling Up Rural Sanitation: Findings from the Impact Evaluation Baseline Survey in Indonesia. Water and Sanitation Program Technical Paper. 2010, The World Bank, [http://www.wsp.org/wsp/sites/wsp.org/files/publications/WSP_IndonesiaBaselineReport_TSSM.pdf]
 WHO: WHO Child Growth Standards: Length/height-for-age, weight-for-age, weight-for-length, weight-for-height and body mass index-for-age: Methods and development. 2006
 Arnold BF, Khush RS, Ramaswamy P, London AG, Rajkumar P, Ramaprabha P, Durairaj N, Hubbard AE, Balakrishnan K, Colford JM: Causal inference methods to study nonrandomized, pre-existing development interventions. Proc Natl Acad Sci USA. 2010, 107: 22605-22610. 10.1073/pnas.1008944107
 Hubbard AE, van der Laan MJ: Population intervention models in causal inference. Biometrika. 2008, 95: 35-47. 10.1093/biomet/asm097
 Fleischer NL, Fernald LCH, Hubbard AE: Estimating the potential impacts of intervention from observational data: methods for estimating causal attributable risk in a cross-sectional analysis of depressive symptoms in Latin America. J Epidemiol Community Health. 2010, 64: 16-21. 10.1136/jech.2008.085985
 Schulz KF, Altman DG, Moher D, CONSORT Group: CONSORT 2010 statement: updated guidelines for reporting parallel group randomised trials. PLoS Med. 2010, 7
 Burton A, Altman DG, Royston P, Holder RL: The design of simulation studies in medical statistics. Stat Med. 2006, 25: 4279-4292. 10.1002/sim.2673
Prepublication history
The prepublication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2288/11/94/prepub
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.