The control-group event rate in pediatric trials was on average 10% smaller than that in adult trials. In our secondary analyses, when we considered the magnitude of the difference in control-group event rates rather than its direction, pediatric trials had an average control-group event rate that was 1.5 times higher or 1.5 times lower than that in adult RCTs. We also considered the magnitude of differences, ignoring their direction, because an important question is whether nuisance parameters in pediatric studies are likely to differ from those estimated from adult studies. Large differences in nuisance parameters (some overestimates, some underestimates) could average out to no difference. By analyzing magnitudes, we presuppose a difference and try to estimate how large that difference might be.
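
The distinction between direction and magnitude can be illustrated with a small sketch. It assumes, purely for illustration, that each meta-analysis contributes one pediatric-to-adult ratio of control-group event rates and that ratios are averaged on the log scale; the ratios below are hypothetical values, not data from this study, and the function names are ours.

```python
import math

# Hypothetical pediatric/adult ratios of control-group event rates,
# one per meta-analysis (illustrative values, not study data).
ratios = [0.5, 2.0, 0.25, 4.0]


def mean_signed_ratio(values):
    """Geometric mean of the ratios: over- and underestimates can cancel."""
    mean_log = sum(math.log(r) for r in values) / len(values)
    return math.exp(mean_log)


def mean_fold_difference(values):
    """Geometric mean of the fold differences, ignoring direction."""
    mean_abs_log = sum(abs(math.log(r)) for r in values) / len(values)
    return math.exp(mean_abs_log)


signed = mean_signed_ratio(ratios)
magnitude = mean_fold_difference(ratios)
print(f"direction-aware average ratio: {signed:.2f}")
print(f"average fold difference:       {magnitude:.2f}")
```

With these made-up ratios the direction-aware average is 1 (no apparent difference), while the average fold difference is well above 1, which is the situation the magnitude analysis is meant to detect.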

In over 60% of meta-analyses, the control-group event rates in pediatric RCTs were smaller than those in adult trials, and in 36% of the meta-analyses relative differences in control-group event rates of at least 2-fold, in either direction, were identified. Specifically, for mortality outcomes, the control-group mortality rate in pediatric trials was on average 50% lower than that in adult trials.

Large variation was also seen between pediatric and adult trials when continuous efficacy outcomes were considered. The pediatric control-group SD was on average 26% smaller than that of adult trials, and in 27% of the meta-analyses the relative difference in SDs between pediatric and adult trials was at least 2-fold in either direction. Moreover, when the magnitude of the control-group SD was considered, pediatric trial SDs were either at least 1.8 times larger or at least 1.8 times smaller than adult trial SDs.

Large differences in nuisance parameters were seen among many studies. To demonstrate how substantially an erroneous estimate of a nuisance parameter can affect sample size computation, we take two examples from the meta-analyses included in this study. In the review *Antibiotics for the common cold and acute purulent rhinitis*, the primary meta-analysis of the persisting symptoms outcome gave an estimated CER of 0.48 in the adult population. If one wished to conduct a pediatric trial on the same topic, with a type I error probability of 0.05, 80% power, and the assumption that a 30% reduction in the number of patients with persisting symptoms would be required to demonstrate a clinically relevant antibiotic effect, the required sample size for the pediatric study would be 182 patients per arm. Under the assumption of a CER of 0.048, as was actually seen in the pediatric trials, the required sample size would be over 16 times larger, at 2962 patients per arm.

To give an example using a continuous outcome, we take the review *Early emergency department treatment of acute asthma with systemic corticosteroids*. For the outcome of final PEFR, we observed a mean SD in the adult studies of 32 L/min. Suppose we plan to conduct a pediatric trial on the same topic, using a type I error probability of 0.05, a power of 80%, and a minimal clinically important difference threshold of 15 L/min. Using the adult estimated SD of 32 L/min, we would compute a requirement of 643 patients in each of two groups. If we assume an SD estimate of 4.3 L/min, as was observed in the pediatric trials, we would require only 12 patients in each group, a sample size that is less than 2% of the originally computed sample size.

It is clear from these examples that erroneous estimates of these nuisance parameters can have important implications for sample size computations. They can lead either to inappropriately powered studies that cannot answer the clinical question or to unnecessary waste of valuable clinical and financial resources. A third unwanted consequence might be that a proposed trial is not conducted because the erroneous estimate of the sample size is too large to be feasible.
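
The arithmetic behind the two examples can be sketched as follows. The text does not state which sample size formulas were used, so the two-sided normal-approximation formulas below (pooled variance under the null and no continuity correction for proportions; equal SDs for means) are assumptions, as are the function names.

```python
import math
from statistics import NormalDist


def n_per_arm_binary(cer, relative_reduction, alpha=0.05, power=0.80):
    """Per-arm sample size for comparing two proportions.

    Assumed formula: two-sided normal approximation with pooled variance
    under the null hypothesis and no continuity correction.
    """
    p1 = cer
    p2 = cer * (1 - relative_reduction)  # treatment-group event rate
    p_bar = (p1 + p2) / 2
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    numerator = (
        z_a * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


def n_per_group_continuous(sd, delta, alpha=0.05, power=0.80):
    """Per-group sample size for comparing two means (assumed formula:
    two-sided normal approximation with a common SD in both groups)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_a + z_b) ** 2 * sd**2 / delta**2)


# Binary example: adult CER 0.48 versus pediatric CER 0.048, 30% reduction.
n_adult_cer = n_per_arm_binary(0.48, 0.30)
n_pediatric_cer = n_per_arm_binary(0.048, 0.30)
print(n_adult_cer, n_pediatric_cer, round(n_pediatric_cer / n_adult_cer, 1))

# Continuous example: the requirement scales with SD squared, so the ratio
# of the two requirements does not depend on the difference to be detected.
sd_ratio = (4.3 / 32) ** 2
print(f"pediatric / adult requirement: {sd_ratio:.1%}")
```

Under these assumptions the binary example reproduces 182 and 2962 patients per arm. For the continuous outcome, the pediatric requirement is about 1.8% of the adult one whatever difference is assumed; in this sketch the absolute counts of 643 and 12 are obtained from `n_per_group_continuous` with a difference of 5 L/min, and a difference of 15 L/min gives smaller counts with the same ratio.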

We did observe a trend in both binary and continuous outcome data for pediatric RCTs to have smaller values of nuisance parameters (both CERs and CE-SDs) than their adult counterparts. Thus, when one does use these parameters from adult studies as surrogates for pediatric studies, the nuisance parameter is more likely to be overestimated than underestimated. This relationship has been well documented and graphed^{2}. In the case of continuous data, an overestimation of the SD will always result in an overestimation of the sample size. The situation for binary data is more nuanced, as the sample size will depend upon the ratio of the CER and the treatment-group event rate (the closer the ratio is to 1, the larger the required sample size), so an underestimation of the CER could lead to either an under- or overestimated sample size. For example, if a pediatric population had an actual CER of 0.3 with a treatment-group event rate of 0.2, then underestimating the CER (say as 0.25) would result in a larger-than-required sample. However, if the treatment-group event rate was 0.4, then this underestimate of the CER would result in a sample that was too small.
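
The binary example above can be checked numerically. This is a minimal sketch that assumes a two-sided normal-approximation formula for two proportions (pooled variance under the null, no continuity correction) with a type I error probability of 0.05 and 80% power; the formula and the function name are our assumptions, not taken from the text.

```python
import math
from statistics import NormalDist


def n_per_arm(p_control, p_treatment, alpha=0.05, power=0.80):
    """Per-arm sample size for two proportions (assumed formula: two-sided
    normal approximation, pooled variance under the null)."""
    p_bar = (p_control + p_treatment) / 2
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    numerator = (
        z_a * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_b * math.sqrt(
            p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
        )
    ) ** 2
    return math.ceil(numerator / (p_control - p_treatment) ** 2)


true_cer, assumed_cer = 0.30, 0.25  # the CER is underestimated
for p_treatment in (0.20, 0.40):
    needed = n_per_arm(true_cer, p_treatment)
    planned = n_per_arm(assumed_cer, p_treatment)
    verdict = "larger than required" if planned > needed else "too small"
    print(f"treatment rate {p_treatment}: needed {needed}, "
          f"planned {planned} ({verdict})")
```

The same underestimate of the CER moves the planned sample size in opposite directions, depending on whether the assumed CER ends up closer to or further from the treatment-group event rate.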

The discrepancies in the nuisance parameters between pediatric and adult trials were more prominent with mortality outcomes. In 86% of meta-analyses with mortality outcomes, the mortality CER in adult trials was larger than that in pediatric trials. On average, the control-group mortality rates in adult trials were two times larger than in pediatric trials. Mortality seems to be an outcome where extrapolation of adult control-group event rates for the estimation of pediatric trial sample sizes may give inaccurate results.

We should acknowledge some study limitations. Traditional meta-analysis of standard deviations was not feasible in the analysis of continuous outcomes, since the systematic reviews did not provide enough information to ascertain variances around these nuisance parameters. Meta-analyses were done on a variety of outcomes, so the standard deviations were reported in different units and were not comparable across meta-analyses without standardization. In the analyses of both binary and continuous outcomes we observed considerable heterogeneity in nuisance parameters, not only between meta-analyses but also within them. We assumed that studies included within the same meta-analysis of a Cochrane review would have populations sufficiently similar to use them to impute nuisance parameters. However, extremely high between-study heterogeneity (I^{2} > 80%) was seen in more than half of the meta-analyses, which implies that even among studies of the same age group (i.e. adult or pediatric) we cannot expect nuisance parameters to be routinely similar. This suggests that not only should we be wary of extrapolating nuisance parameters for pediatric studies from adult studies, but we should be almost equally wary of extrapolating them from other pediatric studies.
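
For readers unfamiliar with the I^{2} statistic, the following sketch shows one common way to compute it, from Cochran's Q with inverse-variance weights. The text does not describe how I^{2} was computed for the nuisance parameters, so this is an assumption; the event counts are hypothetical, and using raw proportions with binomial variances is a simplification chosen for brevity.

```python
def i_squared(estimates, variances):
    """I^2 (in percent) from Cochran's Q with inverse-variance weights."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical control-group data from four trials in one meta-analysis.
events = [12, 30, 45, 8]
totals = [100, 120, 110, 90]
rates = [e / n for e, n in zip(events, totals)]
# Simplifying assumption: binomial variance of each raw proportion.
variances = [p * (1 - p) / n for p, n in zip(rates, totals)]

print(f"I^2 = {i_squared(rates, variances):.0f}%")
```

Control-group event rates that differ this much between trials give an I^{2} above the 80% threshold mentioned above, whereas identical rates give an I^{2} of 0%.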

With these limitations in mind and given the results we have seen here, it would be interesting to carry out a further, more refined analysis of which factors may lead to better concordance between the nuisance parameters of pediatric and adult studies. This would be a difficult endeavor, however, since these factors would likely be specific to a subject area and not necessarily generalizable. Such an analysis would then have to be limited to those areas where there are enough studies to do it properly.