Prediction intervals for future BMI values of individual children - a non-parametric approach by quantile boosting

Background The construction of prediction intervals (PIs) for future body mass index (BMI) values of individual children based on a recent German birth cohort study with n = 2007 children is problematic for standard parametric approaches, as the BMI distribution in childhood is typically skewed depending on age. Methods We avoid distributional assumptions by directly modelling the borders of PIs by additive quantile regression, estimated by boosting. We point out the concept of conditional coverage to prove the accuracy of PIs. As conditional coverage can hardly be evaluated in practical applications, we conduct a simulation study before fitting child- and covariate-specific PIs for future BMI values and BMI patterns for the present data. Results The results of our simulation study suggest that PIs fitted by quantile boosting cover future observations with the predefined coverage probability and outperform the benchmark approach. For the prediction of future BMI values, quantile boosting automatically selects informative covariates and adapts to the age-specific skewness of the BMI distribution. The lengths of the estimated PIs are child-specific and increase, as expected, with the age of the child. Conclusions Quantile boosting is a promising approach to construct PIs with correct conditional coverage in a non-parametric way. It is in particular suitable for the prediction of BMI patterns depending on covariates, since it provides an interpretable predictor structure, inherent variable selection properties and can even account for longitudinal data structures.


Background
Childhood obesity is more and more becoming a problem of epidemic dimensions in modern societies [1,2]. The body mass index (BMI) has proved to be a reliable measure to assess childhood obesity and can also be seen as an indicator for obesity in adulthood [3,4]. Therefore, the prediction of future BMI values for individual children may be used as a warning bell for clinicians, parents and children. Predicting future BMI values raises awareness for problems to come -as long as they are still avoidable -and can thus lower the risk of later obesity.
In this setting, we focus on obtaining reliable predictions for future BMI values of children. Prediction intervals (PIs) offer information on the expected variability by providing not only a point prediction but a covariate-specific interval which covers the future BMI for this individual child with high probability. We construct child-specific prediction intervals for the LISA study, a recent German birth cohort study with 2007 children [5]. Data include up to ten BMI values per child from birth until the age of 10, as well as variables that are discussed to be potential early childhood risk factors for later obesity, such as breastfeeding, maternal BMI gain and smoking during pregnancy, parental overweight, socioeconomic factors, and weight gain during the first two years [6,7]. In our analysis, we first construct PIs for the children's BMI at approximately the age of four, relying on data available for the children at the age of two. In a second step, we explore the longitudinal structure of the present data and construct PIs for child-specific BMI patterns from two up to ten years.
Predicting child-specific BMI values is a great challenge from two different perspectives: From the epidemiological perspective, it is difficult to predict BMI values as they depend on factors which are hard to measure; such as physical activity, healthy nutrition, and lifestyle habits. From the statistical point of view, the distribution of BMI values is typically skewed and the degree of skewness depends on children's age, see e.g. Beyerlein et al. [8], which makes standard strategies to construct PIs relying on distributional and homoscedasticity assumptions problematic.
In these standard parametric approaches, first, a point prediction for the future BMI value is estimated based on mean regression models with Gaussian distributed errors, then a symmetric PI is constructed around that point based on distributional assumptions. To predict BMI values, however, these standard parametric approaches are problematic due to two reasons: not only the model assumptions for the point prediction might not be fulfilled but also the length of the PI depends on an assumed fixed variance which does not reflect the reality of an age-specific BMI skewness [9]. One possibility to overcome these problems would be the usage of more sophisticated parametric approaches, as for example generalized additive models for location scale and shape ("GAMLSS" [10]). GAMLSS are modelling up to four parameters of the conditional response's distribution and could therefore take age-specific skewness into account. This model class has already been used for constructing PIs in combination with boosting [11]. However, the construction of PIs based on GAMLSS depends totally on the assumed distribution and the interpretation of covariate effects with respect to the interval borders is not straightforward.
We avoid making distributional assumptions here by developing a new approach to constructing non-parametric prediction intervals based on quantile boosting. Instead of constructing intervals around a point prediction, the new approach directly models the interval borders by additive quantile regression [12]. The borders are fitted as BMI quantiles conditional on the child-specific covariate combination. We use quantile boosting for the estimation [13], which offers the advantage of flexible and inter-pretable covariate effects and an intrinsic variable selection property (which is in particular useful in high-dimensional data settings). The size of the resulting PIs is not fixed but depends on covariates -in longitudinal settings it might also depend on childspecific effects (corresponding to "random effects" in linear and additive mixed models).
During the work on this paper, we found a severe pitfall in the correct validation of prediction intervals. The appropriate measure for validating PIs is conditional coverage, not sample coverage (although being more intuitive) which makes it unfeasible in almost any data setting to evaluate the intervals in practice. The only way to demonstrate the correctness of PIs is therefore based on an empirical evaluation with simulated data. Thus, in a first step we evaluate the correctness of our approach in a set of simulation studies before applying quantile boosting to predict future BMI values.

Prediction intervals by conditional quantiles
The idea of using quantile regression to construct prediction intervals for new observations was presented in [12]. In contrast to standard regression analysis, quantile regression -thoroughly described in [14] -does not estimate the conditional expectation E(Y|X = x) = μ(x) of a random variable Y but the conditional quantile function Q τ (Y|X = x) = q τ (x) for a given τ (0, 1) and a possible set of covariates X = x. Following the definition of quantiles as inverse of the cumulative distribution function, , the probability of the response Y being smaller than q τ (x) is τ: The goal is therefore to estimate the conditional quantile functionq τ (x) by quantile regression based on a training sample (y 1 , x 1 ), ..., (y n , x n ). For a new observation, the specific covariate combination x new is plugged intoq τ (x new ). A prediction interval for y new is then estimated by evaluatingq τ (x new ) at τ 1 = α 2 and τ 2 = 1 − α 2 , leading to The resulting PI should cover a new observation y new with probability (1 -a) while its length depends on x new . There might be combinations of co-variates that allow for a very precise prediction for y new resulting in a narrow interval, whereas wide intervals imply that for a given x new the prediction is more inaccurate. As the estimatesq τ (x) depend on a training sample (y 1 , x 1 ), ..., (y n , x n ), which are realizations of random variables Y and X, the boundaries of the intervals itself can be seen as random variables. This is an analogy to confidence intervals, which usually should cover unknown but fixed parameters. The boundaries of confidence intervals depend on the underlying sample and thus differ from sample to sample. Yet, for every sample, they cover the true parameter with a probability of 1 -a. Prediction intervals are constructed in the same way, but they cover a future realization of a random variable, which itself is random. The result is that the length of a prediction interval for y new is always larger than the length of a confidence interval for the expected mean of Y. Prediction intervals do not only take into account the sampling error made by the estimation based on a sample, but also the unexplained variability of Y given X = x. In conclusion, as long as Y|X = x is not deterministic, the length of the corresponding PI -in contrast to a confidence interval -does not reduce to 0, not even for infinitely large sample sizes.

Conditional coverage vs. sample coverage
We stated that a correctly specified prediction interval PI (1 -a) (x new ) covers a new observation y new with probability π : = (1 -a). To validate a method for fitting PIs, we obviously need a certain amount of new observations: From a single observation (y new , x new ) it is impossible to verify if PI (1 -a) (x new ) is correct. It either covers y new or not -both events do not prove anything, at least if a is not 0. Yet, if we have a certain amount of new observations, there still exist two different interpretations for the coverage probability π:

Sample coverage
For any new sample y = (y 1 ,...,y n ) ⊤ and corresponding covariates x = (x 1 , ...,x n ) ⊤ about (1 -a) -100% of the new sample y will be covered by the n prediction intervals PI(x 1 ),...,PI(x n ). The coverage refers to the whole sample. To evaluate sample coverage in practice, one estimates the coverage probability by averaging over different PIs: where I{·} is an indicator function.

Conditional coverage
For any x new and a corresponding sample (y 1 , x new ),..., (y n , x new ), about (1 -a) · 100% of the observations with the particular covariate combination x new will be covered by the prediction interval PI(x new ). The coverage therefore refers to observations belonging to this x new . To evaluate conditional coverage in practice, one estimates the conditional coverage probability by averaging over different new observations for one PI: Although sample coverage is the more intuitive interpretation of PIs, it is obvious that conditional coverage reflects in a better way what we really expect from a PI. For example, after constructing a 95% PI for the BMI of a child at the age of four, given all information available from the child as a two-year-old, we particularly expect the future BMI of this child with its exact measures to be covered with a probability of 95%. In frequentistic language, the BMI of 95% of children with exactly the same measures should be covered by the interval. The coverage should hold for every child and every possible combination of covariates not only on average for all children.
Hence, to show the correctness of PIs it is particularly not enough to show that PIs cover the right amount of observations on average from a new sample. This is further illustrated by a small example in Figure 1. For a simple univariate regression setting, two different prediction intervals were fitted: Both hold the sample coverage, but only one holds the conditional coverage. The first one, represented by the blue lines in Figure 1, relies on conditional quantiles fitted by linear quantile regression. It is an adequate interval for every possible x, it holds the conditional coverage and it adapts to the heteroscedasticity found in the data. The second one, drawn by red lines, is a "naive" interval, depending on the empirical quantiles of the response variable in the training sample. It does not take into account the information provided by x and is not adequate regarding the conditional coverage for any x. However, it holds the sample coverage. This further emphasizes the need to be aware of the different concepts of coverage probability and to clarify the precise aims of a PI analysis beforehand. This finding leads to a severe problem, at least for multivariate prediction settings with several continuous covariates: For every combination of covariates only one response observation will be available under almost any practical circumstances. We will only find one child for each combination of covariates -not even twins will have the exact same measures -this is obviously not enough to verify the correct conditional coverage of a fitted PI.
Therefore, to demonstrate the correctness of a method fitting accurate prediction intervals, it is necessary to use artificial simulated data sets to evaluate the conditional coverage in (4) for a selected set of covariate combinations. Here, we will conduct a simulation study to evaluate if quantile boosting is a correct method to fit accurate conditional prediction intervals in potentially high-dimensional data settings before we apply this approach to predict future BMI values of children.

Quantile boosting
Recall that our aim is to construct PIs based on conditional quantiles as given in (2). In our approach, we determine conditional quantiles by additive quantile regression. For a fixed quantile τ (0,1), the conditional quantile function is expressed by an additive predictor as follows: The index i = 1, ...,n, denotes the individual, and q τ (x i ) stands for the τ-quantile of the response y i conditional on its specific covariate vector x i = (x i1 , ..., x ip ) ⊤ . The quantile-specific additive predictor h τi is composed of an intercept b τ0 and a sum of different effects of p covariates x i = (x i1 , ..., x ip ) ⊤ on the quantile function. The functions f τ1 , ..., f τp comprise linear effects, i.e. f τj (x ij ) = b τj x ij , as well as non-linear effects whose functional form is not specified in advance. In fact, the additive predictor could also contain a wide variety of additional covariate effects, e.g. varying coefficient terms or spatial effects, as described in [13]. Note that contrary to classical regression, there is no specific distributional assumption for the response in (5). The only restriction is that the response must be continuous.
In general, the estimation of unknown parameters in quantile regression can be achieved by minimizing the empirical risk where the check function r τ is the appropriate loss function for fitting quantiles and can be written as: Standard approaches for solving the optimization problem in (6) rely on linear programming [14,15]. Quantile regression forest [12] is a recent approach to conducting quantile regression and adapts random forest [16] to estimate the whole conditional distribution function. Since this approach is based on regression trees, the resulting estimatesq τ (x) -in contrast to the additive modelling approach presented here -can only be described as black-box predictions. Nevertheless, we will use quantile regression forest as benchmark in our simulation study.
We will use gradient boosting for the estimation of the additive quantile regression model in (5), and call our approach quantile boosting in the following. Quantile boosting [13] was introduced as a method to flexibly estimate additive quantile regression models. It is an adaptation of component-wise functional gradient descent boosting [17] and aims at minimizing an empirical risk criterion, as given in (6). In case of quantile regression, the appropriate loss is the check function (7).
The minimization of (6) is achieved by stepwise updating the predictor function h τ . Therefore, base-learners are used, i.e. simple univariate regression models fitting the negative gradient of the empirical loss (7). The base-learners play a key role in the algorithm, since they define the kind of effects between each covariate and response. In our approach, we use simple linear models to represent linear covariate effects and penalized regression splines to represent non-linear effects. The advantage of quantile boosting is that the resulting predictor h τ is strictly additive and interpretable, following the additive quantile regression model in (5).
In detail, the boosting procedure works as follows: For each covariate, one specific base-learner is defined and in every boosting step the algorithm updates only the covariate with the best performing base-learner. This way, the algorithm is descending the loss by searching in the function space represented by the base-learners. If the algorithm is stopped before every base learner was at least once updated ("early stopping"), less important covariates will never have been updated during the boosting process and are effectively excluded from the final model. Thus, boosting comes along with an inherent variable selection property and produces sparse models in potentially high-dimensional settings. It even allows for candidate models that contain more covariates than observations.
Regarding prediction, early stopping is a desirable property, since it yields shrunk effect estimates. Shrinkage of effect estimates is a widely established method in statistical modelling [18,19] and tends to produce a more stable solution leading to an improved prediction accuracy of the model [20][21][22], even though an increase of the model bias (towards underlying data) has to be accepted. The primary aim is not to minimize the loss in the underlying training sample best -resulting in a small model bias -but to get accurate predictions with a small variance for new data. Since our work focuses on predictions for future BMI values, the shrinkage effect is of high relevance in our approach and is promising in order to provide accurate PIs.
A crucial parameter that has to be tuned with care during the boosting process is the number of stopping iterations. It should be tuned regarding the empirical loss in (6) on a test data sample, or -in case that no additional data is available -by applying cross-validation techniques or bootstrapping on the training data [19,23]. Quantile boosting is implemented within the R [24] addon package mboost [25,26].

Simulation study
We have already mentioned that the correct empirical validation of PIs should be based on conditional coverage. Since it is almost impossible to evaluate the conditional coverage in practical data analyses, we carried out a simulation study to provide some kind of proof that PIs fitted by quantile boosting are provided with correct conditional coverage. As benchmark, we used quantile regression forest [12] for which an implementation is available in the R add-on package quantregForest [12,27].
Our simulation study aims at answering the following questions: 1. Are the proposed PIs able to cover future observations with a predefined conditional coverage probability? 2. Is quantile boosting able to identify relevant informative covariates, also in high-dimensional settings, e.g. data sets with a potentially large number of covariates?
To investigate these questions, we generated artificial data from two different settings -a linear setup containing only covariates with linear effects on the response: and a non-linear setup including also non-linear effects: The first lines of the model formulas represent the contribution of the covariates x 1 , ..., x p on the expected mean of the response y, whereas the bottom line specifies their contribution to heteroscedasticity. Both settings include only four informative covariates x 1 ,...,x 4 . The error terms ε i were drawn independent and identically from a standard normal distribution, i.e. ε i~N (0,1), whereas the covariates were sampled independent and identically from a continuous uniform distribution, i.e. x i1 ,..., x ip~U (0,1) for the linear setup and x i1 , ...,x ip U(0, 3) for the non-linear setup. To evaluate the ability of quantile boosting to select relevant covariates, we generated data for both settings once in a low-dimensional scenario with p = 10 and once in a high-dimensional scenario with p = 500 which, in conclusion, included 496 non-informative covariates.
For each setting, we constructed two-sided 95% PIs in the following way: We generated in each simulation run a training sample (y 1 , x 1 ), ..., (y n , x n ), with n = 2000 observations and an additional data set with 5000 observations to select the optimal number of stopping iterations for quantile boosting. Then, we fitted additive quantile regression models and quantile regression forest for τ 1 = 0.025 and τ 2 = 0.975, including all p covariates.
In order to evaluate the conditional coverage of the resulting PIs, we pre-selected five fixed covari-ate combinations x t with t = 1,..., 5, as test points and thereby tried to cover the x-space. For each of the five test points x t , we sampled 10000 test observations y test |x t which served as "future" observations. In analogy to (4), we then estimated the conditional coverage of the resulting PIs separately for each model and test point bŷ By designing our simulation in this way, we were able to evaluate the conditional coverage of the constructed PIs and avoided the pitfall of averaging over a new sample, corresponding to the sample coverage.

Predicting childhood BMI Data
Data contains observations from a prospective longitudinal birth cohort study (called "LISA study", [5]), including newborns between 11/1997 and 01/1999 from four German cities. Our aim is to predict future BMI values for children relying on the data available when they were two years old. Originally, the study included 3097 healthy children -of whom 2007 are complete cases in the sense that the necessary covariates at the age of two are all available for our analysis and at least one future BMI value until the age of ten is recorded. Continuous covariates from early childhood are the BMI of the child at birth (cBMI0) and as a two-year-old (cBMI2), the exact age of the child at the future measurement (cAge), the BMI of the mother at the beginning of pregnancy (mBMI) and the following BMI gain during pregnancy (mDiffBMI). The considered binary categorical covariates are the sex of the child (cSex), the area the child is living in (cArea -rural or urban), exclusive breastfeeding until the age of four months (cBreast), maternal smoking during pregnancy (mSmoke) andwith four covariate levels -the maternal level of education (mEdu -increasing by level). As potential response variables, the data comprises BMI values at approximately the age of four (cBMI4), six (cBMI6) and ten (cBMI10). See [9] for further description of the LISA study.

Cross-sectional analysis
The aim of our first analysis was to construct prediction intervals for future BMI values of individual children at approximately the age of four, relying on all information available from the child as a two-year-old. Therefore, we constructed two-sided 95% PIs with the following additive quantile regression model: Here, q τ (x i ) denotes the τ-quantile of the response cBMI4 for child i with covariate combination x i . It will represent the borders of child-specific PIs for τ 1 = 0.025 and τ 2 = 0.975. We included a nonlinear effect for cBMI2 and linear effects for all other covariates in our candidate model.
As a benchmark, we compared PIs resulting from our approach to black box estimates for PI 0.95 (x new ) from quantile regression forest. Yet, it was impossible to evaluate the conditional coverage of the PIs in our analysis as already discussed above. As a consequence, we focused on the empirical loss (6) for model comparison, which can be seen as a reliable measure not to validate but to compare algorithms fitting PIs by quantile regression. Thus, we determined the empirical loss for the two quantiles and both models in a 10-fold cross-validation analysis. The optimal stopping iteration for quantile boosting was selected by 25-fold bootstrapping on each of the 10 training data sets separately. Goodness-of-fit of the chosen models was assessed by a recent approach presented in Wei and He [28], originally developed for conditional growth charts. We generated test samples from the conditional model distribution and compared them to the observed empirical distribution of the response, see [28] for details.

Longitudinal analysis
In a second step, we tried to explore the longitudinal structure of the data at hand and constructed prediction intervals for BMI patterns of children until the age often, relying again on all information of the child as a two-year-old. As response, we now considered individual BMI values cBMI it for child i at three different time points t {1,2,3} corresponding to the age of approximately four, six and ten. Note that related applications with similar longitudinal settings are the estimation of reference growth charts [29] and conditional growth charts [28]. We fitted the following additive quantile regression model for τ 1 = 0.025 and τ 2 = 0.975: This model contains child and quantile specific intercept b τ1i and slope b τ2i to account for the correlation between repeated measurements of the same child, which typically occurs in longitudinal data. These individual-specific "random" effects are estimated by a specially designed base-learner employing L 1 regularization methods [30]. In connection with L 1 regularization, quantile regression for longitudinal data was first proposed by Koenker [31]. Here, we also include individual-specific slopes and smooth non-linear effects in the flexible predictor.
Contrary to the cross-sectional analysis, cAge is included and differs for different time points. The nonlinear fixed effect f 1τ describes an overall BMI pattern depending on age which is valid for all children, whereas the random effects b τ2i express child-specific linear deviations from this overall BMI pattern. All other covariates are time-constant. Again, we used the method presented in [28] to assess goodness-of-fit, in this case separately for the three different time points.
The optimal stopping iteration for the boosting algorithm was selected by applying subject-wise bootstrap. For this setting, it was impossible to compare quantile boosting to the benchmark algorithm, since quantile regression forest cannot account for a longitudinal data structure. Thus, we only calculated the PIs for BMI patterns of "new" children by ten-fold cross validation. To determine child-specfic PIs, for those children the childspecific intercepts and slopes were set to zero, which corresponds to their expected mean.

Results
Simulation study Table 1 shows the resulting mean conditional coverage from 100 simulation runs for 95% PIs. Quantile boosting clearly outperforms quantile regression forest for both setups. Only for the borders of the x-space (x 4 , x 5 ) in the high-dimensional scenario, the PIs fail to cover 95% of "future" observations. Figure 2 further illustrates the concept of conditional coverage. The boxplots display the empirical distribution of the "future" observations for each of the test points x 1 , ..., x 5 . The solid black lines are the true conditional quantiles and represent the true borders of a 95% PI for each test point. The colored lines show the resulting estimated PI borders from 100 simulation runs of the two algorithms (quantile boosting on the left in blue, quantile regression forest on the right in red). As displayed by Figure 2 (non-linear setup, low-dimensional scenario), quantile boosting seems to work best in the center of the x-space, which is represented by test point For the other test points, the standard errors for the estimated quantiles get larger, yielding less accurate PIs. Quantile regression forest have more problems in fitting the correct conditional quantiles, which further explains why the resulting PIs fail to achieve the conditional coverage in Table 1. Figure 3 displays the resulting effect estimates from 100 simulation runs of quantile boosting in the linear high-dimensional setup. The blue lines represent the quantile-specific true coefficients, which combine the effect of the covariate on the expected mean as well as on heteroscedasticity. The effect estimates corresponding to the non-informative covariates are combined in the rightmost boxplot. Therefore, Figure 3 illustrates the ability of the algorithm to select relevant covariates while intrinsically incorporating shrinkage of effect estimates.
In conclusion, PIs fitted by quantile boosting seem to cover future observations with the predefined coverage probability, conditional on the test points. The best results can be observed in the center of the x-grid. Quantile boosting outperforms the benchmark in both setups -linear and nonlinear setup -and for both scenarios -for the low-dimensional as well as for the highdimensional scenario. However, the evaluated simulation setups did not include interaction terms -which could have favored quantile regression forest. For our data analysis, we can rely on the result that PIs constructed by quantile regression lead to correct conditional coverage probabilities. Furthermore, we can benefit from quantile boosting since the algorithm is able to select relevant covariates and yields sparse models in highdimensional scenarios.

Predicting childhood BMI Data
To get a first impression of the data at hand, Figure 4 shows the empirical BMI distribution depending on age. It illustrates the age-specific skewness of the BMI distribution, beginning somewhere after the age of six, as well as the longitudinal data structure with repeated BMI observations per child at birth and around the age of 2, 4, 6, and 10.

Cross-sectional analysis
In our first cross-sectional analysis, we ignore the longitudinal character of the data and fit 95% PIs for the BMI around the age of four with data available from the children as two-year-olds, by both quantile boosting and quantile regression forest. Figure 5 shows the resulting PIs for six randomly chosen children (that were left out in the fitting process) emphasizing that length and level of the resulting PIs are in fact child-specific. The mean length of the PIs for all children is 3.55 kg/m 2 , while the lengths of the PIs range from 2.81 kg/m 2 to 5.62 kg/m 2 . Thus, the BMI prediction for some children is more precise than for others. Table 2 contains the estimated covariate effects for quantile boosting. It can be observed that not all covariates are selected during the boosting process, which again reflects the variable selection property of quantile boosting. For the 2.5% BMI quan-tile, cSex, cBreast and mEdu are excluded from the model, whereas for the 97.5% BMI quantile cBMI0, mDiffBMI and cBreast are excluded. For both quantiles, mBMI and cArea are chosen with similar effects. This can be interpreted as follows with respect to the PIs: Both PI borders for a child from a urban area, for example, are shifted by -0.029 kg/m 2 and -0.075 kg/m 2 compared to the PI borders for a child from an rural area. Interestingly, the effect of mSmoke has different signs for different quantiles, meaning that the effect of maternal smoking during pregnancy seems to be negative for lower BMI quantiles and positive for upper BMI quantiles. This results in a wider PI for a child whose mother smoked during pregnancy. The estimated non-linear effects of cBMI2 are presented in the Additional file 1, Figure S1.
At first glance, the PIs resulting from quantile regression forest -the red-colored PIs in Figure 5 -seem to be very similar to those from quantile boosting: the mean length is 3.48 kg/m 2 , ranging from 2.46 kg/m 2 to 6.62 kg/m 2 . We also conducted a quantitative comparison between the algorithms by 10-fold cross-validation. Figure 6 displays the empirical loss distributions of the estimated quantiles on children iteratively left out in the fitting process. These results suggest that quantile boosting outperforms quantile regression forest with respect to accuracy in the estimation of the PI borderŝ q 0.025 (x new ) andq 0.975 (x new ) . This result is further supported by the goodness-of-fit diagnostic plots in the additional files (Additional File 1, Figure S2). The plots refer to the underlying models which were used to estimate the borders of the PIs and show a slightly improved goodness-of-fit for quantile boosting. Even though we cannot check the conditional coverage of our PIs here, we rely on the findings from the simulation study and conclude that quantile boosting does not only provide PIs with interpretable additive effects, but also yields more accurate predictions than quantile regression forest.

Longitudinal analysis
In a second step, we used all information of the children at the age of two to predict their BMI patterns until the age of ten. Therefore, we included child-specific intercepts and slopes in the quantile boosting approach. Figure 7 shows the resulting PIs for six randomly chosen children.
Again, level and length of the PIs are child-specific, but the lengths of PIs at the age of ten are larger than the lengths at earlier time points. This seems to be realistic as we try to predict BMI values of children at the age of ten, only relying on information available as twoyear-olds. The mean length of the PIs of all children is 4.78 kg/m 2 , ranging from 2.52 kg/m 2 to 11.28 kg/m 2 . The increased length of the intervals again results from the children getting older. This result is further emphasized by the estimated non-linear effects of cAge (presented as Figure S3 in the Additional file 1). The estimated effect for the 97.5% BMI quantile, i.e. the upper border of the PIs, is strongly increasing after the age of six, whereas the effect for the lower border remains constant. This result also corresponds to the empirical age-specific BMI distribution observed in Figure 4. Apparently the resulting PIs reflect the risk of childhood obesity kicking-in somewhere after the age of six.
Effect estimates for other covariates are included in Table 2. The pattern of selected covariates roughly Intc. x 1 x 2 x 3 x 4 non−inf. corresponds to the cross-sectional analysis. Even though the effect signs and sizes show minor differences for some covariates, such as mEdu, the other effects on the PI borders remain stable across analyses, including the non-linear effect of cBMI2 (Additional file 1, Figure S3), confirming the presence of these effects. Diagnostic plots (Additional file 1, Figure S4) show a satisfying goodness-of-fit of the underlying models for the ages of four and six. Poorer results are obtained for the age of ten, which reflects the limited information available for this long-term prediction.

Discussion
The aim of the present work was to construct prediction intervals for future BMI values of individual children. We pursued this aim by applying quantile boosting -a boosting approach estimating additive quantile regression models -to directly model the borders of the PIs. As a result, we do not rely on any distributional assumptions.
A main advantage of PIs fitted by quantile boosting is that we can directly interpret the estimated effects with cAge  Figure S1 and Figure S3 in Additional file 1. regard to the interval borders. From the results of the cross-sectional analysis, for example, it follows that children whose mothers smoked during pregnancy have larger estimated PIs than other children. These conclusions could not have been drawn from quantile regression forest, an alternative approach to fitting nonparametric PIs, which leads to black box estimates.
The results of our simulation study suggest that quantile boosting outperforms quantile regression forest with respect to conditional coverage -which in our view is the key performance measure to evaluate PIs correctly. However, it is generally not possible to check conditional coverage in practical applications. In our data analyses, we thus had to rely on the findings from the simulation study. These findings were supported by the results of a formal comparison of empirical risks in the cross-sectional analysis, suggesting that quantile boosting provided more accurate predictions than quantile regression forest.
We could also benefit from the inherent shrinkage and variable selection properties of boosting in our analysis. Only a limited number of covariates was selected by the boosting algorithm, leading to sparse models. Note that it would even be possible to apply quantile  boosting to data sets with more co-variates than observations, i.e., in high-dimensional data settings. A limitation coming along with the shrinkage property is the absence of standard errors estimations for the effect estimates. As a result, we cannot compute statistical tests regarding the effects of covariates, e.g. report information about their significance. Although researchers in practice often feel uncomfortable in the absence of pvalues, we think that this limitation is acceptable here, as the focus is directed towards getting reliable predictions.
The resulting PIs of the longitudinal analysis emphasize further strengths of quantile boosting for fitting PIs. Relying on data available of the children as two-yearolds, we can fit accurate and child-specific PIs not only for BMI values around the age of four, but also for BMI patterns until the age of ten. Quantile boosting allows to explore longitudinal data structures by including individual-specific "random" effects, emphasizing the childspecific character for the resulting PIs. Here, we could observe that the lengths of the intervals strongly increase with the age of the children. From a methodological view, this absolutely reflects what we should expect from a valid method to fit PIs: The intervals do what they should, in reporting the increasing uncertainty in the prediction of BMI values until the age of ten based only on very limited information from the children in early childhood.
The lack of covariates explaining physical activity, nutrition and lifestyle habits of the children is of course a further limitation of the presented work. It would be interesting to see if this information could help for getting more precise predictions as presented in this paper.

Conclusion
In conclusion, we think that quantile boosting is a promising approach to construct prediction intervals with correct conditional coverage in a non-parametric way. It can be applied to longitudinal settings and is therefore in particular suitable for the prediction of BMI patterns or similar data, where assumptions of standard parametric approaches are not fulfilled.

Additional material
Additional file 1: Additional figures. Document containing following additional figures not included in the main manuscript: Figure S1 Resulting estimates for the the non-linear partial effect of the BMI at the age of two on the PI for childhood BMI around the age of four. The lines represent the partial effect on q 0:025 and q 0:975 respectively as the borders of a 95% PI in the cross-sectional analysis. Figure S2 Goodness-of-fit diagnostic plots according to [28] for the underlying models from the cross-sectional analysis (BMI of children at the age of four). Test observations were simulated from the conditional model distribution and compared to the empirical distribution of the response observations (left plot). The right plot shows the standardized deviation of quantiles from the simulated conditional distribution to the real ones. Blue points and bars refer to the results of quantile boosting whereas red points and bars refer to those from quantile regression forest. Figure S3 Resulting estimates for the the non-linear partial effect of the BMI at the age of two (left) and the age of the child (right) on the PIs for childhood BMI patterns. The lines represent the partial effect on q 0:025 and q 0:975 respectively as the borders of a 95% PI in the longitudinal analysis. Figure  S4 Goodness-of-fit diagnostic plots according to [28] for the underlying models from the longitudinal analysis (BMI of children at the ages of four, six and ten). Separately for the three different time points, test observations were simulated from the conditional model distribution and compared to the empirical distribution of the response observations in QQ-plots (first row). Barplots (second row) show the standardized deviation of quantiles from the simulated conditional distribution to the real ones.