Research article
Transformations of summary statistics as input in meta-analysis for linear dose–response models on a logarithmic scale: a methodology developed within EURRECA
BMC Medical Research Methodology, volume 12, Article number: 57 (2012)
Abstract
Background
To derive micronutrient recommendations in a scientifically sound way, it is important to obtain and analyse all published information on the association between micronutrient intake and biochemical proxies for micronutrient status using a systematic approach. Therefore, it is important to incorporate information from randomized controlled trials as well as observational studies as both of these provide information on the association. However, original research papers present their data in various ways.
Methods
This paper presents a methodology to obtain an estimate of the dose–response curve, assuming a bivariate normal linear model on the logarithmic scale, incorporating a range of transformations of the original reported data.
Results
The simulation study, conducted to validate the methodology, shows that there is no bias in the transformations. Furthermore, it is shown that when the original studies report the mean and standard deviation or the geometric mean and confidence interval the results are less variable compared to when the median with IQR or range is reported in the original study.
Conclusions
The presented methodology with transformations for various reported data provides a valid way to estimate the dose–response curve for micronutrient intake and status using both randomized controlled trials and observational studies.
Background
Meta-analysis of the association between micronutrient intake and biochemical proxies for micronutrient status or function is needed when setting micronutrient recommendations. Information on this association may come from randomized controlled trials as well as from observational studies. In a randomized trial subjects are randomized to receive either the intervention treatment or the control treatment, and a meta-analysis of such studies will usually provide a mean difference in micronutrient status between placebo and intervention groups, answering the question whether the biochemical status marker responds to the dietary intake of a micronutrient [1–3]. However, this analysis does not provide an estimate of the slope of the dose–response relationship. On the other hand, a meta-analysis of observational studies provides an estimate of the slope of the dose–response relation, but observational studies are hampered by, for instance, measurement error in the intake estimates, which causes bias in the reported association [4–6].
Ideally, information from observational studies and randomized controlled trials should be compared or even combined in a single meta-analysis to ensure that all reported information is taken into account over a broad range of intake. This requires that the summary statistics reported in individual studies are transformed into estimates of the dose–response relation. Since both intake and status are continuous variables, this estimate is actually an estimate of the regression coefficient of the linear regression of micronutrient status on micronutrient intake. The individual estimates of the dose–response regression coefficient may then be combined in a meta-analysis.
The statistical combination of study results may be complicated by the variety of ways that individual studies report the summary statistics. The results from randomized controlled trials as well as the baseline summary statistics of micronutrient intake and status may be reported as means, medians or geometric means. Variability is often reported as standard deviations, standard errors, interquartile ranges (IQR), ranges or confidence intervals (CI). In observational studies the relation between intake and status can be reported as a Pearson correlation coefficient, a Spearman rank correlation coefficient or a regression coefficient. In addition, either the intake variable or the status variable or both could have been logarithmically transformed before the correlation or association was calculated. All these different ways of reporting need to be standardized before meta-analysis is even possible.
This paper gives an overview of transformation methods to algebraically derive an estimate from each study of the regression coefficient (slope, b) and its standard error (se(b)), for studies that do not directly report these. The methods are validated by comparing the calculated values with theoretical values in a small-scale simulation study.
Methods
In order to derive transformations we assume a bivariate normal distribution on the log-scale for intake and status of an individual person. The log-scale was chosen because both intake and status values are always above zero, and the observed distributions of the micronutrient variables are often right-skewed. Moreover, as the true shape of the dose–response curve is usually unknown, the linear relation between logarithmically transformed quantities provides the simplest approximation.

In more detail, for the dose–response meta-analysis of observational studies we assume that ${\xi}_{0}$ (intake of micronutrient) and ${\eta}_{0}$ (status or continuous health outcome) are lognormally distributed. The assumption of bivariate normality entails a linear association between $\xi =\text{ln}\left({\xi}_{0}\right)$ and $\eta =\text{ln}\left({\eta}_{0}\right)$, where ln denotes the natural logarithm. Note that we use the Greek letters ξ and η for the theoretical values of intake and status/response, and the Latin letters X and Y for the observed values of these variables. Furthermore, we reserve letters without subscript (e.g., X and Y) for values expressed on the ln-scale, and use letters with subscript 0 (e.g., X_{0} and Y_{0}) for values expressed on the absolute (i.e., original) scale.
The process of data transformations to obtain the required statistics from what is reported in observational studies consists of four steps (Figure 1). The first step is to obtain the mean of X (mX) and Y (mY) and the standard deviation of X (sX) and Y (sY). Secondly, the mean of X_{0} (mX_{0}) and Y_{0} (mY_{0}) and the standard deviation of X_{0} (sX_{0}) and Y_{0} (sY_{0}) are calculated when needed for the calculations in step 3. In this third step the correlation coefficient of the association between X and Y (rXY) is calculated from the reported data. In the last step, the regression coefficient of the linear regression of Y on X (bYX) is calculated from rXY, sY and sX, and the se(bYX) is calculated from rXY, sY, sX and the sample size (n). For reports on randomized controlled trials, the process consists of three steps. In the first step, mY and sY are obtained for both intervention and placebo group. In the second step, mX is obtained, and in the last step, bYX and se(bYX) are calculated. The equations for all these transformations are given below.
Univariate transformations
First, we describe how the univariate statistics of the normal distributions on the ln-scale can be obtained from various reported statistics. We present formulas for mX and sX, which can be used in the same way for mY and sY in observational studies. For randomized controlled trials the situation is different, because the variation in X is artificial and is not described by a normal distribution. Therefore, the transformations should be used only to obtain mY and sY in the intervention and placebo groups separately. In most trials the within-group variation in X will be negligible compared with the difference between the groups; consequently, mX is calculated simply as mX_{con} = ln(mX_{0_con}) for the placebo group and as mX_{int} = ln(mX_{0_int}) for the intervention group.
For these transformations, we assume that $\xi $ is normally distributed with parameters ${\mu}_{\xi}$ and ${\sigma}_{\xi}$. For a lognormal distribution the mean on the absolute scale, ${\mu}_{{\xi}_{0}}$, is given by ${\mu}_{{\xi}_{0}}=\exp\left({\mu}_{\xi}+0.5\,{\sigma}_{\xi}^{2}\right)$ and the standard deviation on the absolute scale, ${\sigma}_{{\xi}_{0}}$, is given by ${\sigma}_{{\xi}_{0}}=\exp\left({\mu}_{\xi}+0.5\,{\sigma}_{\xi}^{2}\right)\sqrt{\exp\left({\sigma}_{\xi}^{2}\right)-1}$. It follows that when the mean (mX_{0}) and the standard deviation (sX_{0}) are reported, mX can be calculated as:
$$\text{mX}=\ln\left({\text{mX}}_{0}\right)-0.5\,{\text{sX}}^{2}\qquad (1)$$
where
$$\text{sX}=\sqrt{\ln\left[{\left({\text{sX}}_{0}/{\text{mX}}_{0}\right)}^{2}+1\right]}\qquad (2)$$
The exponential of the mean on the ln-scale is equal to the median of the lognormal distribution on the absolute scale. Therefore, when the median (medX_{0}) has been reported on the absolute scale, mX is calculated as:
$$\text{mX}=\ln\left({\text{medX}}_{0}\right)\qquad (3)$$
As a measure of variability an IQR or a range is often reported together with the median or mean. The IQR is the difference between the third quartile Q_{3} and the first quartile Q_{1} (the 75^{th} percentile and the 25^{th} percentile). Basically, there are two cases. If the lower and upper limits are reported as such, the difference between the ln-transformed limits may be equated to an appropriate multiple of the standard deviation sX. On the other hand, if only the IQR or range is reported as such, the derivation is more complex. When IQRX_{0} is reported together with the median, the relation between these and sX is given by ${\text{IQRX}}_{0}={\text{medX}}_{0}\times \left[\exp\left(\text{z}\cdot \text{sX}\right)-\exp\left(-\text{z}\cdot \text{sX}\right)\right]$, where z represents the appropriate percentage point in the standard normal distribution (i.e., z_{0.75} = 0.6745).
In this case sX may be calculated as
$$\text{sX}=\frac{1}{\text{z}}\,\ln\left[\frac{{\text{IQRX}}_{0}/{\text{medX}}_{0}+\sqrt{{\left({\text{IQRX}}_{0}/{\text{medX}}_{0}\right)}^{2}+4}}{2}\right]\qquad (4)$$
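When the IQR is reported together with the mean no explicit formula exists to derive sX. Therefore, to obtain an estimate of sX from these quantities a nonlinear function optimization is employed to find the value of sX for which the following equation holds:
$${\text{IQRX}}_{0}={\text{mX}}_{0}\times \exp\left(-0.5\,{\text{sX}}^{2}\right)\times \left[\exp\left(\text{z}\cdot \text{sX}\right)-\exp\left(-\text{z}\cdot \text{sX}\right)\right]\qquad (5)$$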
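The nonlinear solve for sX from an IQR reported with a mean can be sketched as follows. This is a minimal illustration and not the authors' implementation: it uses the relation medX_{0} = mX_{0} × exp(−0.5 sX²), which follows from the lognormal mean, and a plain bisection in place of a general optimizer. The function names (`iqr_over_mean`, `bisect`, `s_from_iqr_and_mean`) are hypothetical. Because the implied IQR/mean ratio first rises and then falls with sX, the sketch assumes the solution lies on the rising branch.

```python
import math
from statistics import NormalDist

Z75 = NormalDist().inv_cdf(0.75)  # about 0.6745

def iqr_over_mean(s, z=Z75):
    """IQR/mean ratio implied by a lognormal variable with ln-scale SD s."""
    return math.exp(-0.5 * s * s) * (math.exp(z * s) - math.exp(-z * s))

def bisect(f, lo, hi, tol=1e-10):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid   # root is in the upper half
        else:
            hi = mid                # root is in the lower half
    return 0.5 * (lo + hi)

def s_from_iqr_and_mean(iqr0, mean0, z=Z75):
    """Estimate the ln-scale SD from an IQR and a mean on the absolute scale."""
    # The ratio peaks where s * tanh(z * s) = z; only the rising branch
    # [0, s_peak] is searched (an assumption of this sketch).
    s_peak = bisect(lambda s: s * math.tanh(z * s) - z, 0.0, 10.0)
    target = iqr0 / mean0
    if target <= 0 or target > iqr_over_mean(s_peak, z):
        raise ValueError("IQR/mean ratio is not attainable under a lognormal model")
    return bisect(lambda s: iqr_over_mean(s, z) - target, 0.0, s_peak)
```

A reported IQR/mean ratio above the peak value (roughly 0.89 for the quartile z) has no solution under the lognormal model, which is why the sketch raises an error rather than returning a value.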
When the lower and upper bounds of the IQR (i.e., Q_{1}(X_{0}) and Q_{3}(X_{0}) respectively) are reported, rather than the difference, sX may be calculated as $\text{sX}=\left[{Q}_{3}\left(X\right)-{Q}_{1}\left(X\right)\right]/\left(2\text{z}\right)$, where Q_{1}(X) and Q_{3}(X) are the ln-transformed bounds.
The range is the difference between the maximum and the minimum value of the data. Equations (4) and (5) may be similarly used when the range is reported, but here we consider that the minimum and the maximum represent the lower and upper (1/n) fraction of the dataset of n observations. Therefore we expect a fraction 1/(2n) below the minimum and the same fraction above the maximum, and in the equations above we need to use z_{p} with p = 1 − 1/(2n). For example, in a dataset with n = 100 we use z_{0.995} = 2.576.
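As a small illustration of the percentage point used for a reported range, the sketch below computes z_{p} with p = 1 − 1/(2n) and applies it to a reported minimum and maximum. The function names `z_for_range` and `s_from_min_max` are hypothetical, and the second function assumes the limits are reported as such, so that the difference of the ln-transformed limits equals 2 z_{p} sX.

```python
import math
from statistics import NormalDist

def z_for_range(n):
    """Standard normal point z_p with p = 1 - 1/(2n) for a sample of size n."""
    return NormalDist().inv_cdf(1.0 - 1.0 / (2.0 * n))

def s_from_min_max(min0, max0, n):
    """ln-scale SD from a minimum and maximum reported on the absolute scale."""
    return (math.log(max0) - math.log(min0)) / (2.0 * z_for_range(n))
```

For n = 100 the first function returns approximately 2.576, matching the example in the text.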
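The geometric mean (gmX_{0}) of the lognormal distribution is equal to exp(mX), and is most often reported in papers together with the 95% confidence limits. mX and sX are obtained from these quantities using:
$$\text{mX}=\ln\left({\text{gmX}}_{0}\right)\qquad (6)$$
$$\text{sX}=\sqrt{n}\times \frac{\ln\left({X}_{0,\text{upp}}\right)-\ln\left({X}_{0,\text{low}}\right)}{2\,{\text{z}}_{0.975}}\qquad (7)$$
where X_{0,upp} is the upper limit and X_{0,low} is the lower limit of the 95% confidence interval, n is the sample size, and z_{0.975} = 1.96 represents the 97.5th percentage point in the standard normal distribution.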
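Then in step 2 for observational studies, mX_{0} and sX_{0} are calculated in case these estimates were not already available. These statistics on the original scale may be needed in the bivariate transformations described below. The equations are:
$${\text{mX}}_{0}=\exp\left(\text{mX}+0.5\,{\text{sX}}^{2}\right)\qquad (8)$$
$${\text{sX}}_{0}=\exp\left(\text{mX}+0.5\,{\text{sX}}^{2}\right)\sqrt{\exp\left({\text{sX}}^{2}\right)-1}\qquad (9)$$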
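The univariate transformations described in this section can be collected in a few small functions. The sketch below is illustrative only: the function names are hypothetical, each function returns the pair (mX, sX) on the ln-scale, and the geometric-mean case assumes that the reported 95% limits are confidence limits for the geometric mean itself, so that their ln-scale half-width is z_{0.975} × sX/√n.

```python
import math
from statistics import NormalDist

Z75 = NormalDist().inv_cdf(0.75)    # about 0.6745
Z975 = NormalDist().inv_cdf(0.975)  # about 1.96

def from_mean_sd(m0, s0):
    """(mX, sX) from the mean and SD on the absolute scale."""
    s = math.sqrt(math.log((s0 / m0) ** 2 + 1.0))
    return math.log(m0) - 0.5 * s * s, s

def from_median_iqr(med0, iqr0, z=Z75):
    """(mX, sX) from the median and the IQR reported as a single difference."""
    ratio = iqr0 / med0
    s = math.log((ratio + math.sqrt(ratio * ratio + 4.0)) / 2.0) / z
    return math.log(med0), s

def from_gm_ci(gm0, low0, upp0, n, z=Z975):
    """(mX, sX) from the geometric mean and its 95% confidence limits."""
    s = math.sqrt(n) * (math.log(upp0) - math.log(low0)) / (2.0 * z)
    return math.log(gm0), s

def to_absolute(m, s):
    """(mX0, sX0) on the absolute scale from ln-scale mean and SD (step 2)."""
    m0 = math.exp(m + 0.5 * s * s)
    return m0, m0 * math.sqrt(math.exp(s * s) - 1.0)
```

For example, `from_mean_sd(*to_absolute(1.60, 0.85))` recovers (1.60, 0.85) up to rounding, which is a quick consistency check on the two directions of the transformation.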
Bivariate transformations (to obtain regression or correlation coefficients)
For observational studies, the next step is to obtain an estimate of the correlation between X and Y (rXY). The equations below can be used to obtain rXY from reported correlation and regression coefficients taking into account the possibility that either X_{0}, log_{10}(X_{0}), X, Y_{0}, log_{10}(Y_{0}) or Y was used for the originally reported statistic.
When a study reports the association as a Spearman rank correlation coefficient (r_{S}), rXY is calculated as
$$\text{rXY}=2\sin\left(\pi \,{r}_{S}/6\right)\qquad (10)$$
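Another option is that the association between X_{0} and Y_{0} is reported as a regression coefficient (bY_{0}X_{0}). In that case the correlation coefficient, rX_{0}Y_{0}, is calculated first using
$${\text{rX}}_{0}{\text{Y}}_{0}={\text{bY}}_{0}{\text{X}}_{0}\times {\text{sX}}_{0}/{\text{sY}}_{0}\qquad (11)$$
and then rXY is calculated using the following equation which was derived from Johnson & Kotz [7]:
$$\text{rXY}=\frac{\ln\left[1+{\text{rX}}_{0}{\text{Y}}_{0}\sqrt{\left(\exp\left({\text{sX}}^{2}\right)-1\right)\left(\exp\left({\text{sY}}^{2}\right)-1\right)}\right]}{\text{sX}\times \text{sY}}\qquad (12)$$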
This formula (12) is also used when the Pearson product–moment correlation coefficient rX_{0}Y_{0} is directly reported in a paper.
For observational studies that report the regression coefficient between Y_{0} and X (bY_{0}X), the correlation coefficient, rXY_{0}, is calculated using
$${\text{rXY}}_{0}={\text{bY}}_{0}\text{X}\times \text{sX}/{\text{sY}}_{0}\qquad (13)$$
When log_{10}(X_{0}) is used instead of X, sX is replaced by sX/ln(10) in formula (13).
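Then rXY is calculated using the following equation [8, 9]:
$$\text{rXY}={\text{rXY}}_{0}\times \sqrt{\exp\left({\text{sY}}^{2}\right)-1}/\text{sY}\qquad (14)$$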
This formula (14) is also used when rXY_{0} is reported directly or when the Pearson product–moment correlation coefficient is reported between log_{10}(X_{0}) and Y_{0}.
When the regression coefficient between Y and X_{0} (bYX_{0}) is reported in an observational study, the correlation coefficient, rX_{0}Y, is calculated using
$${\text{rX}}_{0}\text{Y}={\text{bYX}}_{0}\times {\text{sX}}_{0}/\text{sY}\qquad (15)$$
When log_{10}(Y_{0}) is used instead of Y, sY is replaced by sY/ln(10) in formula (15).
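Using rX_{0}Y or the directly reported Pearson product–moment correlation coefficient between X_{0} and log_{10}(Y_{0}) or Y in an observational study, rXY is calculated using [8, 9]:
$$\text{rXY}={\text{rX}}_{0}\text{Y}\times \sqrt{\exp\left({\text{sX}}^{2}\right)-1}/\text{sX}\qquad (16)$$
When the regression coefficient between X and Y (bYX) is reported, rXY is calculated as
$$\text{rXY}=\text{bYX}\times \text{sX}/\text{sY}\qquad (17)$$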
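The bivariate transformations can be sketched in code as follows. This is an illustration under the bivariate normal model on the ln-scale, not the authors' code. The function names are hypothetical, and the formulas are the standard relations between correlations of normal and lognormal variables together with the usual identity between a slope and a correlation (r = b × SD of predictor / SD of response).

```python
import math

def r_from_spearman(r_s):
    """Pearson correlation implied by a Spearman rank correlation
    under bivariate normality."""
    return 2.0 * math.sin(math.pi * r_s / 6.0)

def r_from_slope(b, s_predictor, s_response):
    """Correlation from a simple regression slope and the two SDs,
    all expressed on the scales used in the reported regression."""
    return b * s_predictor / s_response

def r_from_abs_abs(r00, sx, sy):
    """rXY from the Pearson correlation between X0 and Y0 (both absolute).
    sx and sy are the ln-scale SDs; requires 1 + r00 * k > 0."""
    k = math.sqrt((math.exp(sx * sx) - 1.0) * (math.exp(sy * sy) - 1.0))
    return math.log(1.0 + r00 * k) / (sx * sy)

def r_from_one_abs(r_mixed, s_ln):
    """rXY from a correlation in which one variable was on the ln-scale and
    the other on the absolute scale; s_ln is the ln-scale SD of the variable
    that was reported on the absolute scale."""
    return r_mixed * math.sqrt(math.exp(s_ln * s_ln) - 1.0) / s_ln
```

A reported slope is first turned into a correlation with `r_from_slope`, using the SDs on the scales of the reported regression, and the result is then passed to `r_from_abs_abs` or `r_from_one_abs` as appropriate.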
Calculation of dose–response regression coefficient
In the last step, for both observational studies and randomized controlled trials, we need to obtain bYX and se(bYX). For observational studies, the required regression coefficient bYX is calculated from the correlation coefficient:
$$\text{bYX}=\text{rXY}\times \text{sY}/\text{sX}\qquad (18)$$
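and the corresponding standard error (se(bYX)) is calculated as
$$\text{se}\left(\text{bYX}\right)=\frac{\text{sY}}{\text{sX}}\sqrt{\frac{1-{\text{rXY}}^{2}}{n-2}}\qquad (19)$$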
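For randomized controlled trials, the required regression coefficient bYX is calculated as:
$$\text{bYX}=\frac{{\text{mY}}_{\text{int}}-{\text{mY}}_{\text{con}}}{{\text{mX}}_{\text{int}}-{\text{mX}}_{\text{con}}}\qquad (20)$$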
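where ‘int’ indicates the intervention group and ‘con’ indicates the control or placebo group. The corresponding standard error is calculated as:
$$\text{se}\left(\text{bYX}\right)=\frac{\sqrt{{\text{sY}}_{\text{int}}^{2}/{n}_{\text{int}}+{\text{sY}}_{\text{con}}^{2}/{n}_{\text{con}}}}{\left|{\text{mX}}_{\text{int}}-{\text{mX}}_{\text{con}}\right|}\qquad (21)$$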
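The final step can be sketched as below. The function names are hypothetical and the standard errors are assumptions of this sketch rather than forms confirmed by the text: the observational case uses the usual least-squares standard error of a slope with n − 2 degrees of freedom, and the trial case uses the unpooled standard error of the difference in mean ln-status, divided by the difference in ln-intake, treating the within-group variation in intake as negligible.

```python
import math

def slope_observational(r_xy, s_x, s_y, n):
    """Slope of Y on X and its standard error from rXY, sX, sY and n."""
    b = r_xy * s_y / s_x
    se = (s_y / s_x) * math.sqrt((1.0 - r_xy ** 2) / (n - 2))
    return b, se

def slope_rct(m_y_int, s_y_int, n_int, m_y_con, s_y_con, n_con,
              m_x0_int, m_x0_con):
    """Slope and standard error from a two-arm trial.
    m_y_* and s_y_* are ln-scale status statistics per arm;
    m_x0_* are mean intakes on the absolute scale."""
    dx = math.log(m_x0_int) - math.log(m_x0_con)  # mX_int - mX_con
    b = (m_y_int - m_y_con) / dx
    se = math.sqrt(s_y_int ** 2 / n_int + s_y_con ** 2 / n_con) / abs(dx)
    return b, se
```

Each study then contributes one pair (bYX, se(bYX)), which is the input that a standard inverse-variance meta-analysis expects.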
Simulation study
A simulation study was conducted to validate the performance of the transformations given in this paper. Bivariate lognormal data (X,Y) were simulated where X ~ Normal(1.60,0.85^{2}) and Y ~ Normal(5.70,0.45^{2}). Parameter values were based on values of vitamin B12 intake (X) and serum/plasma vitamin B12 (Y) [10–13]. Different strengths of the correlation between X and Y were simulated, namely 0.1, 0.5 and 0.9.
A sample of individuals (with sample size 100, 200 or 500) was randomly drawn, and values that represent different often used reporting options were calculated from this sample, namely the mean and SD, the median and IQR, the median and range and the geometric mean and 95% CI (all summary statistics on the absolute scale). Also, the correlation and regression coefficients of X and Y expressed in different scales were calculated. These ‘reported’ values were rounded to two decimal places. From these ‘reported’ values, the parameter estimates mX, mY, sX, sY and rXY were calculated using the transformations described in this paper. This process was repeated 1000 times.
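One arm of such a simulation, for the case in which the mean and SD are 'reported' on the absolute scale, might look like the sketch below. It is a simplified illustration and not the authors' simulation code: the function names, the seed and the number of replications are arbitrary choices, and only the back-transformation to mX and sX is checked.

```python
import math
import random
import statistics

def simulate_once(rng, n, mx=1.60, sx=0.85, my=5.70, sy=0.45, rho=0.5):
    """Draw n pairs (X, Y) from a bivariate normal on the ln-scale."""
    xs, ys = [], []
    for _ in range(n):
        u, v = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(mx + sx * u)
        ys.append(my + sy * (rho * u + math.sqrt(1.0 - rho ** 2) * v))
    return xs, ys

def back_transform_mean_sd(values_ln):
    """'Report' mean and SD on the absolute scale, rounded to two decimals,
    then recover the ln-scale mean and SD from those reported values."""
    abs_vals = [math.exp(v) for v in values_ln]
    m0 = round(statistics.mean(abs_vals), 2)
    s0 = round(statistics.stdev(abs_vals), 2)
    s = math.sqrt(math.log((s0 / m0) ** 2 + 1.0))
    return math.log(m0) - 0.5 * s * s, s

def run(reps=200, n=500, seed=1):
    """Average the recovered (mX, sX) over repeated samples."""
    rng = random.Random(seed)
    est = [back_transform_mean_sd(simulate_once(rng, n)[0]) for _ in range(reps)]
    return statistics.mean(e[0] for e in est), statistics.mean(e[1] for e in est)
```

Calling `run()` returns the average recovered mX and sX, which can be compared with the true values 1.60 and 0.85 used to generate the data.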
Results
Table 1 shows the simulation results for the univariate statistics. On average the calculated values of mX and mY are almost the same as the true values, indicating that no important bias is present in these calculations. As expected, the 95% CI of the simulations is smaller for the simulations with a sample size of 500 than for the simulations with a sample size of 200 or 100. For sX and sY, the estimates are most precise when a geometric mean with a 95% CI is reported, and least precise when a median with a range is reported.
Figure 2 shows the simulation results when a correlation coefficient is reported, and Figure 3 shows the simulation results when a linear regression coefficient is reported. Both these figures show the simulation results with true rXY = 0.5. Results are similar for true rXY = 0.9 and true rXY = 0.1 (data not shown). For the situation in which a correlation coefficient is the reported bivariate statistic, there is no difference for the four univariate reporting options. Therefore, these results are pooled in Figure 2.
None of the combinations of univariate and bivariate reporting options shows evidence of bias, with the average of the simulations almost equal to the true value. The width of the confidence interval indicates the variability of the simulations. Because there is no appreciable bias, a smaller CI width indicates that the individual simulations are closer to the true correlation. The accuracy is best when rX_{0}Y is reported and worst when rX_{0}Y_{0} is reported. As expected, the accuracy is also better when the sample size is larger. Figure 3 shows that the CI is wider when the reported univariate statistics are the median and IQR or median and range. The larger variation in the results for the transformation from bYX_{0} (Figure 3B) compared with the variation in the results from bY_{0}X (Figure 3C) is caused by the fact that X was simulated with a larger standard deviation than Y.
Example
To illustrate the methodology some examples of its use on real data for vitamin B12 are reported in Table 2 (observational studies [14, 15]) and Table 3 (randomized controlled trials [16, 17]). The tables show the statistics as reported in the studies and the statistics that are calculated using the different equations presented in this paper (which are entitled ‘required statistics’ in the tables).
Discussion
The investigated means, standard deviations, correlation coefficients and sample sizes were based on real-life values. The univariate statistics that are investigated in this paper were limited to mean and SD, median and IQR or range and geometric mean and 95% CI. These do not represent all reporting options that can be encountered in the literature, but cover most published papers. Other combinations of univariate statistics that were seen are for example mean with IQR, mean with range, and geometric mean with standard deviation. Also, the investigated regression and correlation coefficients are limited in this paper to those on the absolute or logarithmic scale, whereas sometimes other transformations to normality have been used in reports, such as a square root transformation. However, as the logarithmic transformation is by far the most often used transformation in papers in the medical research area, the equations in this paper will cover most published papers in this field.
The bivariate normal linear model on the logarithmic scale is an approximation that is used here because the data are positive data. Note that it allows the relationship between X_{0} and Y_{0} to be a linear, monotonic convex or monotonic concave function (i.e., for a slope equal, higher or lower than one, respectively). Even though some randomized controlled trials may investigate the dose–response relationship by providing multiple dosages in their study, most of these studies include only one intervention and one control group and consequently it is often unknown what the true relationship is. Therefore, this approximation provides a practical methodology to estimate the dose–response relationship and to combine the results from randomized controlled trials and observational studies. It was outside the scope of the simulation study to investigate other shapes of the dose–response relation.
The transformations in this paper consider reported regression and correlation coefficients that are unadjusted for other variables. It is possible to adjust the equations for adjusted regression or correlation coefficients, if these adjustments were done on the log-scale. However, most often adjustment has been done on another scale, and moreover studies do not report all required statistics. Therefore, we did not consider adjusted coefficients.
In this paper we presented a methodology that allows for information from RCTs and observational studies to be summarised in comparable statistics. One possible application is to combine results of both types of study in a single meta-analysis. In general, a meta-analysis should include as much information as possible. However, there may be systematic differences between observational studies and randomized controlled trials. Therefore, it is advisable to check whether the size of the estimated regression coefficient differs between these different study designs. This may be done by stratified analysis or by using meta-regression techniques.
Conclusions
The presented methodology provides calculations to use results from published literature to estimate the slope of the dose–response relation incorporating information from both randomized controlled trials and observational studies. The simulations clearly show that there is no observable bias associated with the transformations. Also, it can be seen that when a regression coefficient is reported, it is preferable to report the univariate statistics as mean and SD or geometric mean and 95% CI rather than as median with IQR or range.
Authors’ information
OS and CD are both postdoctoral research fellows at the Division of Human Nutrition of Wageningen University, the Netherlands. PvtV is professor of Nutrition and Epidemiology at the Division of Human Nutrition, the Netherlands. HvdV is statistician at Biometris, Wageningen University and Research centre, the Netherlands.
Abbreviations
b: Regression coefficient
CI: Confidence interval
gm: Geometric mean
IQR: Interquartile range
m: Mean
med: Median
r: Correlation coefficient
s: Standard deviation
se: Standard error
References
1. Hoey L, Strain JJ, McNulty H: Studies of biomarker responses to intervention with vitamin B12: a systematic review of randomized controlled trials. Am J Clin Nutr. 2009, 89: 1981S–1996S. 10.3945/ajcn.2009.27230C.
2. Lowe NM, Fekete K, Decsi T: Methods of assessment of zinc status in humans: a systematic review. Am J Clin Nutr. 2009, 89: 2040S–2051S. 10.3945/ajcn.2009.27230G.
3. Ristic-Medic D, Piskackova Z, Hooper L, Ruprich J, Casgrain A, Ashton K, Pavlovic M, Glibetic M: Methods of assessment of iodine status in humans: a systematic review. Am J Clin Nutr. 2009, 89: 2052S–2069S. 10.3945/ajcn.2009.27230H.
4. Kipnis V, Freedman LS: Impact of exposure measurement error in nutritional epidemiology. J Natl Cancer Inst. 2008, 100: 1658–1659. 10.1093/jnci/djn408.
5. Kohlmeier L, Bellach B: Exposure assessment error and its handling in nutritional epidemiology. Annu Rev Public Health. 1995, 16: 43–59. 10.1146/annurev.pu.16.050195.000355.
6. Prentice RL: Dietary assessment and the reliability of nutritional epidemiology research reports. J Natl Cancer Inst. 2010, 102: 583–585. 10.1093/jnci/djq100.
7. Johnson NL, Kotz S: Distributions in statistics: continuous multivariate distributions. 1972, Wiley, New York.
8. Garvey PR: A family of joint probability models for cost and schedule uncertainties. 27th Annual Department of Defense Cost Analysis Symposium, September 1993.
9. Yuan PT: On the logarithmic frequency distribution and the semi-logarithmic correlation. The Annals of Mathematical Statistics. 1933, 4: 30–74. 10.1214/aoms/1177732821.
10. Al Khatib L, Obeid O, Sibai AM, Batal M, Adra N, Hwalla N: Folate deficiency is associated with nutritional anaemia in Lebanese women of childbearing age. Public Health Nutr. 2006, 9: 921–927.
11. Bates CJ, Schneede J, Mishra G, Prentice A, Mansoor MA: Relationship between methylmalonic acid, homocysteine, vitamin B12 intake and status and socio-economic indices, in a subset of participants in the British National Diet and Nutrition Survey of people aged 65 y and over. Eur J Clin Nutr. 2003, 57: 349–357. 10.1038/sj.ejcn.1601540.
12. Hoey L, McNulty H, Askin N, Dunne A, Ward M, Pentieva K, Strain J, Molloy AM, Flynn CA, Scott JM: Effect of a voluntary food fortification policy on folate, related B vitamin status, and homocysteine in healthy adults. Am J Clin Nutr. 2007, 86: 1405–1413.
13. Shuaibi AM, Sevenhuysen GP, House JD: Validation of a food choice map with a 3-day food record and serum values to assess folate and vitamin B12 intake in college-aged women. J Am Diet Assoc. 2008, 108: 2041–2050. 10.1016/j.jada.2008.09.002.
14. Nath SD, Koutoubi S, Huffman FG: Folate and vitamin B12 status of a multiethnic adult population. Journal of the National Medical Association. 2006, 98: 67–72.
15. Vogiatzoglou A, Smith AD, Nurk E, Berstad P, Drevon CA, Ueland PM, Vollset SE, Tell GS, Refsum H: Dietary sources of vitamin B12 and their association with plasma vitamin B12 concentrations in the general population: the Hordaland Homocysteine Study. Am J Clin Nutr. 2009, 89: 1078–1087. 10.3945/ajcn.2008.26598.
16. Ubbink JB, Vermaak WJ, van der Merwe A, Becker PJ, Delport R, Potgieter HC: Vitamin requirements for the treatment of hyperhomocysteinemia in humans. The Journal of Nutrition. 1994, 124: 1927–1933.
17. Yajnik CS, Lubree HG, Thuse NV, Ramdas LV, Deshpande SS, Deshpande VU, Deshpande JA, Uradey BS, Ganpule AA, Naik SS, Joshi NP, Farrant H, Refsum H: Oral vitamin B12 supplementation reduces plasma total homocysteine concentration in women in India. Asia Pacific Journal of Clinical Nutrition. 2007, 16: 103–109.
Prepublication history
The prepublication history for this paper can be accessed here: http://www.biomedcentral.com/14712288/12/57/prepub
Acknowledgements
We would like to thank the reviewer, Wolfgang Viechtbauer, for his valuable comments on the manuscript.
The work reported herein has been carried out within the EURRECA Network of Excellence (http://www.eurreca.org) which is financially supported by the Commission of the European Communities, specific Research, Technology and Development (RTD) Programme Quality of Life and Management of Living Resources, within the Sixth Framework Programme, contract no. 036196. This report does not necessarily reflect the Commission's views or its future policy in this area.
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
OS participated in the design of the simulation study, performed the statistical analysis and drafted the manuscript. CD helped to draft the manuscript and participated in the design of the simulation study. PvtV participated in the coordination of the study and revised the manuscript critically. HvdV conceived of the study, helped with the statistical analysis and interpretation of the data and revised the manuscript critically. All authors read and approved the final manuscript.
Keywords
 Methodology
 Dose–response
 Meta-analysis
 EURRECA