Transformations of summary statistics as input in meta-analysis for linear dose–response models on a logarithmic scale: a methodology developed within EURRECA
 Olga W Souverein^{1},
 Carla Dullemeijer^{1},
 Pieter van 't Veer^{1} and
 Hilko van der Voet^{2}
DOI: 10.1186/1471-2288-12-57
© Souverein et al.; licensee BioMed Central Ltd. 2012
Received: 3 January 2012
Accepted: 12 April 2012
Published: 25 April 2012
Abstract
Background
To derive micronutrient recommendations in a scientifically sound way, it is important to obtain and analyse all published information on the association between micronutrient intake and biochemical proxies for micronutrient status using a systematic approach. This requires incorporating information from randomized controlled trials as well as observational studies, as both provide information on the association. However, original research papers present their data in various ways.
Methods
This paper presents a methodology to obtain an estimate of the dose–response curve, assuming a bivariate normal linear model on the logarithmic scale, incorporating a range of transformations of the original reported data.
Results
The simulation study, conducted to validate the methodology, shows that there is no bias in the transformations. Furthermore, it shows that when the original studies report the mean and standard deviation or the geometric mean and confidence interval, the results are less variable than when the median with IQR or range is reported.
Conclusions
The presented methodology with transformations for various reported data provides a valid way to estimate the dose–response curve for micronutrient intake and status using both randomized controlled trials and observational studies.
Keywords
Methodology; Dose–response; Meta-analysis; EURRECA
Background
Meta-analysis of the association between micronutrient intake and biochemical proxies for micronutrient status or function is needed when setting micronutrient recommendations. Information on this association may come from randomized controlled trials as well as from observational studies. In a randomized trial, subjects are randomized to receive either the intervention treatment or the control treatment, and a meta-analysis of such studies will usually provide a mean difference in micronutrient status between the placebo and intervention groups, answering the question of whether the biochemical status marker responds to the dietary intake of a micronutrient [1–3]. However, this analysis does not provide an estimate of the slope of the dose–response relationship. A meta-analysis of observational studies, on the other hand, does provide an estimate of the slope of the dose–response relation, but observational studies are hampered by, for instance, measurement error in the intake estimates, which biases the reported association [4–6].
Ideally, information from observational studies and randomized controlled trials should be compared or even combined in a single meta-analysis, so that all reported information is taken into account over a broad range of intake. This requires that the summary statistics reported in individual studies be transformed into estimates of the dose–response relation. Since both intake and status are continuous variables, this estimate is in fact an estimate of the regression coefficient of the linear regression of micronutrient status on micronutrient intake. The individual estimates of the dose–response regression coefficient may then be combined in a meta-analysis.
The statistical combination of study results may be complicated by the variety of ways in which individual studies report summary statistics. The results from randomized controlled trials, as well as the baseline summary statistics of micronutrient intake and status, may be reported as means, medians or geometric means. Variability is often reported as standard deviations, standard errors, interquartile ranges (IQR), ranges or confidence intervals (CI). In observational studies the relation between intake and status can be reported as a Pearson correlation coefficient, a Spearman rank correlation coefficient or a regression coefficient. In addition, the intake variable, the status variable or both may have been logarithmically transformed before the correlation or association was calculated. All these different ways of reporting need to be standardized before a meta-analysis is possible.
This paper gives an overview of transformation methods to derive algebraically, for each study that does not report them directly, an estimate of the regression coefficient (slope, b) and its standard error (se(b)). The methods are validated by comparing the calculated values with theoretical values in a small-scale simulation study.
Methods
In order to derive the transformations, we assume a bivariate normal distribution on the log scale for the intake and status of an individual person. The log scale was chosen because both intake and status values are always above zero, and the observed distributions of the micronutrient variables are often right-skewed. Moreover, as the true shape of the dose–response curve is usually unknown, a linear relation between logarithmically transformed quantities provides the simplest approximation.
In more detail, for the dose–response meta-analysis of observational studies we assume that ${\xi}_{0}$ (intake of micronutrient) and ${\eta}_{0}$ (status or continuous health outcome) are lognormally distributed. The assumption of bivariate normality entails a linear association between $\xi =\text{ln}\left({\xi}_{0}\right)$ and $\eta =\text{ln}\left({\eta}_{0}\right)$, where ln denotes the natural logarithm. Note that we use the Greek letters ξ and η for the theoretical values of intake and status/response, and the Latin letters X and Y for the observed values of these variables. Furthermore, we reserve letters without subscript (e.g., X and Y) for values expressed on the ln scale, and use letters with subscript 0 (e.g., X_{0} and Y_{0}) for values expressed on the absolute (i.e., original) scale.
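Written out explicitly, the bivariate normal assumption on the ln scale and the linear association it entails take the following standard form. The symbols μ, σ, ρ, α and β are introduced here for illustration only and are not the paper's notation; the last identities show why, under this model, the dose–response slope equals the correlation scaled by the ratio of the standard deviations.

```latex
\begin{pmatrix}\xi\\ \eta\end{pmatrix}
\sim N_2\!\left(\begin{pmatrix}\mu_\xi\\ \mu_\eta\end{pmatrix},
\begin{pmatrix}\sigma_\xi^2 & \rho\,\sigma_\xi\sigma_\eta\\
\rho\,\sigma_\xi\sigma_\eta & \sigma_\eta^2\end{pmatrix}\right),
\qquad
E(\eta\mid\xi)=\alpha+\beta\,\xi,\quad
\beta=\rho\,\frac{\sigma_\eta}{\sigma_\xi},\quad
\alpha=\mu_\eta-\beta\,\mu_\xi
```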
Univariate transformations
First, we describe how the univariate statistics of the normal distributions on the ln scale can be obtained from various reported statistics. We present formulas for mX and sX, which can equally be used for mY and sY in observational studies. For randomized controlled trials the situation is different, because the variation in X is imposed by design and is not described by a normal distribution. Therefore, the transformations should be used only to obtain mY and sY in the intervention and placebo groups separately. In most trials the within-group variation in X will be negligible compared with the difference between the groups; consequently, mX is calculated simply as mX_{con} = ln(mX_{0_con}) for the placebo group and as mX_{int} = ln(mX_{0_int}) for the intervention group.
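The formulas for a reported arithmetic mean and SD do not appear in this text. A minimal sketch, assuming the standard lognormal moment identities sX² = ln(1 + (SD/mean)²) and mX = ln(mean) − sX²/2 (the function name is hypothetical). Applied to study [14] of the observational example table, it returns the tabulated mX, sX, mY and sY after rounding:

```python
import math


# Hypothetical helper, not from the paper; assumes a lognormal variable.
def lognormal_from_mean_sd(mean0: float, sd0: float) -> tuple[float, float]:
    """Return (m, s) on the ln scale from a mean and SD reported on the absolute scale.

    Uses the lognormal moment identities s^2 = ln(1 + (sd0/mean0)^2)
    and m = ln(mean0) - s^2/2.
    """
    if mean0 <= 0 or sd0 < 0:
        raise ValueError("mean must be positive and SD non-negative")
    s2 = math.log1p((sd0 / mean0) ** 2)  # variance on the ln scale
    m = math.log(mean0) - s2 / 2         # mean on the ln scale
    return m, math.sqrt(s2)


# Study [14]: intake X0 mean 9.3 (SD 9.3), status Y0 mean 330 (SD 140)
mX, sX = lognormal_from_mean_sd(9.3, 9.3)
mY, sY = lognormal_from_mean_sd(330, 140)
print(f"mX={mX:.2f} sX={sX:.2f} mY={mY:.2f} sY={sY:.2f}")  # mX=1.88 sX=0.83 mY=5.72 sY=0.41
```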
As a measure of variability, an IQR (IQRX_{0}) or range (rangeX_{0}) is often reported together with the median or mean. The IQR is the difference between the third quartile Q_{3} and the first quartile Q_{1} (the 75^{th} and 25^{th} percentiles). Basically, there are two cases. If the lower and upper limits are reported as such, the difference between the ln-transformed limits may be equated to an appropriate multiple of the standard deviation sX. If, on the other hand, only the IQR or range itself is reported, the derivation is more complex. When IQRX_{0} is reported together with the median, the relation between these and sX is given by

$${\text{IQRX}}_{0}={\text{medX}}_{0}\times \left[\exp\left(z\cdot \text{sX}\right)-\exp\left(-z\cdot \text{sX}\right)\right]\tag{4}$$

where z represents the appropriate percentage point of the standard normal distribution (i.e., z_{0.75} = 0.6745).
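Because exp(z·sX) − exp(−z·sX) = 2 sinh(z·sX), the IQR equation can be solved for sX in closed form: sX = asinh(IQRX_{0}/(2·medX_{0}))/z. A minimal sketch of that inversion (the function name is hypothetical), using Python's `statistics.NormalDist` for the percentage point:

```python
import math
from statistics import NormalDist


# Hypothetical helper, not from the paper; assumes a lognormal variable.
def s_from_median_and_iqr_width(median0: float, iqr0: float) -> float:
    """Solve IQR0 = median0 * [exp(z*s) - exp(-z*s)] for s, with z = z_0.75.

    The bracket equals 2*sinh(z*s), hence s = asinh(IQR0 / (2*median0)) / z.
    """
    z = NormalDist().inv_cdf(0.75)  # about 0.6745
    return math.asinh(iqr0 / (2 * median0)) / z


# Round trip: build the IQR implied by median 100 and s = 0.5, then recover s
z = NormalDist().inv_cdf(0.75)
iqr = 100 * (math.exp(z * 0.5) - math.exp(-z * 0.5))
print(round(s_from_median_and_iqr_width(100, iqr), 6))  # 0.5
```

The ln-scale location follows as mX = ln(medX_{0}), since the median of a lognormal variable equals exp(mX).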
When the lower and upper bounds of the IQR (i.e., Q_{1}(X_{0}) and Q_{3}(X_{0}), respectively) are reported rather than their difference, sX may be calculated as

$$\text{sX}=\left[{Q}_{3}\left(X\right)-{Q}_{1}\left(X\right)\right]/\left(2z\right)\tag{5}$$

where Q_{1}(X) = ln[Q_{1}(X_{0})] and Q_{3}(X) = ln[Q_{3}(X_{0})].
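A minimal sketch of the quartile-bounds calculation (the function name is hypothetical). Applied to the quartiles reported for study [17] in the randomized controlled trial example table, it gives the tabulated sY values of 0.40 and 0.60 after rounding:

```python
import math
from statistics import NormalDist


# Hypothetical helper, not from the paper; assumes a lognormal variable.
def s_from_quartiles(q1_0: float, q3_0: float) -> float:
    """sX = [ln Q3(X0) - ln Q1(X0)] / (2 z), with z = z_0.75."""
    z = NormalDist().inv_cdf(0.75)
    return (math.log(q3_0) - math.log(q1_0)) / (2 * z)


# Study [17]: Y0 quartiles 158-271 (intervention) and 73-165 (control)
s_int = s_from_quartiles(158, 271)
s_con = s_from_quartiles(73, 165)
print(f"{s_int:.2f} {s_con:.2f}")  # 0.40 0.60
```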
The range is the difference between the maximum and the minimum value of the data. Equations (4) and (5) may be used similarly when the range is reported, but here we consider the minimum and the maximum to represent the lower and upper (1/n) fraction of a dataset of n observations. We therefore expect a fraction 1/(2n) below the minimum and the same fraction above the maximum, and in the equations above we need to use z_{p} with p = 1 − 1/(2n). For example, in a dataset with n = 100 we use z_{0.995} = 2.576.
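A sketch of the range variant, assuming, as stated above, that z is replaced by z_p with p = 1 − 1/(2n). Both the bounds form and the width form are shown; all function names are hypothetical:

```python
import math
from statistics import NormalDist


# Hypothetical helpers, not from the paper; assume a lognormal variable.
def z_for_range(n: int) -> float:
    """z_p with p = 1 - 1/(2n): min and max are taken to cut off 1/(2n) in each tail."""
    return NormalDist().inv_cdf(1 - 1 / (2 * n))


def s_from_min_max(min0: float, max0: float, n: int) -> float:
    """Bounds form: sX = [ln(max0) - ln(min0)] / (2 z_p)."""
    return (math.log(max0) - math.log(min0)) / (2 * z_for_range(n))


def s_from_median_and_range_width(median0: float, range0: float, n: int) -> float:
    """Width form: solve range0 = median0 * [exp(z_p*s) - exp(-z_p*s)] for s."""
    return math.asinh(range0 / (2 * median0)) / z_for_range(n)


print(round(z_for_range(100), 3))  # 2.576
```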
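When the geometric mean gmX_{0} is reported together with its 95% confidence interval, mX = ln(gmX_{0}) and

$$\text{sX}=\sqrt{n}\times \left[\text{ln}\left({X}_{0,\text{upp}}\right)-\text{ln}\left({X}_{0,\text{low}}\right)\right]/\left(2{z}_{0.975}\right)$$

where X_{0,upp} is the upper limit and X_{0,low} the lower limit of the 95% confidence interval, n is the sample size, and z_{0.975} = 1.96 is the 97.5th percentage point of the standard normal distribution.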
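For a geometric mean with its 95% confidence interval, a minimal sketch, assuming the interval was computed as exp(mean of ln X_{0} ± 1.96·sX/√n), so that sX = √n·[ln(X_{0,upp}) − ln(X_{0,low})]/(2·1.96) and mX = ln(gmX_{0}). The function name is hypothetical; with the values reported for study [15] in the observational example table, it reproduces the tabulated mX, sX, mY and sY after rounding:

```python
import math
from statistics import NormalDist


# Hypothetical helper, not from the paper; assumes the CI is a CI of the geometric mean.
def lognormal_from_gm_ci(gm0: float, low0: float, upp0: float, n: int,
                         level: float = 0.95) -> tuple[float, float]:
    """Return (m, s) on the ln scale from a geometric mean and its confidence interval.

    Assumes the interval is exp(m +/- z * s / sqrt(n)).
    """
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # 1.96 for a 95% CI
    m = math.log(gm0)
    s = math.sqrt(n) * (math.log(upp0) - math.log(low0)) / (2 * z)
    return m, s


# Study [15]: X0 gm 7.3 (95% CI 7.1-7.5), Y0 gm 354 (348-360), n = 1329
mX, sX = lognormal_from_gm_ci(7.3, 7.1, 7.5, n=1329)
mY, sY = lognormal_from_gm_ci(354, 348, 360, n=1329)
print(f"mX={mX:.2f} sX={sX:.2f} mY={mY:.2f} sY={sY:.2f}")  # mX=1.99 sX=0.51 mY=5.87 sY=0.32
```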
Bivariate transformations (to obtain regression or correlation coefficients)
For observational studies, the next step is to obtain an estimate of the correlation between X and Y (rXY). The equations below can be used to obtain rXY from reported correlation and regression coefficients taking into account the possibility that either X_{0}, log_{10}(X_{0}), X, Y_{0}, log_{10}(Y_{0}) or Y was used for the originally reported statistic.
This formula (12) is also used when the Pearson product–moment correlation coefficient rX_{0}Y_{0} is directly reported in a paper.
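The conversion formulas themselves do not appear in this text. One way to obtain rXY from a reported rX_{0}Y_{0}, assuming (X, Y) is bivariate normal, is to invert the standard bivariate lognormal identity rX_{0}Y_{0} = [exp(rXY·sX·sY) − 1]/√{[exp(sX²) − 1][exp(sY²) − 1]}. A sketch under that assumption (the function name is hypothetical); with the inputs of study [14] in the observational example table it gives rXY ≈ 0.19:

```python
import math


# Hypothetical helper, not from the paper; assumes (ln X0, ln Y0) bivariate normal.
def r_log_from_r_abs(r0: float, sX: float, sY: float) -> float:
    """rXY from the Pearson correlation r0 between X0 and Y0 (absolute scale).

    Inverts r0 = [exp(rXY*sX*sY) - 1] / sqrt([exp(sX^2) - 1] * [exp(sY^2) - 1]).
    """
    k = math.sqrt(math.expm1(sX ** 2) * math.expm1(sY ** 2))
    arg = 1 + r0 * k
    if arg <= 0:
        raise ValueError("r0 is not attainable for these sX and sY under the model")
    return math.log(arg) / (sX * sY)


# Study [14]: sX and sY from the mean/SD lognormal identities, r0 = 0.16
sX = math.sqrt(math.log1p((9.3 / 9.3) ** 2))
sY = math.sqrt(math.log1p((140 / 330) ** 2))
print(round(r_log_from_r_abs(0.16, sX, sY), 2))  # 0.19
```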
When log_{10}(X_{0}) is used instead of X, sX is replaced by sX/ln(10) in formula (13).
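A sketch for a reported slope bY_{0}X (Y_{0} regressed on X = ln X_{0}), derived from the bivariate normal model via cov(X, Y_{0}) = rXY·sX·sY·E(Y_{0}) with E(Y_{0}) = exp(mY + sY²/2). This is a derivation under the stated model, not necessarily the paper's formula (13), although it follows the same substitution rule for log10(X_{0}); the function name is hypothetical:

```python
import math

LN10 = math.log(10)


# Hypothetical helper, not from the paper; assumes (ln X0, ln Y0) bivariate normal.
def r_log_from_b_y0_on_x(b: float, sX: float, mY: float, sY: float,
                         x_is_log10: bool = False) -> float:
    """rXY from the slope b of Y0 regressed on ln(X0), or on log10(X0) if x_is_log10.

    Under the model cov(X, Y0) = rXY*sX*sY*E[Y0], with E[Y0] = exp(mY + sY^2/2),
    so b = rXY*sY*E[Y0]/sX. For a log10 predictor, sX is replaced by sX/ln(10).
    """
    sx = sX / LN10 if x_is_log10 else sX
    return b * sx / (sY * math.exp(mY + sY ** 2 / 2))


# Round trip with the simulation parameters: rXY = 0.5, sX = 0.85, mY = 5.7, sY = 0.45
b_ln = 0.5 * 0.45 * math.exp(5.7 + 0.45 ** 2 / 2) / 0.85  # slope per ln unit of X0
b_log10 = b_ln * LN10                                     # slope per log10 unit of X0
print(round(r_log_from_b_y0_on_x(b_ln, 0.85, 5.7, 0.45), 6),
      round(r_log_from_b_y0_on_x(b_log10, 0.85, 5.7, 0.45, x_is_log10=True), 6))  # 0.5 0.5
```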
This formula (14) is also used when rXY_{0} is reported directly or when the Pearson product–moment correlation coefficient is reported between log_{10}(X_{0}) and Y_{0}.
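Correspondingly, a sketch for a reported rXY_{0}, using the semi-logarithmic identity rXY_{0} = rXY·sY/√[exp(sY²) − 1] that follows from the same model (the function name is hypothetical). Since log10(X_{0}) is a linear function of X, the same function applies when the correlation was computed with log10(X_{0}):

```python
import math


# Hypothetical helper, not from the paper; assumes (ln X0, ln Y0) bivariate normal.
def r_log_from_r_x_y0(r_x_y0: float, sY: float) -> float:
    """rXY from the correlation between X (ln or log10 of X0) and Y0.

    Under the model corr(X, Y0) = rXY * sY / sqrt(exp(sY^2) - 1); a log10
    transform of X0 is linear in X and leaves the correlation unchanged.
    """
    r = r_x_y0 * math.sqrt(math.expm1(sY ** 2)) / sY
    if abs(r) > 1:
        raise ValueError("reported correlation is not attainable for this sY under the model")
    return r


# Round trip with rXY = 0.5 and sY = 0.45
r_x_y0 = 0.5 * 0.45 / math.sqrt(math.expm1(0.45 ** 2))
print(round(r_log_from_r_x_y0(r_x_y0, 0.45), 6))  # 0.5
```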
When log_{10}(Y_{0}) is used instead of Y, sY is replaced by sY/ln(10) in formula (15).
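The mirror-image cases, with intake on the absolute scale, can be sketched the same way. The slope of Y on X_{0} follows from cov(X_{0}, Y) = rXY·sX·sY·E(X_{0}) and var(X_{0}) = E(X_{0})²[exp(sX²) − 1], with E(X_{0}) = exp(mX + sX²/2). Again this is a derivation under the stated model, not necessarily the paper's formula (15), and the function names are hypothetical:

```python
import math

LN10 = math.log(10)


# Hypothetical helpers, not from the paper; assume (ln X0, ln Y0) bivariate normal.
def r_log_from_b_y_on_x0(b: float, mX: float, sX: float, sY: float,
                         y_is_log10: bool = False) -> float:
    """rXY from the slope b of ln(Y0) (or log10(Y0) if y_is_log10) regressed on X0.

    Under the model cov(X0, Y) = rXY*sX*sY*E[X0] and var(X0) = E[X0]^2*(exp(sX^2) - 1),
    with E[X0] = exp(mX + sX^2/2). For a log10 response, sY is replaced by sY/ln(10).
    """
    sy = sY / LN10 if y_is_log10 else sY
    ex0 = math.exp(mX + sX ** 2 / 2)
    return b * ex0 * math.expm1(sX ** 2) / (sX * sy)


def r_log_from_r_x0_y(r_x0_y: float, sX: float) -> float:
    """rXY from corr(X0, Y): the mirror image of the semi-log case, using sX."""
    return r_x0_y * math.sqrt(math.expm1(sX ** 2)) / sX


# Round trip with the simulation parameters: rXY = 0.5, mX = 1.6, sX = 0.85, sY = 0.45
ex0 = math.exp(1.6 + 0.85 ** 2 / 2)
b_ln = 0.5 * 0.85 * 0.45 / (ex0 * math.expm1(0.85 ** 2))  # slope of ln(Y0) on X0
print(round(r_log_from_b_y_on_x0(b_ln, 1.6, 0.85, 0.45), 6),
      round(r_log_from_b_y_on_x0(b_ln / LN10, 1.6, 0.85, 0.45, y_is_log10=True), 6))  # 0.5 0.5
```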
Calculation of dose–response regression coefficient
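The slope formulas for observational studies do not appear in this text. Under the bivariate normal model, the ln-scale slope of Y on X is the correlation scaled by sY/sX. A minimal sketch, assuming the usual least-squares standard error with n − 2 degrees of freedom (the function name is hypothetical); with the rounded inputs of the observational example table it reproduces the bYX and se(bYX) columns after rounding:

```python
import math


# Hypothetical helper, not from the paper; assumes the usual OLS standard error.
def slope_from_correlation(rXY: float, sX: float, sY: float, n: int) -> tuple[float, float]:
    """Return (bYX, se(bYX)) for the ln-scale regression of Y on X.

    bYX = rXY * sY / sX; the standard error is the usual least-squares one,
    (sY / sX) * sqrt((1 - rXY^2) / (n - 2)).
    """
    ratio = sY / sX
    return rXY * ratio, ratio * math.sqrt((1 - rXY ** 2) / (n - 2))


# Rounded inputs of the observational example table: (rXY, sX, sY, n)
for study, (r, sx, sy, n) in {"[14]": (0.19, 0.83, 0.41, 177),
                              "[15]": (0.19, 0.51, 0.32, 1329)}.items():
    b, se = slope_from_correlation(r, sx, sy, n)
    print(study, f"{b:.2f}", f"{se:.2f}")
```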
Simulation study
A simulation study was conducted to validate the performance of the transformations given in this paper. Bivariate lognormal data (X_{0}, Y_{0}) were simulated such that X ~ Normal(1.60, 0.85^{2}) and Y ~ Normal(5.70, 0.45^{2}). Parameter values were based on values of vitamin B12 intake (X) and serum/plasma vitamin B12 (Y) [10–13]. Different strengths of the correlation between X and Y were simulated, namely 0.1, 0.5 and 0.9.
A sample of individuals (of size 100, 200 or 500) was drawn at random, and values representing several commonly used reporting options were calculated from this sample, namely the mean and SD, the median and IQR, the median and range, and the geometric mean and 95% CI (all summary statistics on the absolute scale). The correlation and regression coefficients of X and Y expressed on different scales were also calculated. These ‘reported’ values were rounded to two decimal places. From these ‘reported’ values, the parameter estimates mX, mY, sX, sY and rXY were calculated using the transformations described in this paper. This process was repeated 1000 times.
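A compact sketch of one cell of such a simulation, for the median-and-IQR reporting option and for intake only, is given below. It is an illustration, not the code used in the paper: the function name is hypothetical, quartiles are computed with `statistics.quantiles` (default 'exclusive' method), and 200 rather than 1000 replicates are used to keep it fast. The average of the back-transformed sX over replicates should land close to the true value of 0.85:

```python
import math
import random
import statistics
from statistics import NormalDist


# Hypothetical helper, not the paper's simulation code; intake only, IQR option only.
def simulate_s_from_iqr(n: int, reps: int, mX: float = 1.60, sX: float = 0.85,
                        seed: int = 1) -> list[float]:
    """Draw `reps` samples of n lognormal intakes X0, 'report' Q1 and Q3 rounded to
    two decimals, and back-transform each pair to an estimate of sX."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.75)
    estimates = []
    for _ in range(reps):
        x0 = [math.exp(rng.gauss(mX, sX)) for _ in range(n)]
        qs = statistics.quantiles(x0, n=4)           # [Q1, median, Q3]
        q1, q3 = round(qs[0], 2), round(qs[2], 2)    # 'reported' quartiles
        estimates.append((math.log(q3) - math.log(q1)) / (2 * z))
    return estimates


est = simulate_s_from_iqr(n=500, reps=200)
print(round(statistics.mean(est), 3))
```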
Results
Simulation results for mX, sX, mY and sY
| Reporting option | n | mX | sX | mY | sY |
|---|---|---|---|---|---|
| True | | 1.6 | 0.85 | 5.7 | 0.45 |
| Mean, SD | 100 | 1.6 (1.4–1.8) | 0.82 (0.65–1.06) | 5.7 (5.6–5.8) | 0.45 (0.37–0.53) |
| | 200 | 1.6 (1.4–1.7) | 0.83 (0.70–1.03) | 5.7 (5.6–5.8) | 0.45 (0.40–0.51) |
| | 500 | 1.6 (1.5–1.7) | 0.84 (0.75–0.98) | 5.7 (5.7–5.7) | 0.45 (0.42–0.49) |
| Median, IQR | 100 | 1.6 (1.4–1.8) | 0.84 (0.63–1.10) | 5.7 (5.6–5.8) | 0.44 (0.35–0.56) |
| | 200 | 1.6 (1.5–1.8) | 0.85 (0.70–1.02) | 5.7 (5.6–5.8) | 0.45 (0.38–0.53) |
| | 500 | 1.6 (1.5–1.7) | 0.85 (0.76–0.95) | 5.7 (5.7–5.7) | 0.45 (0.40–0.50) |
| Median, range | 100 | 1.6 (1.4–1.8) | 0.83 (0.58–1.14) | 5.7 (5.6–5.8) | 0.44 (0.32–0.60) |
| | 200 | 1.6 (1.5–1.8) | 0.83 (0.63–1.12) | 5.7 (5.6–5.8) | 0.44 (0.35–0.58) |
| | 500 | 1.6 (1.5–1.7) | 0.83 (0.68–1.06) | 5.7 (5.7–5.7) | 0.44 (0.36–0.56) |
| Gm, 95% CI | 100 | 1.6 (1.4–1.8) | 0.85 (0.73–0.97) | 5.7 (5.6–5.8) | 0.45 (0.38–0.51) |
| | 200 | 1.6 (1.5–1.7) | 0.85 (0.77–0.94) | 5.7 (5.6–5.8) | 0.45 (0.41–0.49) |
| | 500 | 1.6 (1.5–1.7) | 0.85 (0.80–0.90) | 5.7 (5.7–5.7) | 0.45 (0.42–0.48) |
None of the combinations of univariate and bivariate reporting options shows evidence of bias; the average over the simulations is almost equal to the true value. The width of the confidence interval reflects the variability between simulations. Because there is no appreciable bias, a narrower CI indicates that the individual simulations are closer to the true correlation. The accuracy is best when rX_{0}Y is reported and worst when rX_{0}Y_{0} is reported. As expected, the accuracy also improves with larger sample size. Figure 3 shows that the CI is wider when the reported univariate statistics are the median and IQR or the median and range. The larger variation in the results for the transformation from bYX_{0} (Figure 3B) compared with that from bY_{0}X (Figure 3C) is caused by the fact that X was simulated with a larger standard deviation than Y.
Example
Example statistics for observational studies on vitamin B12 intake (X) and vitamin B12 status (Y)
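| Reference | Type X_{0} and Y_{0} | X_{0} | Y_{0} | n | Association | mX | sX | mY | sY | rXY | bYX | se(bYX) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| [14] | Mean, SD | 9.3, 9.3 | 330, 140 | 177 | rX_{0}Y_{0} = 0.16 | 1.88 | 0.83 | 5.72 | 0.41 | 0.19 | 0.09 | 0.04 |
| [15] | gm, 95% CI | 7.3, 7.1–7.5 | 354, 348–360 | 1329 | r_{s} = 0.19 | 1.99 | 0.51 | 5.87 | 0.32 | 0.19 | 0.12 | 0.02 |

Columns Type to n: observed univariate statistics; Association: observed bivariate statistic; mX to se(bYX): required statistics.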
Example statistics for randomized controlled trials on vitamin B12 intake and vitamin B12 status
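| Reference | Arm | X_{0}* | Type Y_{0} | Y_{0} | n | mX | mY | sY | bYX | se(bYX) |
|---|---|---|---|---|---|---|---|---|---|---|
| [16] | intervention | 405 | mean, SD | 379, 189 | 17 | 6.00 | 5.83 | 0.47 | 0.12 | 0.03 |
| | control | 5 | mean, SD | 211, 77 | 17 | 1.61 | 5.29 | 0.35 | | |
| [17] | intervention | 505 | med, IQR | 198, 158–271 | 20 | 6.22 | 5.29 | 0.40 | 0.13 | 0.04 |
| | control | 5 | med, IQR | 110, 73–165 | 20 | 1.61 | 4.70 | 0.60 | | |

Columns X_{0}* to n: observed univariate statistics; mX to se(bYX): required statistics.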
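The formulas for trials do not appear in this text. A minimal sketch that reproduces the bYX and se(bYX) columns of the table above, assuming the slope is the difference in mY between the arms divided by the difference in mX (that is, in ln dose), and that the standard error treats the two arm means as independent (the function name is hypothetical):

```python
import math


# Hypothetical helper, not from the paper; assumes independent arm means and
# ignores within-arm variation in X.
def rct_slope(mY_int: float, sY_int: float, n_int: int,
              mY_con: float, sY_con: float, n_con: int,
              dose_int: float, dose_con: float) -> tuple[float, float]:
    """Return (bYX, se(bYX)) for a two-arm trial on the ln scale.

    mX in each arm is ln(dose); the slope is the difference in mY divided by
    the difference in mX.
    """
    dx = math.log(dose_int) - math.log(dose_con)
    b = (mY_int - mY_con) / dx
    se = math.sqrt(sY_int ** 2 / n_int + sY_con ** 2 / n_con) / dx
    return b, se


# Study [16], rounded table inputs: intervention 405 vs control 5, n = 17 per arm
b, se = rct_slope(5.83, 0.47, 17, 5.29, 0.35, 17, dose_int=405, dose_con=5)
print(f"{b:.2f} {se:.2f}")  # 0.12 0.03
```

For study [17], se(bYX) lies close to a rounding boundary (about 0.035), so the unrounded ln-scale values of mY and sY should be used to obtain the tabulated 0.04.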
Discussion
The investigated means, standard deviations, correlation coefficients and sample sizes were based on real-life values. The univariate statistics investigated in this paper were limited to the mean and SD, the median and IQR or range, and the geometric mean and 95% CI. These do not represent all reporting options that can be encountered in the literature, but they cover most published papers. Other combinations of univariate statistics that we encountered include, for example, the mean with IQR, the mean with range, and the geometric mean with standard deviation. Also, the regression and correlation coefficients investigated in this paper are limited to those on the absolute or logarithmic scale, whereas reports sometimes use other transformations to normality, such as a square-root transformation. However, as the logarithmic transformation is by far the most commonly used transformation in papers in the medical research area, the equations in this paper will cover most published papers in this field.
The bivariate normal linear model on the logarithmic scale is an approximation, used here because the data are strictly positive. Note that it allows the relationship between X_{0} and Y_{0} to be a linear, monotonic convex or monotonic concave function (i.e., for a slope on the log scale equal to, higher than or lower than one, respectively). Even though some randomized controlled trials investigate the dose–response relationship by providing multiple dosages, most of these studies include only one intervention group and one control group, and consequently the true relationship is often unknown. This approximation therefore provides a practical methodology to estimate the dose–response relationship and to combine the results from randomized controlled trials and observational studies. Investigating other shapes of the dose–response relation was outside the scope of the simulation study.
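To see why, back-transform the linear model on the ln scale: it becomes a power law on the absolute scale whose curvature is governed by the slope β (α and β are notation introduced here for illustration). For a positive slope, the second derivative is zero for β = 1 (linear), positive for β > 1 (convex) and negative for 0 < β < 1 (concave):

```latex
\eta=\alpha+\beta\,\xi
\;\Longleftrightarrow\;
\eta_0=e^{\alpha}\,\xi_0^{\beta},
\qquad
\frac{d^2\eta_0}{d\xi_0^2}=\beta\,(\beta-1)\,e^{\alpha}\,\xi_0^{\beta-2}
```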
The transformations in this paper consider reported regression and correlation coefficients that are unadjusted for other variables. The equations could be adapted for adjusted regression or correlation coefficients if these adjustments were made on the log scale. However, adjustment has most often been done on another scale, and moreover studies do not report all required statistics. Therefore, we did not consider adjusted coefficients.
In this paper we have presented a methodology that allows information from RCTs and observational studies to be summarised in comparable statistics. One possible application is to combine the results of both types of study in a single meta-analysis. In general, a meta-analysis should include as much information as possible. However, there may be systematic differences between observational studies and randomized controlled trials. It is therefore advisable to check whether the size of the estimated regression coefficient differs between these study designs, for example by stratified analysis or by meta-regression techniques.
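As an illustration of a stratified check, a minimal sketch that pools slopes within each design by fixed-effect inverse-variance weighting and compares the two pooled estimates with a simple z statistic. This is a swapped-in textbook technique, not a procedure described in this paper: a random-effects model or a formal meta-regression would usually be preferred, two studies per stratum are far too few for a real comparison, and the function name is hypothetical:

```python
import math


# Hypothetical helper, not from the paper: fixed-effect inverse-variance pooling.
def fixed_effect(b: list[float], se: list[float]) -> tuple[float, float]:
    """Inverse-variance weighted (fixed-effect) pooled slope and its standard error."""
    w = [1 / s ** 2 for s in se]
    pooled = sum(wi * bi for wi, bi in zip(w, b)) / sum(w)
    return pooled, math.sqrt(1 / sum(w))


# Rounded slopes from the example tables, for illustration only
obs_b, obs_se = fixed_effect([0.09, 0.12], [0.04, 0.02])   # observational [14], [15]
rct_b, rct_se = fixed_effect([0.12, 0.13], [0.03, 0.04])   # trials [16], [17]
z = (obs_b - rct_b) / math.sqrt(obs_se ** 2 + rct_se ** 2)  # difference between strata
print(f"obs {obs_b:.3f} ({obs_se:.3f}), rct {rct_b:.3f} ({rct_se:.3f}), z = {z:.2f}")
```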
Conclusions
The presented methodology provides calculations that allow results from the published literature to be used to estimate the slope of the dose–response relation, incorporating information from both randomized controlled trials and observational studies. The simulations clearly show that there is no observable bias associated with the transformations. They also show that when a regression coefficient is reported, it is preferable to report the univariate statistics as mean and SD or geometric mean and 95% CI rather than as median with IQR or range.
Authors’ information
OS and CD are both postdoctoral research fellows at the Division of Human Nutrition of Wageningen University, the Netherlands. PvtV is professor of Nutrition and Epidemiology at the Division of Human Nutrition, the Netherlands. HvdV is a statistician at Biometris, Wageningen University and Research centre, the Netherlands.
Abbreviations
b: Regression coefficient
CI: Confidence interval
gm: Geometric mean
IQR: Interquartile range
m: Mean
med: Median
r: Correlation coefficient
s: Standard deviation
se: Standard error
Declarations
Acknowledgements
We would like to thank the reviewer, Wolfgang Viechtbauer, for his valuable comments on the manuscript.
The work reported herein has been carried out within the EURRECA Network of Excellence (http://www.eurreca.org) which is financially supported by the Commission of the European Communities, specific Research, Technology and Development (RTD) Programme Quality of Life and Management of Living Resources, within the Sixth Framework Programme, contract no. 036196. This report does not necessarily reflect the Commission's views or its future policy in this area.
References
[1] Hoey L, Strain JJ, McNulty H: Studies of biomarker responses to intervention with vitamin B12: a systematic review of randomized controlled trials. Am J Clin Nutr. 2009, 89: 1981S–1996S. 10.3945/ajcn.2009.27230C.
[2] Lowe NM, Fekete K, Decsi T: Methods of assessment of zinc status in humans: a systematic review. Am J Clin Nutr. 2009, 89: 2040S–2051S. 10.3945/ajcn.2009.27230G.
[3] Ristic-Medic D, Piskackova Z, Hooper L, Ruprich J, Casgrain A, Ashton K, Pavlovic M, Glibetic M: Methods of assessment of iodine status in humans: a systematic review. Am J Clin Nutr. 2009, 89: 2052S–2069S. 10.3945/ajcn.2009.27230H.
[4] Kipnis V, Freedman LS: Impact of exposure measurement error in nutritional epidemiology. J Natl Cancer Inst. 2008, 100: 1658–1659. 10.1093/jnci/djn408.
[5] Kohlmeier L, Bellach B: Exposure assessment error and its handling in nutritional epidemiology. Annu Rev Public Health. 1995, 16: 43–59. 10.1146/annurev.pu.16.050195.000355.
[6] Prentice RL: Dietary assessment and the reliability of nutritional epidemiology research reports. J Natl Cancer Inst. 2010, 102: 583–585. 10.1093/jnci/djq100.
[7] Johnson NL, Kotz S: Distributions in statistics: continuous multivariate distributions. 1972, Wiley, New York.
[8] Garvey PR: A family of joint probability models for cost and schedule uncertainties. 27th Annual Department of Defense Cost Analysis Symposium, September 1993.
[9] Yuan PT: On the logarithmic frequency distribution and the semi-logarithmic correlation. The Annals of Mathematical Statistics. 1933, 4: 30–74. 10.1214/aoms/1177732821.
[10] Al Khatib L, Obeid O, Sibai AM, Batal M, Adra N, Hwalla N: Folate deficiency is associated with nutritional anaemia in Lebanese women of childbearing age. Public Health Nutr. 2006, 9: 921–927.
[11] Bates CJ, Schneede J, Mishra G, Prentice A, Mansoor MA: Relationship between methylmalonic acid, homocysteine, vitamin B12 intake and status and socioeconomic indices, in a subset of participants in the British National Diet and Nutrition Survey of people aged 65 y and over. Eur J Clin Nutr. 2003, 57: 349–357. 10.1038/sj.ejcn.1601540.
[12] Hoey L, McNulty H, Askin N, Dunne A, Ward M, Pentieva K, Strain J, Molloy AM, Flynn CA, Scott JM: Effect of a voluntary food fortification policy on folate, related B vitamin status, and homocysteine in healthy adults. Am J Clin Nutr. 2007, 86: 1405–1413.
[13] Shuaibi AM, Sevenhuysen GP, House JD: Validation of a food choice map with a 3-day food record and serum values to assess folate and vitamin B12 intake in college-aged women. J Am Diet Assoc. 2008, 108: 2041–2050. 10.1016/j.jada.2008.09.002.
[14] Nath SD, Koutoubi S, Huffman FG: Folate and vitamin B12 status of a multi-ethnic adult population. Journal of the National Medical Association. 2006, 98: 67–72.
[15] Vogiatzoglou A, Smith AD, Nurk E, Berstad P, Drevon CA, Ueland PM, Vollset SE, Tell GS, Refsum H: Dietary sources of vitamin B12 and their association with plasma vitamin B12 concentrations in the general population: the Hordaland Homocysteine Study. Am J Clin Nutr. 2009, 89: 1078–1087. 10.3945/ajcn.2008.26598.
[16] Ubbink JB, Vermaak WJ, van der Merwe A, Becker PJ, Delport R, Potgieter HC: Vitamin requirements for the treatment of hyperhomocysteinemia in humans. The Journal of nutrition. 1994, 124: 1927–1933.
[17] Yajnik CS, Lubree HG, Thuse NV, Ramdas LV, Deshpande SS, Deshpande VU, Deshpande JA, Uradey BS, Ganpule AA, Naik SS, Joshi NP, Farrant H, Refsum H: Oral vitamin B12 supplementation reduces plasma total homocysteine concentration in women in India. Asia Pacific journal of clinical nutrition. 2007, 16: 103–109.
Pre-publication history
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2288/12/57/prepub
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.