Transformations of summary statistics as input in meta-analysis for linear dose-response models on a logarithmic scale: a methodology developed within EURRECA
© Souverein et al.; licensee BioMed Central Ltd. 2012
Received: 3 January 2012
Accepted: 12 April 2012
Published: 25 April 2012
To derive micronutrient recommendations in a scientifically sound way, it is important to obtain and analyse all published information on the association between micronutrient intake and biochemical proxies for micronutrient status using a systematic approach. Therefore, it is important to incorporate information from randomized controlled trials as well as observational studies as both of these provide information on the association. However, original research papers present their data in various ways.
This paper presents a methodology to obtain an estimate of the dose–response curve, assuming a bivariate normal linear model on the logarithmic scale, incorporating a range of transformations of the original reported data.
The simulation study, conducted to validate the methodology, shows that there is no bias in the transformations. Furthermore, it is shown that when the original studies report the mean and standard deviation or the geometric mean and confidence interval the results are less variable compared to when the median with IQR or range is reported in the original study.
The presented methodology with transformations for various reported data provides a valid way to estimate the dose–response curve for micronutrient intake and status using both randomized controlled trials and observational studies.
Keywords: Methodology; Dose–response; Meta-analysis; EURRECA
Meta-analysis of the association between micronutrient intake and biochemical proxies for micronutrient status or function is needed when setting micronutrient recommendations. Information on this association may come from randomized controlled trials as well as from observational studies. In a randomized trial subjects are randomized to receive either the intervention treatment or the control treatment, and a meta-analysis of such studies will usually provide a mean difference in micronutrient status between placebo and intervention groups, answering the question of whether the biochemical status marker responds to the dietary intake of a micronutrient [1–3]. However, this analysis does not provide an estimate of the slope of the dose–response relationship. On the other hand, a meta-analysis of observational studies provides an estimate of the slope of the dose–response relation, but observational studies are hampered by, for instance, measurement error in the intake estimates, which causes bias in the reported association [4–6].
Ideally, information from observational studies and randomized controlled trials should be compared or even combined in a single meta-analysis to ensure that all reported information is taken into account over a broad range of intake. This requires that the summary statistics reported in individual studies are transformed into estimates of the dose–response relation. Since both intake and status are continuous variables, this estimate is actually an estimate of the regression coefficient of the linear regression of micronutrient status on micronutrient intake. The individual estimates of the dose–response regression coefficient may then be combined in a meta-analysis.
The statistical combination of study results may be complicated by the variety of ways that individual studies report the summary statistics. The results from randomized controlled trials as well as the baseline summary statistics of micronutrient intake and status may be reported as means, medians or geometric means. Variability is often reported as standard deviations, standard errors, interquartile ranges (IQR), ranges or confidence intervals (CI). In observational studies the relation between intake and status can be reported as a Pearson correlation coefficient, a Spearman rank correlation coefficient or a regression coefficient. In addition, either the intake variable or the status variable or both could have been logarithmically transformed before the correlation or association was calculated. All these different ways of reporting need to be standardized before meta-analysis is even possible.
This paper gives an overview of transformation methods to algebraically derive an estimate from each study of the regression coefficient (slope, b) and its standard error (se(b)), for studies that do not directly report these. The methods are validated by comparing the calculated values with theoretical values in a small-scale simulation study.
In order to derive transformations we assume a bivariate normal distribution on the log-scale for intake and status of an individual person. The log-scale was chosen because both intake and status values are always above zero, and the observed distributions of the micronutrient variables are often right-skewed. Moreover, as the true shape of the dose–response curve is usually unknown, the linear relation between logarithmically transformed quantities provides the simplest approximation.
In more detail, for the dose–response meta-analysis of observational studies we assume that ξ0 (intake of micronutrient) and η0 (status or continuous health outcome) are log-normally distributed. The assumption of bivariate normality entails a linear association between ξ = ln(ξ0) and η = ln(η0), where ln denotes the natural logarithm. Note that we use the Greek letters ξ and η for the theoretical values of intake and status/response, and the Latin letters X and Y for the observed values of these variables. Furthermore, we reserve letters without subscript (e.g., X and Y) for values expressed on the ln-scale, and use letters with subscript 0 (e.g., X0 and Y0) for values expressed on the absolute (i.e., original) scale.
First, we describe how the univariate statistics of the normal distributions on the ln-scale can be obtained from various reported statistics. We present formulas for mX and sX, which can be used in the same way for mY and sY in observational studies. For randomized controlled trials the situation is different, because the variation in X is artificial and is not described by a normal distribution. Therefore, the transformations should be used only to obtain mY and sY in the intervention and placebo groups separately. In most trials the within-group variation in X will be negligible compared with the difference between the groups; consequently, mX is calculated simply as mXcon = ln(mX0_con) for the placebo group and as mXint = ln(mX0_int) for the intervention group.
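As an illustration of the univariate step for a reported arithmetic mean and SD, the following minimal sketch uses the standard moment relations of the lognormal distribution. It is not necessarily the exact form of the paper's equations, and the function name `log_scale_from_mean_sd` is hypothetical.

```python
import math


def log_scale_from_mean_sd(mean0: float, sd0: float) -> tuple[float, float]:
    """Return (m_X, s_X) on the ln-scale from the arithmetic mean and SD
    reported on the absolute scale, assuming X0 is lognormally distributed."""
    cv_squared = (sd0 / mean0) ** 2             # squared coefficient of variation
    s_x = math.sqrt(math.log1p(cv_squared))     # s_X^2 = ln(1 + CV^2)
    m_x = math.log(mean0) - 0.5 * s_x ** 2      # m_X = ln(mean) - s_X^2 / 2
    return m_x, s_x
```

For a randomized trial the same function would be applied to the status variable in each arm separately, while the intake means are simply ln-transformed as described above.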
As a measure of variability an IQRx or range (rangex) is often reported together with the median or mean. The IQR is the difference between the third quartile Q3 and first quartile Q1 (the 75th percentile and the 25th percentile). Basically, there are two cases. If the lower and upper limits are reported as such, the difference between the ln-transformed limits may be equated to an appropriate multiple of the standard deviation sX. On the other hand, if only the IQR or range is reported as such, the derivation is more complex. When IQRX0 is reported together with the median, the relation between these and sX is given by IQRX0 = median(X0) · [exp(z·sX) − exp(−z·sX)], where z represents the appropriate percentage point in the standard normal distribution (i.e., z0.75 = 0.6745).
When the lower and upper bounds of the IQR (i.e., Q1(X0) and Q3(X0) respectively) are reported, rather than the difference, sX may be calculated as
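The two IQR cases can be sketched as follows. The displayed equations are not preserved in this copy, so this is a reconstruction under the lognormal assumption, where Q1 = exp(mX − z·sX) and Q3 = exp(mX + z·sX), rather than a verbatim rendering of the paper's formulas. The function names are hypothetical.

```python
import math
from statistics import NormalDist

# 75th percentage point of the standard normal distribution (about 0.6745)
Z75 = NormalDist().inv_cdf(0.75)


def s_from_quartiles(q1_0: float, q3_0: float) -> float:
    """s_X when Q1 and Q3 are reported on the absolute scale:
    ln(Q3) - ln(Q1) spans 2 * z_0.75 standard deviations on the ln-scale."""
    return (math.log(q3_0) - math.log(q1_0)) / (2 * Z75)


def s_from_median_iqr(median0: float, iqr0: float) -> float:
    """s_X when only the IQR (Q3 - Q1) and the median are reported.
    IQR = median * (exp(z*s) - exp(-z*s)) = 2 * median * sinh(z*s),
    so s = asinh(IQR / (2 * median)) / z."""
    return math.asinh(iqr0 / (2 * median0)) / Z75
```

In both cases mX is taken as the natural logarithm of the reported median, since the median of a lognormal variable equals exp(mX).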
The range is the difference between the maximum and the minimum value of the data. Equations (4) and (5) may be used similarly when the range is reported, but here we consider that the minimum and the maximum represent the lower and upper (1/n) fraction of the dataset of n observations. Therefore we expect a fraction 1/(2n) below the minimum and the same fraction above the maximum, and in the equations above we need to use zp with p = 1 − 1/(2n). For example, in a dataset with n = 100 we use z0.995 = 2.576.
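A minimal sketch of the range-based calculation when the minimum and maximum themselves are reported, using p = 1 − 1/(2n) as described above. The function names are hypothetical, and the normal quantile is taken from Python's standard library.

```python
import math
from statistics import NormalDist


def z_for_range(n: int) -> float:
    """Percentage point z_p with p = 1 - 1/(2n), treating the minimum and
    maximum as the outer 1/(2n) quantiles of a sample of size n."""
    return NormalDist().inv_cdf(1 - 1 / (2 * n))


def s_from_min_max(min0: float, max0: float, n: int) -> float:
    """s_X from the reported minimum and maximum on the absolute scale:
    ln(max) - ln(min) spans 2 * z_p standard deviations on the ln-scale."""
    return (math.log(max0) - math.log(min0)) / (2 * z_for_range(n))
```

Because z_p grows with n, the same reported range implies a smaller sX in a larger study.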
where X0,upp is the upper limit, X0,low is the lower limit of the 95% confidence interval and z0.975 = 1.96 represents the 97.5th percentage point in the standard normal distribution.
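For a geometric mean reported with a 95% confidence interval, a hedged sketch of the calculation is given below. It assumes that the interval is a confidence interval for the geometric mean (not a reference range), that it was computed symmetrically on the ln-scale with z0.975 = 1.96, and that the sample size n is known. The function name is hypothetical.

```python
import math


def log_scale_from_gm_ci(gm0: float, low0: float, upp0: float,
                         n: int, z: float = 1.96) -> tuple[float, float]:
    """Return (m_X, s_X) from a geometric mean and its 95% CI limits.
    On the ln-scale the CI is m_X +/- z * s_X / sqrt(n)."""
    m_x = math.log(gm0)                                  # ln of geometric mean
    se_m = (math.log(upp0) - math.log(low0)) / (2 * z)   # standard error of m_X
    s_x = se_m * math.sqrt(n)                            # back to the SD
    return m_x, s_x
```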
Bivariate transformations (to obtain regression or correlation coefficients)
For observational studies, the next step is to obtain an estimate of the correlation between X and Y (rXY). The equations below can be used to obtain rXY from reported correlation and regression coefficients taking into account the possibility that either X0, log10(X0), X, Y0, log10(Y0) or Y was used for the originally reported statistic.
This formula (12) is also used when the Pearson product–moment correlation coefficient rX0Y0 is directly reported in a paper.
When log10(X0) is used instead of X, sX is replaced by sX/ln(10) in formula (13).
This formula (14) is also used when rXY0 is reported directly or when the Pearson product–moment correlation coefficient is reported between log10(X0) and Y0.
When log10(Y0) is used instead of Y, sY is replaced by sY/ln(10) in formula (15).
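Two of the correlation transformations can be sketched with well-known identities for the bivariate lognormal distribution. The numbered formulas are not preserved in this copy, so the code below may differ in form from the paper's equations; the function names are hypothetical, and sX and sY are the ln-scale standard deviations obtained in the univariate step.

```python
import math


def r_log_from_r_abs(r_x0y0: float, s_x: float, s_y: float) -> float:
    """r_XY (both variables on the ln-scale) from the Pearson correlation
    between X0 and Y0 on the absolute scale, using
    r_X0Y0 = (exp(r_XY*s_X*s_Y) - 1) / sqrt((exp(s_X^2)-1)*(exp(s_Y^2)-1))."""
    k = math.sqrt(math.expm1(s_x ** 2) * math.expm1(s_y ** 2))
    return math.log1p(r_x0y0 * k) / (s_x * s_y)


def r_log_from_r_semilog(r_xy0: float, s_y: float) -> float:
    """r_XY from the Pearson correlation between X (ln-scale) and Y0
    (absolute scale), using r_XY0 = r_XY * s_Y / sqrt(exp(s_Y^2) - 1)."""
    return r_xy0 * math.sqrt(math.expm1(s_y ** 2)) / s_y
```

A correlation between Y and X0 can be handled by the second function with the roles of the two variables exchanged (passing sX instead of sY).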
Calculation of dose–response regression coefficient
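The final step can be sketched as follows, using textbook simple-regression results for observational studies and a difference-in-means ratio for trials. This is an assumption about the form of the calculation, not a verbatim rendering of the paper's equations; the function names and the unpaired two-group standard error are illustrative choices.

```python
import math


def slope_from_correlation(r_xy: float, s_x: float, s_y: float,
                           n: int) -> tuple[float, float]:
    """Observational study: slope b of Y on X (both on the ln-scale) and
    its standard error from r_XY, s_X, s_Y and the sample size n."""
    b = r_xy * s_y / s_x
    se_b = (s_y / s_x) * math.sqrt((1 - r_xy ** 2) / (n - 2))
    return b, se_b


def slope_from_trial(m_y_int: float, m_y_con: float,
                     s_y_int: float, s_y_con: float,
                     n_int: int, n_con: int,
                     m_x_int: float, m_x_con: float) -> tuple[float, float]:
    """Randomized trial: slope as the difference in ln-status divided by the
    difference in ln-intake between arms, ignoring within-arm variation in X."""
    dx = m_x_int - m_x_con
    b = (m_y_int - m_y_con) / dx
    se_b = math.sqrt(s_y_int ** 2 / n_int + s_y_con ** 2 / n_con) / abs(dx)
    return b, se_b
```

Both functions return the pair (b, se(b)) that a standard inverse-variance meta-analysis would take as input.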
A simulation study was conducted to validate the performance of the transformations given in this paper. Bivariate lognormal data (X,Y) were simulated where X ~ Normal(1.60, 0.85²) and Y ~ Normal(5.70, 0.45²). Parameter values were based on values of vitamin B12 intake (X) and serum/plasma vitamin B12 (Y) [10–13]. Different strengths of the correlation between X and Y were simulated, namely 0.1, 0.5 and 0.9.
A sample of individuals (with sample size 100, 200 or 500) was randomly drawn, and values that represent different often used reporting options were calculated from this sample, namely the mean and SD, the median and IQR, the median and range and the geometric mean and 95% CI (all summary statistics on the absolute scale). Also, the correlation and regression coefficients of X and Y expressed in different scales were calculated. These ‘reported’ values were rounded to two decimal places. From these ‘reported’ values, the parameter estimates mX, mY, sX, sY and rXY were calculated using the transformations described in this paper. This process was repeated 1000 times.
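A reduced version of this procedure can be sketched for a single reporting option. The code below simulates only the intake variable, "reports" the median and IQR rounded to two decimals, and back-transforms them to sX with the lognormal relation IQR = 2 · median · sinh(z0.75 · sX). It is an illustration under these assumptions and not the authors' simulation code; the seed, the number of repetitions and the function names are arbitrary.

```python
import math
import random
from statistics import NormalDist, mean, median, quantiles

Z75 = NormalDist().inv_cdf(0.75)  # about 0.6745


def simulate_once(rng: random.Random, n: int,
                  m_x: float = 1.60, s_x: float = 0.85) -> float:
    """Draw n lognormal intakes, report median and IQR on the absolute
    scale (two decimals), and return the back-transformed estimate of s_X."""
    x0 = [math.exp(rng.gauss(m_x, s_x)) for _ in range(n)]
    q1, _, q3 = quantiles(x0, n=4)        # sample quartiles
    reported_median = round(median(x0), 2)
    reported_iqr = round(q3 - q1, 2)
    return math.asinh(reported_iqr / (2 * reported_median)) / Z75


def run_simulation(reps: int = 200, n: int = 200, seed: int = 1) -> float:
    """Average estimate of s_X over repeated samples."""
    rng = random.Random(seed)
    return mean(simulate_once(rng, n) for _ in range(reps))
```

Comparing the value returned by `run_simulation()` with the true value 0.85 gives a rough check for bias in this one transformation; the spread of the individual estimates would indicate its variability.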
[Table: Simulation results for mX, sX, mY and sY]
None of the combinations of univariate and bivariate reporting options shows evidence of bias, with the average of the simulations almost equal to the true value. The width of the confidence interval indicates the variability of the simulations. Because there is no appreciable bias, a smaller CI width indicates that the individual simulations are closer to the true correlation. The accuracy is best when rX0Y is reported and worst when rX0Y0 is reported. As expected, the accuracy is also better when the sample size is larger. Figure 3 shows that the CI is wider when the reported univariate statistics are the median and IQR or median and range. The larger variation in the results for the transformation from bYX0 (Figure 3B) compared with the variation in the results from bY0X (Figure 3C) is caused by the fact that X was simulated with a larger standard deviation than Y.
[Table: Example statistics for observational studies on vitamin B12 intake (X) and vitamin B12 status (Y)]
[Table: Example statistics for randomized controlled trials on vitamin B12 intake and vitamin B12 status]
The investigated means, standard deviations, correlation coefficients and sample sizes were based on real-life values. The univariate statistics that are investigated in this paper were limited to mean and SD, median and IQR or range and geometric mean and 95% CI. These do not represent all reporting options that can be encountered in the literature, but cover most published papers. Other combinations of univariate statistics that were seen are for example mean with IQR, mean with range, and geometric mean with standard deviation. Also, the investigated regression and correlation coefficients are limited in this paper to those on the absolute or logarithmic scale, whereas sometimes other transformations to normality have been used in reports, such as a square root transformation. However, as the logarithmic transformation is by far the most often used transformation in papers in the medical research area, the equations in this paper will cover most published papers in this field.
The bivariate normal linear model on the logarithmic scale is an approximation that is used here because the data are positive. Note that it allows the relationship between X0 and Y0 to be a linear, monotonic convex or monotonic concave function (i.e., for a slope equal to, higher than or lower than one, respectively). Even though some randomized controlled trials may investigate the dose–response relationship by providing multiple dosages in their study, most of these studies include only one intervention and one control group, and consequently it is often unknown what the true relationship is. Therefore, this approximation provides a practical methodology to estimate the dose–response relationship and to combine the results from randomized controlled trials and observational studies. It was outside the scope of the simulation study to investigate other shapes of the dose–response relation.
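Back-transforming the linear model makes these shapes explicit. In the sketch below, the intercept symbol a is an assumption introduced for illustration, and b is the slope on the ln-scale:

```latex
Y = a + bX
\quad\Longleftrightarrow\quad
\ln Y_0 = a + b \ln X_0
\quad\Longleftrightarrow\quad
Y_0 = e^{a} X_0^{\,b}
```

The log-linear model is therefore a power function on the absolute scale: proportional (linear through the origin) for b = 1, convex for b > 1, and concave for 0 < b < 1.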
The transformations in this paper consider reported regression and correlation coefficients that are unadjusted for other variables. It is possible to adjust the equations for adjusted regression or correlation coefficients, if these adjustments were done on the log-scale. However, most often adjustment has been done on another scale, and moreover studies do not report all required statistics. Therefore, we did not consider adjusted coefficients.
In this paper we presented a methodology that allows for information from RCTs and observational studies to be summarised in comparable statistics. One possible application is to combine results of both types of study in a single meta-analysis. In general, a meta-analysis should include as much information as possible. However, there may be systematic differences between observational studies and randomized controlled trials. Therefore, it is advisable to check whether the size of the estimated regression coefficient differs between these different study designs. This may be done by stratified analysis or by using meta-regression techniques.
The presented methodology provides calculations to use results from published literature to estimate the slope of the dose–response relation incorporating information from both randomized controlled trials and observational studies. The simulations clearly show that there is no observable bias associated with the transformations. Also, it can be seen that when a regression coefficient is reported, it is preferable to report the univariate statistics as mean and SD or geometric mean and 95% CI rather than as median with IQR or range.
OS and CD are both postdoctoral research fellows at the Division of Human Nutrition of Wageningen University, the Netherlands. PvtV is professor of Nutrition and Epidemiology at the Division of Human Nutrition, the Netherlands. HvdV is statistician at Biometris, Wageningen University and Research centre, the Netherlands.
We would like to thank the reviewer, Wolfgang Viechtbauer, for his valuable comments on the manuscript.
The work reported herein has been carried out within the EURRECA Network of Excellence (http://www.eurreca.org) which is financially supported by the Commission of the European Communities, specific Research, Technology and Development (RTD) Programme Quality of Life and Management of Living Resources, within the Sixth Framework Programme, contract no. 036196. This report does not necessarily reflect the Commission's views or its future policy in this area.
1. Hoey L, Strain JJ, McNulty H: Studies of biomarker responses to intervention with vitamin B-12: a systematic review of randomized controlled trials. Am J Clin Nutr. 2009, 89: 1981S-1996S. 10.3945/ajcn.2009.27230C.
2. Lowe NM, Fekete K, Decsi T: Methods of assessment of zinc status in humans: a systematic review. Am J Clin Nutr. 2009, 89: 2040S-2051S. 10.3945/ajcn.2009.27230G.
3. Ristic-Medic D, Piskackova Z, Hooper L, Ruprich J, Casgrain A, Ashton K, Pavlovic M, Glibetic M: Methods of assessment of iodine status in humans: a systematic review. Am J Clin Nutr. 2009, 89: 2052S-2069S. 10.3945/ajcn.2009.27230H.
4. Kipnis V, Freedman LS: Impact of exposure measurement error in nutritional epidemiology. J Natl Cancer Inst. 2008, 100: 1658-1659. 10.1093/jnci/djn408.
5. Kohlmeier L, Bellach B: Exposure assessment error and its handling in nutritional epidemiology. Annu Rev Public Health. 1995, 16: 43-59. 10.1146/annurev.pu.16.050195.000355.
6. Prentice RL: Dietary assessment and the reliability of nutritional epidemiology research reports. J Natl Cancer Inst. 2010, 102: 583-585. 10.1093/jnci/djq100.
7. Johnson NL, Kotz S: Distributions in statistics: continuous multivariate distributions. 1972, Wiley, New York
8. Garvey PR: A family of joint probability models for cost and schedule uncertainties. 27th Annual Department of Defense Cost Analysis Symposium, September 1993.
9. Yuan PT: On the logarithmic frequency distribution and the semi-logarithmic correlation. The Annals of Mathematical Statistics. 1933, 4: 30-74. 10.1214/aoms/1177732821.
10. Al Khatib L, Obeid O, Sibai AM, Batal M, Adra N, Hwalla N: Folate deficiency is associated with nutritional anaemia in Lebanese women of childbearing age. Public Health Nutr. 2006, 9: 921-927.
11. Bates CJ, Schneede J, Mishra G, Prentice A, Mansoor MA: Relationship between methylmalonic acid, homocysteine, vitamin B12 intake and status and socio-economic indices, in a subset of participants in the British National Diet and Nutrition Survey of people aged 65 y and over. Eur J Clin Nutr. 2003, 57: 349-357. 10.1038/sj.ejcn.1601540.
12. Hoey L, McNulty H, Askin N, Dunne A, Ward M, Pentieva K, Strain J, Molloy AM, Flynn CA, Scott JM: Effect of a voluntary food fortification policy on folate, related B vitamin status, and homocysteine in healthy adults. Am J Clin Nutr. 2007, 86: 1405-1413.
13. Shuaibi AM, Sevenhuysen GP, House JD: Validation of a food choice map with a 3-day food record and serum values to assess folate and vitamin B-12 intake in college-aged women. J Am Diet Assoc. 2008, 108: 2041-2050. 10.1016/j.jada.2008.09.002.
14. Nath SD, Koutoubi S, Huffman FG: Folate and vitamin B12 status of a multiethnic adult population. Journal of the National Medical Association. 2006, 98: 67-72.
15. Vogiatzoglou A, Smith AD, Nurk E, Berstad P, Drevon CA, Ueland PM, Vollset SE, Tell GS, Refsum H: Dietary sources of vitamin B-12 and their association with plasma vitamin B-12 concentrations in the general population: the Hordaland Homocysteine Study. Am J Clin Nutr. 2009, 89: 1078-1087. 10.3945/ajcn.2008.26598.
16. Ubbink JB, Vermaak WJ, van der Merwe A, Becker PJ, Delport R, Potgieter HC: Vitamin requirements for the treatment of hyperhomocysteinemia in humans. The Journal of Nutrition. 1994, 124: 1927-1933.
17. Yajnik CS, Lubree HG, Thuse NV, Ramdas LV, Deshpande SS, Deshpande VU, Deshpande JA, Uradey BS, Ganpule AA, Naik SS, Joshi NP, Farrant H, Refsum H: Oral vitamin B12 supplementation reduces plasma total homocysteine concentration in women in India. Asia Pacific Journal of Clinical Nutrition. 2007, 16: 103-109.
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2288/12/57/prepub