Re-expressing coefficients from regression models for inclusion in a meta-analysis
BMC Medical Research Methodology volume 24, Article number: 6 (2024)
Abstract
Meta-analysis poses a challenge when original study results have been expressed in a non-uniform manner, such as when regression results from some original studies were based on a log-transformed key independent variable while in others no transformation was used. Methods of re-expressing regression coefficients to generate comparable results across studies regardless of data transformation have recently been developed. We examined the relative bias of three re-expression methods using simulations and 15 real data examples where the independent variable had a skewed distribution. Regression coefficients from models with log-transformed independent variables were re-expressed as though they were based on an untransformed variable. We compared the re-expressed coefficients to those from a model fit to the untransformed variable. In the simulated and real data, all three re-expression methods usually gave biased results, and the skewness of the independent variable predicted the amount of bias. How best to synthesize the results of the log-transformed and absolute exposure evidence streams remains an open question and may depend on the scientific discipline, scale of the outcome, and other considerations.
Introduction
The results of a group of studies deemed comparable can be synthesized quantitatively using meta-analysis. To base the meta-analysis on all available data, “Results extracted from study reports may need to be converted to a consistent, or usable, format for analysis” [1]. Methods of converting data presented by authors into a format suitable for meta-analysis have been well developed for effect sizes based on categorical representation of exposure. Our focus here, however, was on continuous measures of exposure, for which such methods are somewhat limited [2].
Our particular interest was in re-expression of results so that they could be included in a meta-analysis that could best inform a risk assessment. More specifically, the element of a risk assessment that we focused on in this work was meta-analysis to support a dose–response assessment [3]. Dose–response assessment in risk assessment is conducted so that the risk associated with any specific amount of an exposure can be examined [4]. When a dose–response evaluation is based on meta-analytic results, such results are more straightforward to relate to a specific exposure level if derived from models with exposure in absolute, untransformed units. While our analyses also speak to matters related to the hazard assessment element of a risk assessment, these are addressed in our discussion. In any event, in conducting meta-analysis of exposure effects that might inform a risk assessment, one often encounters some original data reports where models of outcome were fitted in relation to log of exposure and others fitted in relation to absolute exposure, posing challenges to synthesis.
Various approaches to the problem of inconsistently expressed effect estimates have been recommended or used in practice [5,6,7,8]. Obtaining the raw data or asking authors to reanalyze their data are the ideal solutions, though not always practical. When these options are not feasible, the results from studies using the less frequent approach have been excluded from the meta-analysis [6], or preferably the results of studies that used transformed and original units are analyzed separately [7, 8], and then synthesized without meta-analysis (SWiM) [5]. Some authors, however, have recently used re-expression methods to address the problem [9, 10]. The validity of these re-expression methods, however, has not been evaluated in detail. Here we consider methods of re-expressing regression coefficients from linear models fit to a log-transformed exposure variable as the coefficient that would have been obtained had the authors left the exposure in its original units. We refer to this process as re-expression of β to an untransformed basis.
An algebraic method of re-expressing regression coefficients was recently described and evaluated using one simulated data set and one set of parameters [11]. Rodriguez-Barranco et al. found that in the setting of a log-transformed, lognormally distributed independent variable, when the β coefficient from a model fit to the transformed data was re-expressed to what they would have gotten had the model been fit to the untransformed data, the re-expressed coefficient was half the size of the true (fitted) coefficient. They recommended caution in applying their method when the distribution of the independent variable was markedly asymmetric. More recently, other authors have developed computational methods of re-expressing coefficients from models fit to a log-transformed independent variable to approximate what would have been obtained if the model had been fit with the original unit continuous independent variable [9, 10]. The basic principle is to minimize the difference between the y predicted from y = β·log(x) and the y predicted from y = β·x (over the same range of x) by varying β in the second equation. Figure 1 may aid visualization of the task, where y from y = β·log(x) is shown with a light blue dotted line, and the difference in y from a straight line is minimized over a range of x. When Steenland et al. originally described the procedure it was for a fixed range of x, applicable to a specific exposure, and the validity of their method was not evaluated. Dzierlenga et al. (2020) used the same basic principle as Steenland et al., with a modification of the method to be more flexible with respect to the range of the exposure variable, and found that it performed well when evaluated using data from five studies of one exposure. In addition to the above re-expression methods, we developed a third (“Alternative”) estimator that is algebraic but different from that of Rodriguez-Barranco et al., and introduce it below, in the methods section.
The goal of the present project was to evaluate the validity of re-expression of regression coefficients to an untransformed basis for three methods using a wide variety of simulated and real data examples. To provide a context for interpretation of our results we have designated an amount of relative bias that we considered important. We note that an acceptable magnitude of bias is often not quantitated in reports like ours (e.g., [12, 13]). Reluctance to define general-use cutoffs for acceptable bias is also reflected throughout the epidemiologic literature; for example, see ROBINS-E material on confounding [14]. Nonetheless, Friedrich et al. (2008), in their simulation study, defined bias as a ≥ 5% difference from an estimand [15]. A well-regarded textbook, Modern Epidemiology, 3rd Ed. (p. 261), gives a 5–10% difference in effect estimates as an amount that might be considered important, but its authors note (p. 262) that “the exact cutoff for importance is somewhat arbitrary but is limited in range by the subject matter” [16]. In any event, for the purposes of the present investigation, we considered a bias of ≥ 5% as reflecting an undesirable property of an estimator.
Methods
In this section, we present the simulation study that was used to evaluate the three estimators, and then describe the real datasets that were used to further evaluate the estimators. Our description of the simulation study follows the “ADEMP” format recommended by Morris et al. (2019), where ADEMP stands for Aims, Data generating mechanism, Estimand (target of analysis), Methods, and Performance measures [17]. The methods subsection of the ADEMP gives a detailed specification of the estimators and is thus relatively long.
Description of the simulation in ADEMP format
Aims
To examine bias in, and coverage of, the re-expressed regression coefficients (estimates of the regression coefficient that would have been obtained had the original analysts not log-transformed exposure before fitting a regression model) calculated by three methods.
Data generating mechanism (DGM)
An independent random variable x with a log-normal distribution was used to define the dependent variable y = β_{DGM}·log_{b}(x) + e. The model parameters, possible values, and rationale for the chosen values are shown in Table 1 [18]. A β_{DGM} = 0 was not studied in the simulations because it caused instability in the relative bias performance measure. A range of σ, the standard deviation of the log-transformed exposure values, was chosen to cover the approximate range observed in the 15 real data studies. A factorial simulation design was used with the parameter values indicated in Table 1. Specifically, every possible combination of parameter values was used, for a total of 960 simulation scenarios (each with n_{sim} = 2000).
Estimand
The β coefficient from fitting y = β_{Estimand}·x + e with an ordinary least squares (OLS) model.
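The data generating mechanism and the estimand can be illustrated for a single repetition with the minimal Python sketch below. The parameter values are illustrative only (Table 1 is not reproduced here), σ is assumed to be on the natural-log scale, and the function names `simulate_dataset` and `ols_slope` are hypothetical; this is not the authors' supplemental R code.

```python
import math
import random


def simulate_dataset(n_obs, median, sigma, beta_dgm, base, sd_e, rng):
    """Draw log-normal x and y = beta_dgm * log_b(x) + e (one repetition)."""
    mu = math.log(median)  # for a log-normal variable, median = exp(mu)
    # Assumption: sigma is the SD of the natural log of x.
    x = [rng.lognormvariate(mu, sigma) for _ in range(n_obs)]
    y = [beta_dgm * math.log(xi, base) + rng.gauss(0.0, sd_e) for xi in x]
    return x, y


def ols_slope(x, y):
    """Slope from a simple OLS fit of y on x (intercept included)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


rng = random.Random(1)  # fixed seed so the illustration is repeatable
x, y = simulate_dataset(n_obs=162, median=1.0, sigma=0.5, beta_dgm=1.0,
                        base=math.e, sd_e=0.5, rng=rng)

# Estimand: slope from the model fit to untransformed x.
beta_estimand = ols_slope(x, y)
# Coefficient a study would report had it log-transformed x first.
beta_log = ols_slope([math.log(xi) for xi in x], y)
```

In a full simulation this step would be repeated n_{sim} times per scenario, with each re-expression method applied to `beta_log` and compared with `beta_estimand`.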
Methods
The three estimators evaluated were: 1) as described by Rodriguez-Barranco et al. (2017), 2) as described by Dzierlenga et al. (2020), and 3) an approach we introduce below and call the Alternative estimator. We refer to these as β_{RB}, β_{Dz}, and β_{Alt}, respectively.
An algebraic method for re-expression of β to an untransformed exposure was first presented by Rodriguez-Barranco et al. (2017). Equation 1 below shows their formula (see Model B in Table 1 of their publication):
In Eq. 1, β_{RB} is the re-expressed β coefficient using the Rodriguez-Barranco method, b is the log base used to transform x, c is the absolute change in exposure x (c = 1 unit of exposure in the present study), E[X] is the mean of the exposure, and β_{…} is the regression coefficient from the model using the log-transformed exposure. The same formula was applied to the confidence limits of β from the log(x) model.
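As a hedged illustration of how the quantities defined for Eq. 1 might combine, the sketch below assumes the formula has the form β_{RB} = β·log_{b}(1 + c/E[X]). That form is an assumption based on the symbols listed above and should be checked against Model B in Table 1 of Rodriguez-Barranco et al. (2017) before use; the function name `reexpress_rb` is hypothetical.

```python
import math


def reexpress_rb(beta_log, mean_x, base=math.e, c=1.0):
    """Re-express a coefficient from a log_b(x) model to a per-c-unit slope.

    ASSUMPTION: uses beta_log * log_b(1 + c / E[X]); this form is inferred
    from the symbols defined in the text, not copied from the source.
    """
    return beta_log * math.log(1.0 + c / mean_x, base)


# The same function would be applied to each confidence limit of beta_log.
beta_rb = reexpress_rb(beta_log=1.0, mean_x=1.0)                 # ln(2)
beta_rb_base2 = reexpress_rb(beta_log=2.0, mean_x=1.0, base=2)   # 2 * log2(2)
```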
A computational method for re-expression of β to an untransformed exposure was developed by Steenland et al. (2018), who described their method as
… iteratively minimizing the squared deviation of a new linear curve from the original logarithmic one, over a scale of 0 to 10 ng/ml PFOA [perfluorooctanoic acid], typical of studies in the general population. We also minimized squared deviation of a linear upper and lower confidence limit from the original logarithmic confidence interval curves. For any given study, the iteration was conducted by minimizing the sum of squares of the difference between the candidate linear curve and the logarithmic curve reported in the literature, across 10 points, at 1, 2…. through 10 ng/ml. Iteration began with an educated guess for a candidate linear curve that would approximate the logarithmic curves and proceeded by varying the candidate linear curve until the sum of squares of the differences were minimized.
Dzierlenga et al. (2020) used this same principle to calculate β_{Dz}, though they modified it so that the method was more flexible with respect to the range of the exposure variable. The modification used an algorithmic optimization over 6 points from the 25th to the 75th percentiles (25th, 35th…75th) of the estimated exposure distribution.
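The principle shared by the two computational methods (finding the straight line closest, in squared deviation, to the logarithmic curve over a chosen set of exposure points) can be sketched as follows. Several choices here are assumptions rather than details taken from the cited papers: a closed-form least-squares fit stands in for the iterative optimization, the line is allowed a free intercept, and the exposure distribution is taken to be log-normal when computing percentiles. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def lognormal_percentiles(median, sigma, probs):
    """Percentiles of a log-normal exposure (sigma on the natural-log scale)."""
    nd = NormalDist()
    return [median * math.exp(sigma * nd.inv_cdf(p)) for p in probs]


def linear_slope_for_log_curve(beta_log, points, base=math.e):
    """Least-squares slope of a line fitted to y = beta_log * log_b(x)."""
    points = list(points)
    ys = [beta_log * math.log(x, base) for x in points]
    n = len(points)
    mx = sum(points) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (yv - my) for x, yv in zip(points, ys))
    sxx = sum((x - mx) ** 2 for x in points)
    return sxy / sxx


# Six points from the 25th to the 75th percentile, as in the modified method.
probs = [0.25, 0.35, 0.45, 0.55, 0.65, 0.75]
points = lognormal_percentiles(median=1.0, sigma=0.5, probs=probs)
beta_dz_like = linear_slope_for_log_curve(1.0, points)

# Ten fixed points at 1, 2, ..., 10 units, as in the original description.
beta_fixed_range = linear_slope_for_log_curve(1.0, range(1, 11))
```

The two calls differ only in the exposure points supplied, which shows how strongly the re-expressed slope depends on the range over which the line is fitted.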
The Alternative method of algebraic re-expression that we developed for this report was based on the principle of calculating, on the untransformed scale of exposure, the increment that represented a doubling, a 2.718-fold increase, or a tenfold increase (i.e., one log unit, with a base of 2, e, or 10). This was done by subtracting or adding 0.5 units on the log scale to the log(median exposure), back-transforming the results, and taking the difference (see Eq. 2).
In Eq. 2, I is the increment used to re-express β from log to linear and b is the logarithm base. Then
The same formula was applied to the confidence limits of β from the log(x) model. R scripts/functions and data files for applying each of these three re-expression methods are available in the supplemental materials (Supplemental_Code.zip).
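The verbal description of the Alternative estimator can be sketched in Python as below. The increment follows the prose directly (half a log unit either side of the log of the median, back-transformed, then differenced); dividing the log-scale coefficient by that increment to obtain β_{Alt} is an assumption about the final step. The function names are hypothetical.

```python
import math


def alt_increment(median_x, base=math.e):
    """Increment I: one log_b unit centred on the median, on the original scale."""
    log_med = math.log(median_x, base)
    return base ** (log_med + 0.5) - base ** (log_med - 0.5)


def reexpress_alt(beta_log, median_x, base=math.e):
    """ASSUMPTION: beta per unit of x = beta per log_b unit divided by I."""
    return beta_log / alt_increment(median_x, base)


increment_e = alt_increment(1.0)              # e**0.5 - e**-0.5
increment_2 = alt_increment(4.0, base=2)      # 2**2.5 - 2**1.5
beta_alt = reexpress_alt(1.0, 1.0)
```

Because I scales with the median, the same log-scale coefficient maps to a smaller per-unit slope when the median exposure is larger.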
Performance measures
We focused on relative bias, coverage probability, and the Monte Carlo standard error of the relative bias for each estimator. An example of the formula for the mean relative bias for a given scenario is:
where, e.g., \({\beta }_{{RB}_{i}}\) refers to the β coefficient obtained from the Rodriguez-Barranco et al. estimator for the i^{th} repetition, \({\beta }_{{estimand}_{i}}\) refers to the β coefficient obtained from the ordinary least squares estimator on the untransformed, simulated data, and n_{sim} is the number of simulations conducted. Absolute value of the β_{estimand} is used as the denominator in order to generate the correct sign for the absolute bias when both β values are negative. β_{estimand} is used rather than β_{DGM} to calculate the relative bias in Eq. 4 so that the results reflect the performance of the estimator(s) in specific datasets. An example formula for the Monte Carlo standard error of the relative bias for a given scenario is:
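The two performance measures can be sketched for one scenario as follows. The relative bias follows the definitions given above (difference from the estimand divided by its absolute value, averaged over repetitions). The Monte Carlo standard error is assumed here to be the usual standard error of a mean, that is, the sample standard deviation of the per-repetition relative biases divided by √n_{sim}. The function name is hypothetical.

```python
import math


def relative_bias_summary(beta_hat, beta_estimand):
    """Return (mean relative bias, Monte Carlo SE) over repetitions."""
    rel = [(b - t) / abs(t) for b, t in zip(beta_hat, beta_estimand)]
    n_sim = len(rel)
    mean_rb = sum(rel) / n_sim
    # Sample variance of the per-repetition relative biases.
    var_rb = sum((r - mean_rb) ** 2 for r in rel) / (n_sim - 1)
    # ASSUMPTION: MCSE of a mean = sample SD / sqrt(n_sim).
    mcse = math.sqrt(var_rb / n_sim)
    return mean_rb, mcse


# Two toy repetitions: re-expressed slopes of 1.1 and 1.3 against estimands of 1.0.
mean_rb, mcse = relative_bias_summary([1.1, 1.3], [1.0, 1.0])
```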
Evaluation of the determinants of relative bias using the simulated data
After running the simulations using the parameter values shown in Table 1, for each estimator we fit ordinary least squares models of the relative bias as a function of median, σ, b (log base), n_{obs} (number of observations), and β_{DGM}, and interaction terms between σ and these variables. A both-directions stepwise approach was taken where the multiple of the number of degrees of freedom used for the penalty (k) was set to a value ~ 3.84 (p < 0.05 in Chi-square test) and the optimal model was selected by minimization of the AIC value [19]. Each observation in the dataset analyzed was the average result from 2000 simulations. Use of the average rather than the data for all 1,920,000 (960·2000) observations resulted in essentially the same models and produced more interpretable plots.
Evaluation of the validity of the three estimators using real data
To further evaluate the validity of re-expression methods and guide our simulations, we sought examples for various types of outcomes (dichotomous, log-continuous, untransformed continuous) and a variety of environmental agents with exposure measured using a biomarker. Environmental exposures measured with a biomarker frequently are used in risk assessment and often have skewed distributions with a long tail to the right. We first identified a series of published analyses based on data that were publicly available. Second, we identified a similar series of published analyses that did not have raw data available but that presented regression results obtained with and without log transformation of the exposure.
For the example data that involved our reanalysis of published results, we chose results that could efficiently be replicated to a reasonable degree of accuracy using the originally described methods. When the authors presented results for more than one outcome or more than one exposure in a report, in general we arbitrarily chose one result that was statistically significant for inclusion in our evaluation; the exception was data from Xu et al. (2020), for which we included two results. Xu et al. (2020) showed results for two different outcomes, one continuous, and one dichotomous, that were examined in relation to the same exposure; the regression coefficients were statistically significant for both. A more detailed description of the methods of identifying the real data examples is in Suppl. Methods Sect. 1.
For each real data example, we calculated the relative bias for each of the three estimators (compared to the coefficient from models using the untransformed exposure), and then for the 15 examples calculated the median, quartiles, and range of relative bias values for each estimator.
In the two example datasets where the relative bias in the three estimators was largest, we explored whether the exclusion of influential observations affected the accuracy of the re-expression using β_{Dz}. In two additional example datasets where the relative bias was typical of other studies, we also examined the effect of excluding influential points on the validity of the re-expression with β_{Dz}. Influential observations were identified with a difference in β analysis (change in β with each observation excluded one at a time) performed on the regression using untransformed exposure. A t-test-like statistic was used to identify the 5% of points that were unusually influential (DFBETAS > 2/√n) [20]. In addition, to evaluate whether our results were sensitive to the specific results selected as real data examples from the 15 reports, in each report we enumerated all results eligible for inclusion in our analysis, and selected one at random (regardless of statistical significance); when only two such results were available, however, we selected the one not previously selected. We refer to these additional results below as the second set of real data examples. Please see Suppl. Methods Sect. 2 for more details.
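The leave-one-out screen for influential observations can be sketched as below for a simple regression of outcome on untransformed exposure. This is a simplification: the published analyses may have included covariates, and comparing the absolute value of DFBETAS with the 2/√n cutoff is the conventional reading, assumed here. The data are made up for illustration, and the function names are hypothetical.

```python
import math


def fit_line(x, y):
    """Simple OLS fit; returns slope, Sxx, and residual sum of squares."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope, sxx, rss


def dfbetas_slope(x, y):
    """DFBETAS for the slope: scaled change when each point is left out."""
    n = len(x)
    slope_full, sxx_full, _ = fit_line(x, y)
    out = []
    for i in range(n):
        xs = x[:i] + x[i + 1:]
        ys = y[:i] + y[i + 1:]
        slope_i, _, rss_i = fit_line(xs, ys)
        s_i = math.sqrt(rss_i / (n - 1 - 2))  # residual SD without point i
        out.append((slope_full - slope_i) / (s_i * math.sqrt(1.0 / sxx_full)))
    return out


def influential_points(x, y):
    """Indices with |DFBETAS| above the 2 / sqrt(n) cutoff."""
    cutoff = 2.0 / math.sqrt(len(x))
    return [i for i, d in enumerate(dfbetas_slope(x, y)) if abs(d) > cutoff]


# Made-up data: the last point has high leverage and departs from the trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 20.0]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 6.2, 6.8, 8.1, 9.0, 5.0]
flagged = influential_points(x, y)
```

With a right-skewed exposure, points in the upper tail tend to have high leverage on the untransformed scale, which is why such a screen was run on the regression using untransformed exposure.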
Adjustment for bias in the estimators
The regression equations we developed to evaluate the determinants of relative bias in the simulated data (Sect. 2.2) were used to predict the relative bias in each estimator based on σ and other parameters, as needed. The predicted relative bias was used to estimate what the value of the estimator would have been were it not biased, e.g., β_{Alt,adjusted} = β_{Alt}/(1 + predicted relative bias of β_{Alt}). We applied this to the real datasets, to see if the adjustment resulted in an estimator with less relative bias.
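The adjustment itself is a one-line rescaling, sketched below. The predicted relative bias would come from the fitted regression equations (Table S2, not reproduced here); the value used in the example is hypothetical, as is the function name.

```python
def adjust_for_predicted_bias(beta_reexpressed, predicted_relative_bias):
    """Divide out the predicted relative bias: beta / (1 + predicted bias)."""
    return beta_reexpressed / (1.0 + predicted_relative_bias)


# Hypothetical example: a re-expressed slope of 1.2 with a predicted
# relative bias of +0.20 is adjusted back to 1.0.
beta_adjusted = adjust_for_predicted_bias(1.2, 0.20)
```

The rescaling becomes unstable as the predicted relative bias approaches −1, because the divisor approaches zero.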
Results
Simulations
A simplified example simulation with data generated by y = β_{DGM}·log_{e}(x) and parameters β_{DGM} = 1, median = 1, σ = 0.5, SD_{e} = 0 is depicted in Fig. 1. In this scenario β_{RB} slightly undershot the slope estimated from the fitted regression line, whereas the β_{Dz} and β_{Alt} estimators overshot the fitted slope, by a slightly greater magnitude. The range of parameter values in the simulation and original set of real data examples overlapped substantially (Table 1, Suppl. Table S1).
The relative bias of β_{RB} was a function of σ and the median exposure level (Fig. 2A). When x was significantly skewed (e.g., σ = 0.65) and the median was 1, the relative bias was close to zero, but with other combinations of σ and median the range of bias was substantial. The coefficients for the model of relative bias in β_{RB} are shown in Table S2.
The relative bias of β_{Dz} was primarily a function of σ (Fig. 2B, Table S2). Within the parameter space investigated, the absolute difference in relative bias due to the interaction of σ and n_{obs} was, with n_{obs} = 8474 (cf. n_{obs} = 162), < 0.05 (not shown).
The relative bias of β_{Alt} was primarily a function of σ and log base (Fig. 2C, Table S2). When log base was 10, at a given value of σ the relative bias was lower than when log base was 2 or e. When log base was 2 or e, β_{Alt} performed similarly to β_{Dz}.
As noted earlier, the models of relative bias for each estimator were fit to datasets with an n of 960, where each of the 960 observations was the average of 2000 simulations for each scenario (parameter set). When the same models were fit to all of the original data points (960·2000) the model fit statistics were essentially the same (Table S3).
Figure 2D shows the relative bias after restricting the parameter set for the simulation to best display the key properties of each estimator: all depend on σ, β_{RB} additionally depends on the median, and β_{Alt} additionally depends on the log base. The overall interpretation based on the figure was that in the simulations in general, with σ > 0.45 the estimators were substantially biased except for specific circumstances where β_{RB} did well. Another way to summarize the overall findings was by the performance measures presented in Table 2, based on the results for all simulation scenarios with β_{Estimand}> 0. The median Monte Carlo standard error (MCSE) across all three estimators was 0.002. Thirty percent of the 2880 simulations (960 scenarios ·3 estimators) had an MCSE > 0.005. More than 95% of all 2880 simulations had an MCSE that was ≤ 0.02 (relatively small compared with the average relative bias). Among the < 5% with an MCSE > 0.02, the n_{obs} was 162 and the log base was 10 in all instances. The maximum MCSEs were: β_{RB}, 0.196; β_{Dz}, 0.320; and β_{Alt}, 0.262. The coverage probabilities were substantially below 95%, reflecting how infrequently the estimators performed well. Compared with β_{RB}, the other two estimators, on average, had larger positive bias, but with higher coverage probabilities.
Because the regression analysis indicated that the main determinants of bias were σ, median, and log base for one or more of the estimators, for each estimator we examined coverage in relation to two values of these three parameters (Table 3). In general, as the average relative bias increased, the coverage decreased. The coverage tended to be better with exposure re-expressed by the original authors using a log base 10 than log base 2.
The real data and application of the estimators to it
We identified nine published analyses of data for which the raw data were publicly available and that met our criteria for selection (Table S4, Table S5 for second set of real data) [21,22,23,24,25,26,27,28]. The results of our reanalyses were generally the same order of magnitude as those originally published (Tables S4 and S5). The specific finding that we used in the analysis and its location in the original publication are listed in Supplementary Material Table S6 (Table S7 for second set of real data), as are the median, quartiles, and mean of the exposure distributions, which were estimated in some cases as indicated by table footnotes. We identified six published analyses of data where the original authors presented regression results using exposure with and without a log-transformation (Table S8, Table S9 for second set of real data) [10, 29,30,31,32,33]. Five of these were included in the assessment of validity by Dzierlenga et al. (2020) [9]. Among the fifteen example studies, a variety of outcomes and exposure variables were examined, though in two thirds of the studies the exposure was a perfluoroalkyl substance (either PFOA, perfluorohexanesulphonic acid, or perfluorooctane sulfonic acid).
When β was re-expressed as if it had been fit to untransformed exposure data, the range in relative bias across all three estimators was 0.5 to 16.8 (Table 4) and the interquartile ranges in relative bias were relatively wide. In the comparison of results for specific studies across re-expression methods, the relative bias was, for most of the studies, similar across methods (Table 4). These were studies where the median exposure was > 4 units (Table S6), as would be expected based on Fig. 2D. For the Lee et al. (2020) and the two Xu et al. (2020) results [22, 24], however, β_{RB} had a much smaller relative bias than the other two methods. In these three instances, the median of the exposure variable was less than one, which was not the case for the other studies (see Supplementary Materials Table S6), and the σ was > 0.8, which is the setting where the relative bias in β_{RB} was expected to be relatively small compared with the other estimators.
Our results for Odebeatu et al. (2019) and Pilkerton et al. (2018) were the ones with the greatest discrepancy between the re-expressed β coefficients and the β fitted to the untransformed exposure [23, 26] (Table 4). This discrepancy suggested that there may have been observations that were influential, and that the influence was affected by whether the exposure had been log-transformed. Thus, we conducted an analysis of whether exclusion of influential points affected the accuracy of the re-expression. For comparison, similar analyses were conducted using the data from Cheang et al. (2021) and Xu et al. (2020) (dichotomous outcome), which showed smaller relative differences between the re-expressed and fitted βs. The analyses with and without the inclusion of especially influential points in the real data sets showed that the accuracy of the re-expression estimators was affected by their exclusion (Table S10). The relative bias was affected by the influential points more so for Odebeatu et al. (2019) and Pilkerton et al. (2018) than for Cheang et al. (2021) and Xu et al. (2020), but even with the exclusion of influential observations the re-expression methods still had a high relative bias.
As was true for the original set of real data examples, the range of parameter values in the simulation and second set of real data examples overlapped substantially (Table 1, Suppl. Table S1). When the relative bias of the re-expressed β coefficients was examined using the second set of real data examples (Suppl. Materials, Table S1), the range of relative bias (18.1 to 10.7) was greater than in the original set of real examples (0.5 to 16.8), and the interquartile ranges were narrower for β_{RB} than for β_{Dz} and β_{Alt}. In general, however, these distributions were all relatively wide, as in the original set of real data examples. For some studies the relative bias was similar across estimators (e.g., Abraham et al., 2020; Bulka et al., 2021; Darrow et al., 2013; Pilkerton et al., 2018; and Stein et al., 2016). As with the original set of data examples, agreement in degree of bias across re-expression methods tended to be higher when the median exposure was > 4. As before, a tendency for β_{RB} to have the lowest bias occurred when the median exposure was < 1 (Lee et al., 2020), especially when σ was > 0.8 (Odebeatu et al., 2019; Xu et al. 2020b). Similarly, β_{Dz} and β_{Alt} tended to have a smaller relative bias than β_{RB} when the median exposure was > 1 and σ was < 0.8 (e.g., Apelberg et al., 2008; Xu et al. 2020a). But median exposure and σ did not perfectly predict the lowest-bias estimator, and few results had a relative bias that was in the range of 0 ± 0.05.
When we used the regression equations (Table S2) to predict the relative bias in the estimators when applied to each of the real data examples, and then adjusted the re-expressed β to remove the bias, the adjusted βs, on average, showed less relative bias, but as before, the interquartile range of the adjusted relative bias was wide (Table 5).
Discussion
In the simulations, the bias in each of the three estimators was evaluated in relation to the median of the exposure variable, the skewness in the exposure variable, the log base used to transform the exposure variable, the β in the model generating the data, and the n_{obs} simulated. For all three re-expression methods, the relative bias was more positive as the skewness of the exposure distribution increased. The relative bias in β_{RB} was also determined by the median of the exposure distribution, and the relative bias in β_{Alt} was also affected by the base of the log used to transform the exposure variable. Although a few specific circumstances were found where the relative bias in a given re-expression method was lower, in general, when the skewness of x was large enough that a log transformation might be applied, the methods gave results that were sufficiently biased that their use would not be advisable. The results from applying the re-expression methods to real datasets generally agreed with those from the simulation, but the relative bias was greater than predicted based on the simulations. The relative bias in the real data was not much affected by the exclusion of influential observations. The especially high relative bias of the re-expression methods in the case of the Odebeatu et al. (2019) data may have been due to the small size of the slope being re-expressed.
Rodriguez-Barranco et al. (2017) recognized the importance of skewness in causing bias in their estimator, though the degree of skewness in their simulations was not specified and only one median value was used. For the re-expression method proposed by Steenland et al. (2018), apparently it was assumed that if an exposure distribution had an upper bound near 10 units, their empirical re-expression method would be sufficiently accurate [10]. Our results suggested that the range of exposure was predictive of the validity of the re-expression only for the RB estimator. In a previous evaluation of bias in the Dzierlenga estimator [9], little bias was found. The five empirical data examples in that previous evaluation were all included in the present analysis. The relatively small number of empirical data studies in the previous evaluation may have led to an overly optimistic appraisal of the method.
In this report we focused on re-expressing regression coefficients from linear models fit to a log-transformed exposure variable. We could have also addressed the opposite: re-expression of regression coefficients from linear models fit to the untransformed exposure variable. To simplify the manuscript, we did not address this opposite type of re-expression. In risk assessment, results based on untransformed exposure are usually of greatest use, hence our focus on expressing all results in absolute units.
The real data examples used to inform the parameter space in the simulations represented a limited range of subject matter. In other fields the parameter space may differ from what we investigated. For example, if the n_{obs} in a study exceeded 8474, then the interaction between n_{obs} and σ might have a larger effect on the relative bias of the Dzierlenga estimator than noted here. Furthermore, the informal nature of our process for identifying real data examples to inform the parameter space (Suppl. Methods Sects. 1 & 2) precludes generalizing our simulation results to all environmental epidemiology studies with exposure measured with a biomarker. Nonetheless, the range of parameter values was broad enough that it seems likely that the results may apply to many environmental epidemiology and perhaps other studies where relatively little variance in the outcome is explained. Similarly, the results of using the re-expression methods on the real data examples cannot be generalized to all environmental epidemiology studies with exposure measured with a biomarker. Examination of results for the real data examples, however, provided insights into the behavior of the re-expression methods not provided by the simulations alone and suggested that in practice, none of the re-expression methods were likely to work well. We also recognize that the focus on outcome-exposure relations considered here was a simple linear relationship, and that the dose–response relation in a given study might be better represented with a quadratic or other function.
How best to synthesize the results of the log-transformed and absolute exposure evidence streams remains an open question and may depend on the scientific discipline, scale of the outcome, and other considerations. In fields such as economics and psychology, meta-analysis of correlation coefficients is a well-recognized approach that could be applied to the evidence synthesis problem discussed here [12]. Regression coefficients would need to first be re-expressed as correlation coefficients [13]. Meta-analysis of correlation coefficients when both the outcome and exposure are continuous variables is a widely used approach in some fields [34]. However, Pearson correlation coefficients depend on the variance of the outcome and exposure [35], which can vary across studies. In epidemiology, meta-analysis of correlations has been criticized because they can distort the results [36]. In the field of randomized clinical trials, meta-analysis of correlation coefficients has received scant discussion, while Synthesis Without Meta-analysis (SWiM) is well-accepted [2]. Our particular interest was in re-expression of results so that they could be included in a meta-analysis that could inform a risk assessment. In that context, the two relevant elements of a risk assessment are hazard identification and dose–response assessment. As noted in the introduction, when a dose–response evaluation is based on meta-analytic results, such results are more straightforward to relate to a specific exposure level if derived from models with exposure in absolute, untransformed units. For hazard identification the results of epidemiologic studies with exposure that has been log-transformed and those with exposure in absolute units are both informative and use of SWiM might be the best solution to the synthesis problem. For a dose–response assessment in environmental epidemiology the re-expression methods studied in the present work appear to cause more bias than would be acceptable.
Issues in evidence synthesis methods have been discussed more generally elsewhere [37] and are outside the scope of the present work.
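As an illustration of the correlation-based alternative discussed above, the following minimal Python sketch converts an unadjusted regression slope to a Pearson correlation and pools correlations with Fisher's z transformation. It is not part of the supplemental R code and is not one of the three reexpression methods evaluated here. The function names and all numeric inputs are hypothetical. The conversion r = β·s_x/s_y holds only for a simple (single-predictor) linear regression; adjusted coefficients would instead call for partial correlations.

```python
import math

def slope_to_r(beta, sd_x, sd_y):
    """Pearson r implied by an unadjusted OLS slope: r = beta * sd_x / sd_y."""
    r = beta * sd_x / sd_y
    if not -1.0 <= r <= 1.0:
        raise ValueError("inputs imply |r| > 1; check beta, sd_x, sd_y")
    return r

def pool_fisher_z(rs, ns):
    """Fixed-effect pooled correlation via Fisher's z, with weights n - 3."""
    weights = [n - 3 for n in ns]            # inverse of Var(z) = 1 / (n - 3)
    zs = [math.atanh(r) for r in rs]         # Fisher's z transformation
    z_bar = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    se = 1.0 / math.sqrt(sum(weights))       # standard error of the pooled z
    ci = (math.tanh(z_bar - 1.96 * se), math.tanh(z_bar + 1.96 * se))
    return math.tanh(z_bar), ci              # back-transform to the r scale

# Two hypothetical studies with the same slope but different exposure spread:
# the implied correlations differ even though the slope does not.
r_a = slope_to_r(beta=-10.0, sd_x=2.0, sd_y=500.0)   # -0.04
r_b = slope_to_r(beta=-10.0, sd_x=6.0, sd_y=500.0)   # -0.12
pooled_r, ci = pool_fisher_z([r_a, r_b], [400, 900])
print(r_a, r_b, pooled_r, ci)
```

The two hypothetical studies share one slope yet yield different correlations, which is the dependence on exposure variance noted above as a limitation of pooling correlation coefficients.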
The results of this assessment of validity have implications for systematic reviewers and metaanalysts considering or using these reexpression methods. The bias due to reexpression with the three methods evaluated was affected by the skewness of the exposure variable and, for some estimators, by the median exposure or the type of transformation used. Even with adjustment for the bias of these reexpression methods, the estimates were, on average, too biased, and too variable in their degree of bias, to justify their use to support metaanalyses used in risk assessment. Future studies comparing different methods of synthesis across evidence streams might clarify the settings in which distortion of results is most likely to occur, quantify the magnitude of distortion, and explicate the strengths and weaknesses of each method.
Availability of data and materials
The data used in this manuscript are freely available from their respective publications or upon request from the original authors. R scripts/functions and data files for applying each of these three reexpression methods are available in the supplemental materials (Supplemental_Code.zip).
Abbreviations
β_{Estimand}: β coefficient from fitting y = β_{Estimand}·x + e with an ordinary least squares (OLS) model
β_{RB}: Estimator evaluated using the Rodríguez-Barranco method
β_{Dz}: Estimator evaluated using the Dzierlenga method
β_{Alt}: Estimator evaluated using the Alternative method
ADEMP: Aims, Data generating mechanism, Estimand, Methods, and Performance measures (simulation study methodology)
DGM: Data generating mechanism
PFOA: Perfluorooctanoic acid
References
1. Higgins JP, Li T, Deeks JJ. Choosing effect measures and computing estimates of effect. In: Cochrane Handbook for Systematic Reviews of Interventions. John Wiley & Sons, Ltd; 2019. p. 143–76.
2. Deeks JJ, Higgins JP, Altman DG. Analysing data and undertaking meta-analyses. In: Cochrane Handbook for Systematic Reviews of Interventions. John Wiley & Sons, Ltd; 2023.
3. Allen B, Shao K, Hobbie K, Mendez W, Lee JS, Cote I, et al. Systematic dose-response of environmental epidemiologic studies: Dose and response pre-analysis. Environ Int. 2020;142:105810.
4. National Research Council. Science and Decisions: Advancing Risk Assessment. Washington, D.C.: National Academies Press; 2009. https://doi.org/10.17226/12209.
5. Campbell M, McKenzie JE, Sowden A, Katikireddi SV, Brennan SE, Ellis S, et al. Synthesis without meta-analysis (SWiM) in systematic reviews: reporting guideline. BMJ. 2020;368:l6890.
6. McCabe JJ, O’Reilly E, Coveney S, Collins R, Healy L, McManus J, et al. Interleukin-6, C-reactive protein, fibrinogen, and risk of recurrence after ischaemic stroke: Systematic review and meta-analysis. Eur Stroke J. 2021;6:62–71.
7. Negri E, Metruccio F, Guercio V, Tosti L, Benfenati E, Bonzi R, et al. Exposure to PFOA and PFOS and fetal growth: a critical merging of toxicological and epidemiological data. Crit Rev Toxicol. 2017;47:482–508.
8. Ye X, Kong W, Zafar MI, Chen LL. Serum triglycerides as a risk factor for cardiovascular diseases in type 2 diabetes mellitus: a systematic review and meta-analysis of prospective studies. Cardiovasc Diabetol. 2019;18:48.
9. Dzierlenga MW, Crawford L, Longnecker MP. Birth weight and perfluorooctane sulfonic acid: a random-effects meta-regression analysis. Environ Epidemiol. 2020;4:e095.
10. Steenland K, Barry V, Savitz D. Serum Perfluorooctanoic Acid and Birthweight: An Updated Meta-analysis With Bias Analysis. Epidemiology. 2018;29:765–76.
11. Rodríguez-Barranco M, Tobías A, Redondo D, Molina-Portillo E, Sánchez MJ. Standardizing effect size from linear regression models with log-transformed variables for meta-analysis. BMC Med Res Methodol. 2017;17:44.
12. van Aert RCM. Meta-analyzing partial correlation coefficients using Fisher’s z transformation. Res Synth Methods. 2023;14:768–73.
13. Souverein OW, Dullemeijer C, van’t Veer P, van der Voet H. Transformations of summary statistics as input in meta-analysis for linear dose-response models on a logarithmic scale: a methodology developed within EURRECA. BMC Med Res Methodol. 2012;12:57.
14. Risk of bias tools: ROBINS-E tool. https://www.riskofbias.info/welcome/robinsetool. Accessed 3 Nov 2023.
15. Friedrich JO, Adhikari NKJ, Beyene J. The ratio of means method as an alternative to mean differences for analyzing continuous outcome variables in meta-analysis: a simulation study. BMC Med Res Methodol. 2008;8:32.
16. Rothman KJ, Lash TL, Greenland S. Modern Epidemiology. 3rd ed. Philadelphia: Lippincott Williams & Wilkins; 2008.
17. Morris TP, White IR, Crowther MJ. Using simulation studies to evaluate statistical methods. Stat Med. 2019;38:2074–102.
18. Aslibekyan S, Wiener HW, Wu G, Zhi D, Shrestha S, de Los Campos G, et al. Estimating proportions of explained variance: a comparison of whole genome subsets. BMC Proc. 2014;8 Suppl 1:S102.
19. Venables WN, Ripley BD. Modern Applied Statistics with S. 4th ed. New York, NY: Springer; 2002.
20. Belsley DA, Kuh E, Welsch RE. Regression diagnostics: identifying influential data and sources of collinearity. New York: Wiley; 1980.
21. Bulka CM, Avula V, Fry RC. Associations of exposure to perfluoroalkyl substances individually and in mixtures with persistent infections: Recent findings from NHANES 1999–2016. Environ Pollut. 2021;275:116619.
22. Lee S, Min JY, Min KB. Female Infertility Associated with Blood Lead and Cadmium Levels. Int J Environ Res Public Health. 2020;17:E1794.
23. Odebeatu CC, Taylor T, Fleming LE, Osborne NJ. Phthalates and asthma in children and adults: US NHANES 2007–2012. Environ Sci Pollut Res Int. 2019;26:28256–69.
24. Xu C, Liang J, Xu S, Liu Q, Xu J, Gu A. Increased serum levels of aldehydes are associated with cardiovascular disease and cardiovascular risk factors in adults. J Hazard Mater. 2020;400:123134.
25. Stein CR, McGovern KJ, Pajak AM, Maglione PJ, Wolff MS. Perfluoroalkyl and polyfluoroalkyl substances and indicators of immune function in children aged 12–19 y: National Health and Nutrition Examination Survey. Pediatr Res. 2016;79:348–57.
26. Pilkerton CS, Hobbs GR, Lilly C, Knox SS. Rubella immunity and serum perfluoroalkyl substances: Sex and analytic strategy. PLoS ONE. 2018;13:e0203330.
27. Cheang I, Liao S, Zhu X, Lu X, Zhu Q, Yao W, et al. Association of acrylamide hemoglobin biomarkers with serum lipid levels in general US population: NHANES 2013–2016. Ecotoxicol Environ Saf. 2021;214:112111.
28. Abraham K, Mielke H, Fromme H, Völkel W, Menzel J, Peiser M, et al. Internal exposure to perfluoroalkyl substances (PFASs) and biological markers in 101 healthy 1-year-old children: associations between levels of perfluorooctanoic acid (PFOA) and vaccine response. Arch Toxicol. 2020;94:2131–47.
29. Apelberg BJ, Witter FR, Herbstman JB, Calafat AM, Halden RU, Needham LL, et al. Cord serum concentrations of perfluorooctane sulfonate (PFOS) and perfluorooctanoate (PFOA) in relation to weight and size at birth. Environ Health Perspect. 2007;115:1670–6.
30. Washino N, Saijo Y, Sasaki S, Kato S, Ban S, Konishi K, et al. Correlations between prenatal exposure to perfluorinated chemicals and reduced fetal growth. Environ Health Perspect. 2009;117:660–7.
31. Hamm MP, Cherry NM, Chan E, Martin JW, Burstyn I. Maternal exposure to perfluorinated acids and fetal growth. J Expo Sci Environ Epidemiol. 2010;20:589–97.
32. Chen MH, Ha EH, Wen TW, Su YN, Lien GW, Chen CY, et al. Perfluorinated compounds in umbilical cord blood and adverse birth outcomes. PLoS ONE. 2012;7:e42474.
33. Darrow LA, Stein CR, Steenland K. Serum perfluorooctanoic acid and perfluorooctane sulfonate concentrations in relation to birth outcomes in the Mid-Ohio Valley, 2005–2010. Environ Health Perspect. 2013;121:1207–13.
34. Borenstein M, Hedges LV, Higgins JPT, Rothstein HR. Introduction to Meta-Analysis. 1st ed. Chichester, U.K.: Wiley; 2009.
35. Freund JE, Walpole RE. Mathematical statistics. 3rd ed. Englewood Cliffs, N.J.: Prentice-Hall; 1980.
36. Greenland S, O’Rourke K. Chapter 33: Meta-analysis. In: Modern Epidemiology. 3rd ed. Philadelphia: Lippincott Williams & Wilkins; 2008. p. 652–81.
37. Mueller M, D’Addario M, Egger M, Cevallos M, Dekkers O, Mugglin C, et al. Methods to systematically review and meta-analyse observational studies: a systematic scoping review of recommendations. BMC Med Res Methodol. 2018;18:44.
Acknowledgements
Dr. Michael W. Dzierlenga provided critical comments on an early draft of the manuscript.
Funding
Support for this project (Linakis, Van Landingham, Longnecker) came from 3M to Ramboll. 3M did not influence the research in any way and encouraged publication in a peer-reviewed journal.
Author information
Contributions
Dr. Linakis ran the reexpression algorithm on the examples and simulation datasets, conducted the analysis of factors that determine validity, computed results using the other reexpression methods, and contributed to the draft of the manuscript. Ms. Van Landingham replicated the analysis of the example NHANES studies, conducted the influence analyses, and contributed to the draft of the manuscript. Dr. Gasparini helped to critically evaluate and provide feedback on the methods and presentation of the results and contributed to the draft of the manuscript. Dr. Longnecker conceived of the study, designed it, obtained funding for it, and drafted the manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
3M, the company that funded this research, was not involved in the preparation of the manuscript. The authors retained sole control of the manuscript content and the findings, and statements in this paper are those of the authors and not those of the authors’ employers or the sponsors. No authors were directly compensated by 3M.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Linakis, M.W., Van Landingham, C., Gasparini, A. et al. Reexpressing coefficients from regression models for inclusion in a metaanalysis. BMC Med Res Methodol 24, 6 (2024). https://doi.org/10.1186/s1287402302132y