Incorporating nonlinearity into mediation analyses
 George J. Knafl^{1}Email author,
 Kathleen A. Knafl^{1},
 Margaret Grey^{2},
 Jane Dixon^{2},
 Janet A. Deatrick^{3} and
 Agatha M. Gallo^{4}
DOI: 10.1186/s1287401702966
© The Author(s). 2017
Received: 4 October 2016
Accepted: 23 January 2017
Published: 21 March 2017
Abstract
Background
Mediation is an important issue considered in the behavioral, medical, and social sciences. It addresses situations where the effect of a predictor variable X on an outcome variable Y is explained to some extent by an intervening, mediator variable M. Methods for addressing mediation have been available for some time. While these methods continue to undergo refinement, the relationships underlying mediation are commonly treated as linear in the outcome Y, the predictor X, and the mediator M. These relationships, however, can be nonlinear. Methods are needed for assessing when mediation relationships can be treated as linear and for estimating them when they are nonlinear.
Methods
Existing adaptive regression methods based on fractional polynomials are extended here to address nonlinearity in mediation relationships, but assuming those relationships are monotonic as would be consistent with theories about directionality of such relationships.
Results
Example monotonic mediation analyses are provided assessing linear and monotonic mediation of the effect of family functioning (X) on a child’s adaptation (Y) to a chronic condition by the difficulty (M) for the family in managing the child's condition. Example moderated monotonic mediation and simulation analyses are also presented.
Conclusions
Adaptive methods provide an effective way to incorporate possibly nonlinear monotonicity into mediation relationships.
Keywords
Adaptive regression Childhood chronic conditions Fractional polynomials Mediation Moderated mediation NonlinearityBackground
Mediation is an important issue considered in the behavioral, medical, and social sciences, addressing situations where the effect of a predictor variable X on an outcome (or dependent or response) variable Y is explained to some extent by an intervening, mediator variable M. Methods for addressing mediation have existed for some time [1–3]. Since then, they have undergone a variety of refinements [4–28].
Relationships underlying mediation are commonly treated as linear in Y, X, and M. Mediating relationships, however, can be nonlinear. A few authors have addressed nonlinearity in the mediation context [29–32]. Pearl [30, 31] has developed a general approach for quantifying direct, indirect, and total effects allowing for nonlinearity as well as for categorical variables. Standard polynomials sometimes can be effectively used to address nonlinearity in predictors, but not in general [33]. To fully address nonlinearity requires more complex methods like nonparametric regression. Methods are needed to assess linearity of mediation relationships and to conduct mediation analyses when the underlying relationships are nonlinear.
The purpose of this paper is 1. To present an approach for assessing linearity of mediation relationships and for conducting nonlinear mediation analyses, based on the adaptive regression methods of Knafl and Ding [34], and 2. To provide example analyses using these methods. Adaptive methods were originally developed for nonlinear modeling in the Poisson regression context with Poisson distributed count outcomes [35], but the methods extend to linear regression with continuous outcomes treated as normally distributed, as often used in mediation analyses, as well as logistic regression with discrete outcomes [36]. SAS® (SAS Institute, Inc., Cary, NC) macros have been developed to support adaptive regression. Knafl [37] presents examples of the use of one of these macros for nonlinear growth curve modeling. These methods address not only nonlinearity in predictors but also nonlinearity in outcomes when those outcomes are continuous and positive valued. They also allow for correlated outcomes and/or nonconstant variances.
Behavioral, medical, and social science theories usually do not hypothesize that relationships are nonlinear but implicitly assume they are linear. However, Hayes and Preacher [29] provide a variety of examples of behavioral theories that incorporate nonlinearity. In any case, hypothesized relationships are usually stated in terms of directionality: as X increases, Y increases or decreases. Such statements are not inherently linear but can be represented more generally by possibly nonlinear monotonic relationships. Monotonicity is operationalized in this paper with single power transforms. Moderated monotonic mediation is also addressed in the paper.
Methods
This section starts by formulating standard linear mediation relationships, then nonlinear mediation relationships followed by monotonic mediation relationships. Adaptive regression methods are described next along with how they can be used to assess monotonic mediation and moderated monotonic mediation. The section ends with a description of the two data sets used in example analyses.
Linear mediation relationships
The random variables U _{ Y,1}, U _{ M }, and U _{ Y,2} are assumed to have mean zero. They can be thought of as omitted factors (as in [31]) or as random errors.
A variety of tests for a zero indirect effect are available. Sobel’s test [21, 39] is the best known, but MacKinnon et al. [14] considered 14 alternatives including Sobel’s test. Difference in coefficients approaches capitalize on the first equality for Δ while product of coefficients approaches including Sobel’s test capitalize on the second equality. The assumptions underlying these tests can be questionable, and so the test based on the bootstrapped confidence interval (CI) for Δ has been proposed as a robust alternative [21, 29, 40, 41], especially when biascorrected [42]. However, biascorrected bootstrapped CIs can generate inflated Type I errors [43].The strength of the indirect effect [21] can be measured by the relative indirect effect \( {\varDelta}^{\hbox{'}}=\raisebox{1ex}{$\varDelta $}\!\left/ \!\raisebox{1ex}{$ c$}\right. \) (often denoted as P _{ M }).
Since mediation addresses causality, temporal precedence is an important issue. This can be addressed by measuring X before measuring M and that before measuring Y [44]. However, mediation sometimes is addressed using crosssectional data. Equations (1)(3) are the same for these two cases. Random assignment of the settings of X provides support for causality for the X to Y relationship and the X to M relationship, but the M to Y relationship can still be confounded, and even when X, M, and Y have been measured longitudinally [45]. Nonexperimental or observational studies need theoretical justification for hypothesized causal relationships [21]. Pearl [31] has provided sufficient conditions for identifying direct, indirect, and total effects for observational studies.
Nonlinear mediation relationships
General approach for addressing nonlinearity
Pearl’s more general approach
Examples
The expected value E(U _{ M } ^{2} ) > 0 except in trivial cases. Consequently, this is not the same value as obtained by the HayesPreacher product of coefficients approach. The problem with that approach is that U _{ M } cannot always be ignored in computing indirect effects. Similar problems could occur if f _{ Y } is nonlinear in u _{ Y }. This result indicates the shortcoming of trying to generalize the indirect effect using a product of coefficients approach.
Monotonic mediation
Examples
We recommend using the special case with p " = 1 due to its desirable properties. It provides a natural generalization of the linear case of models (1)(3) to account for monotonicity. In what follows, p " = 1 is assumed unless otherwise stated.
Adaptive regression
Methods are needed for estimating the relationships of models (4)(5) and (9)(10) and for assessing whether those relationships are nonconstant in X or M and whether they are reasonably treated as linear in Y, X, and/or M or are distinctly nonlinear in any of those variables. We use adaptive regression modeling [34] for these purposes. Methods are needed as well for assessing whether the instantaneous natural indirect effect \( \frac{dNIE(x)}{dx} \) is nonzero. This can be addressed as in linear mediation with bootstrapped CIs, but now computed for a grid of possible values x for X.
Knafl et al. [35] developed adaptive regression methods for nonlinear modeling of Poisson distributed count outcome variables Y. Knafl et al. [36] extended this to generalized linear modeling including adaptive linear regression with outcomes treated as normally distributed, as often used in mediation analyses (but they used these methods to address modeling of electronically monitored medication adherence data rather than mediation). How adaptive regression analyses are conducted is described next.
Adaptive regression methods use likelihood crossvalidation (LCV) scores (as defined later) to evaluate and compare models. These scores generalize to contexts where estimation is based on maximizing likelihoodlike functions such as extended quasilikelihood functions [47]. Heuristic (i.e., rulebased) search techniques guided by LCV scores are used to identify effective fractional polynomial models [33] in primary predictor variables including predictors\( X \) and mediators M as needed for mediation. Fractional polynomials generalize standard polynomials, which use only nonnegative integer powers, to allow for the possibility of negative and fractional powers. Fractional polynomial models for continuous outcomes are linear in associated parameter vectors (consisting of the intercept and slopes for power transforms of predictors), and so they are linear models estimated using linear regression methods. However, these models are based on relationships that are in general nonlinear (or curvilinear) in the predictor X for models (4)(5) and (9)(10) as well as in the mediator M for models (5) and (10).
Power transforms f(X, p) are defined for arbitrary real valued primary predictors X and powers p by setting f(X, p) to X ^{ p } when X > 0, to 0 when X = 0, and to cos(π ⋅ p) ⋅ X^{ p } when X < 0 where\( \cos \) denotes the standard cosine function, π is the usual constant, and X denotes the absolute value of X. Note that for X < 0, the sign of f(X, p) oscillates between ±1 as the power \( p \) varies. For simplicity of notation, f(X, p) is denoted as X ^{ p }.
Likelihood crossvalidation
Larger LCV scores indicate better models, but not necessarily substantially better models. This issue is assessed with LCV ratio tests, analogous to likelihood ratio tests in being based on the χ^{2} distribution. LCV ratio tests are expressed in terms of a cutoff for a substantial (or distinct or significant) percent decrease in the LCV score, changing with the sample size. The formula for the cutoff is provided in Section 4.4.2 of Knafl and Ding [34] and in eq. (6) of Knafl et al. [36]. LCV ratio tests are more conservative than standard tests for zero coefficients (examples are provided in [48–50]) in the sense that removal from the model of a predictor with a significant slope does not always generate a substantial percent decrease in the LCV score. Consequently, LCV ratio tests are similar in effect to adjustments for multiple comparisons.
Overview of the adaptive modeling process for a fixed outcome
Adaptive fractional polynomial models for a fixed outcome in terms of primary predictors are identified using a heuristic search process beginning with an expansion phase, systematically adding power transforms of those primary predictors into the model, followed by a contraction phase, systematically removing any extraneous power transforms and adjusting the powers of the remaining transforms to improve the LCV score. The search process is controlled by tolerance parameters indicating the change in LCV scores that can be tolerated for each step in the search process. For example, the contraction stopping tolerance parameter setting is based on a LCV ratio test so that the final selected model is parsimonious. The tolerance parameter settings change with the number of measurements in the sample, thereby adjusting the search process by the sample size. The full adaptive modeling process is formulated in Chapter 20 of [34]. That full process is needed for estimating models (4)(5) with arbitrary nonlinearity. Estimation of the monotonic models of (9)(10) requires a simpler search process as addressed later.
The adaptive modeling process can generate adaptive models for variances (or more generally dispersions [47]) as well as for means. For example, when M and/or Y are continuous, the omitted factors or errors U _{ M } and U _{ Y } can be treated as normally distributed (or at least approximately so) with nonconstant variances that are functions of X, M, and/or some other primary predictors. The log of the variances is modeled as linear in the coefficient parameters for possibly power transformed primary predictors. Coefficient parameters for the means and variances are estimated jointly using maximum likelihood with likelihoods based on the normal distribution. For correlated continuous outcomes, for example, outcomes measured over clusters such as family members or patients of the same provider, likelihoods are based on the multivariate normal distribution.
Adaptive nonlinear moderation can be addressed simply by including interactions as primary predictors. More generally, the adaptive modeling process can automatically generate geometric combinations of two or more primary predictors, that is, products of power transforms of distinct subsets of the primary predictors using possibly different powers to transform those primary predictors. The subset of transforms in the geometric combinations and their powers are generated adaptively.
Searching for power transforms
A base model M _{0} is expanded to include a transform of a predictor X as follows. Let M _{0}(p) denote the model M _{0} with the power transform X ^{ p } added to it. A grid search is conducted first to maximize LCV(M _{0}(p)) over an initial set of powers. By default, the initial powers p = −3, −2.5, ⋯, −0.5, 0.5, ⋯, 2.5, 3 are used, but this set can be adjusted if desired. The power 0 is purposely not considered. For the case with X > 0, the effect of p = 0 on M _{0} depends on whether or not M _{0} includes an intercept. When there is an intercept, the power 0 corresponds to the natural log transform (demonstrated by taking the limit as p → 0); otherwise it corresponds to the constant transform adding in an intercept parameter to M _{0}. When X has zero or negative values, the effect is more complex. Not considering the power 0 avoids slowing the adaptive modeling process to check M _{0} to see what the effect of that power is. In any case, powers p close to 0 approximate the appropriate case without having to check to see which one it is. By default, the smallest powers in absolute value that are considered are ±0.0001, but this can be adjusted.
Next the choice of a power p _{2} for changes in powers of δ = ±0.1 is identified through a similar search on either side of p _{1}. Then, this is iterated searching around p _{2} over changes δ = ±0.01, then δ = ±0.001, and so on. By default, changes of at most δ = ±0.0001 are considered, but this can be adjusted. At any stage of this process, if the smallest of the three LCV scores analogous to those of inequality (11) compared to the largest of those three scores generates a percent decrease greater than the associated tolerance parameter (indicating these three LCV scores are not close to each other), continue the search with one more decimal digit; otherwise stop the process. When the process stops at the j ^{th} stage, the selected model is M _{0}(p _{ j }).
By default, the expansion stopping tolerance is set generously to 2.5 times the cutoff for a LCV ratio test, and so it is likely a transform of X would be added to the base model. However, it is also possible that the expansion might not add a transform of X to the base model, supporting the conclusion that X does not have an effect on the outcome Y after controlling for the transforms of the base model.
Identification of a transform of a predictor to add to a base model is a small part of the full adaptive modeling process. Multiple predictors need to be considered as part of the expansion; the best transform to remove from a base model needs to be determined as part of the contraction. With each such removal, the powers of the remaining transforms need to be adjusted to improve the LCV score. However, models (9)(10) with p " = 1 and p and q fixed require identification of only the single power q ' for X in (9) and the single power p ' for X in (10). An algorithm for identifying choices for these two powers is defined later.
For the general adaptive modeling process, tests for zero coefficients for transforms in selected models are inappropriate to conduct since these tests are usually significant. Due to the contraction heuristics, the removal of each transform of the selected model generates a substantial percent decrease in the LCV score. However, this is not the case for models (9)(10) based on single transforms of X. While the selected power transform for X in one of these models generates an optimal LCV score, the LCV score for the associated constant base model might not generate a substantial percent decrease in the LCV score, and so the slope for that selected power transform might not always be significantly nonzero. However, a LCV ratio test can also be used to assess the impact of including X in the model versus not including it and is likely to be more conservative than the test for a zero slope for X.
Setting the number of folds
Knafl et al. [35] used\( 10 \) folds for estimating nonlinear individualsubject medication adherence curves on the recommendation of Kohavi [51]. However, the data sets they used had limited sample sizes at most \( 100. \) The choice of the number k of folds may need to be adjusted for different sample sizes. Knafl and Grey [52] investigated this issue for exploratory factor analysis models. They found that for three different sets of items the same number of factors was selected by maximizing LCV scores as long as the value of k was not too small. Consequently, they recommended using the first local maximum in k over multiples of 5 folds for some benchmark analysis, in their case the selection of the number of factors extracted through maximum likelihood. This choice balances the need for a sufficiently large number of folds while limiting the amount of computation. Section 2.8 of [34] provides a more complete assessment of this issue.
The composite mediation model
Identifying power transforms for \( \mathrm{X} \) in the composite model
Since power transforms are added to the model without adjusting the powers of previously added transforms, there might be an improvement if these powers are adjusted. The formal power transform process is defined in Section 20.4.4 of [34]. Informally, each power is adjusted using the power adjustment process described earlier but starting at its current value holding the other powers fixed. Stop if no adjusted powers provide an improvement in the LCV score. Otherwise, continue the process using the adjusted power generating the best improvement in the LCV score. However, since the generated model (12) for fixed p and q has only two transforms, the improvement if any is unlikely to be substantial. Consequently, power adjustment is treated as an optional feature of the adaptive monotonic mediation process.
The model for the variances (technically, the model for the log of the variances) based on I _{1} and I _{2} results in separate but constant variances for U _{ M } and U _{ Y }. The associated covariance matrix also allowing for the single general correlation ρ is called compound symmetry heterogeneous (CSH). Nonconstant variances can be generated by considering X _{1}, X _{2}, and/or M as primary predictors for modeling the variances. The full adaptive modeling process can be used for modeling the variances since these need not be monotonic. This is achieved by including I _{1} and I _{2} in the base model for the variances, not restricting the expansion of the model for the variances, and restricting the contraction to contract only transforms from the model for the variances. By default, the contraction considers removal of the intercepts corresponding to I _{1} and I _{2} in the model for the variances, but this can be overridden.
Covariates can be included in model (12). For each covariate Z, include Z _{1} = (Z 0)^{ T } and Z _{2} = (0 Z)^{ T } as primary predictors for the means and/or variances, respectively, addressing models (9)(10). The full adaptive modeling process can be used to generate power transforms of Z _{1} and Z _{2} for modeling the means and/or variances.
Poweradjusted likelihood crossvalidation
Assume that Y > 0 so that transforms Y ^{ p } are welldefined for all p. Model (10) for Y ^{ p } for a fixed power\( p \) can be estimated using existing adaptive regression methods as described above. Standard LCV scores for these models, not accounting for the power p, as defined earlier are based on the normal density function. The power \( p \) can be chosen by maximizing an alternative poweradjusted LCV score that also accounts for power transformation of the outcome Y using the poweradjusted likelihood function as defined next.
The univariate outcome transformation process can be applied as well to model (9). The formulation also extends readily to multivariate data. When those data are based on repeatedly measuring the same outcome Y at different times or over different conditions (such as members of the same family or patients of the same provider), it is reasonable to transform each such outcome measurement with the same power p. However, model (12) requires consideration of different powers p and q for Y and M with associated poweradjusted LCV scores LCV(p, q).
Identification of appropriate choices for \( p \) and q can be achieved by starting with p = q = 1 and using grid searches to adjust p with q = 1 fixed giving LCV(p _{1}, 1) and to adjust q with p = 1 fixed giving LCV(1, q _{1}). If LCV(p _{1}, 1) > LCV(1, q _{1}), use a grid search in q with p = p _{1}; otherwise use a grid search in p with q = q _{1}. This generates powers p _{2} and q _{2}. In example analyses, the grid searches are first conducted over changes of ±0.5 generating the powers p _{2} and q _{2}, then over changes of ±0.1 starting at the powers p _{2} and q _{2} generating the powers p _{3} and q _{3}, and then stops identifying powers for Y and M to within one decimal digit.
Monotonic mediation analysis
 1.
Selecting the number k of folds.
The benchmark analysis is the generation of model (12) constrained so that p = q = 1, that is, with untransformed M and Y, by adaptively expanding the model for the means in X _{1} and X _{2} limiting the number of power transforms for each of these predictors to one. The number of folds to use in all subsequent analyses is the one generating the first local maximum in the standard LCV score over multiples of 5.
 2.
Selecting the powers p and q.
Use the search through alternative values for p and q based on poweradjusted LCV scores LCV(p,q) described earlier to generate the full model (12) without changing the CSH covariance structure. Use the selected powers p and q in subsequent analyses unless otherwise indicated.
 3.
Assessing the need for transforming M and Y.
Use a LCV ratio test to compare LCV(1,1) (the same as its standard LCV score) generated in Step 1 to LCV(p,q) generated in Step 2.
 4.
Assessing the need for transforming X.
For given values of p and q, use a LCV ratio test to compare LCV(p,q) as generated in Step 2, with X _{1} and X _{2} adaptively transformed, to the LCV(p,q) score for the associated model linear in X _{1} and X _{2}.
When the power q ' = 1, the instantaneous natural indirect effect \( \frac{dNIE(x)}{dx}= a\cdot b \) is constant. Whether the instantaneous natural indirect effect is reasonably treated as constant can be assessed with a LCV ratio test comparing a given model with q ' ≠ 1 to that model adjusted to satisfy q ' = 1.
 5.
Assessing mediation relationships.
In the linear mediation case of models (1)(3), mediation requires significantly nonzero slopes a and b. In the monotonic mediation context of model (12), these issues can be addressed with LCV ratio tests. To assess for a dependence of M ^{ q } on X in the component model (9), compare the LCV(p,q) score for the full model (12) to the associated model with X _{1} removed (or with a = 0) and the transform for X _{2} adaptively generated. To assess the dependence of Y ^{ p } on M ^{ q } in the component model (10) with p " = 1, compare the LCV(p,q) score for the full model (12) to the associated model with M ^{ q } removed (or with b = 0) and the transforms for X _{1} and X _{2} adaptively generated. It is also possible to assess the dependence of Y ^{ q } on \( X \) in the component model (10) by comparing the LCV(p,q) score for the full model (12) to the associated model with X _{2} removed (or with c ' = 0) and the transform for X _{1} adaptively generated.
 6.
Considering nonconstant variances.
With base model the adaptive model (12) with CSH covariance structure generated in Step 2 with initial powers p ' and q ' for X _{1} and X _{2}, adaptively expand and then contract the model for the variances in X _{1}, X _{2}, and M, allowing for possible adjustment of the powers p ' and q ' of the base model as part of the contraction, but leaving M ^{ q } in the base model untransformed (so that p " = 1).
 7.
Considering covariates.
Covariates can be considered for inclusion in the component models (9)(10) of model (12). With base model the adaptive model of Step 2, adaptively expand and then contract the model for the means and the variances in Z _{1} and Z _{2} for all covariates Z. As part of the contraction, allow for possible adjustment of the powers p ' and q ' of the base model, leave M ^{ q } in the base model untransformed (so that p " = 1), and only contract transforms of covariates from the model for the means. This can be combined with Step 6 to allow the variances to depend as well on X _{1}, X _{2}, and M.
 8.
Assessing ρ = 0.
For any model generated earlier, compare its LCV score using a LCV ratio test to the associated model constrained to satisfy ρ = 0.
 9.
Assessing p " = 1
With base model the adaptive model of Step 2, adaptively retransform the model allowing the powers of the single transforms of X _{1}, X _{2}, and M ^{q} to be changed to improve the LCV score. The assumption p " = 1 is supported if the associated model generated with p " = 1 in Step 2 is a competitive alternative to this latter model. Similar assessments can be conducted allowing for nonconstant variances as in Step 6, for covariates as in Step 7, and/or ρ = 0 as in Step 8.
 10.
Assessment of Model Assumptions
Yuan and MacKinnon [26] provide a detailed discussion of the impact of violations of the constant variances and normality assumptions of standard regression models. Modeling of variances can address the first problem. Data transformation, applied to outcomes and/or predictors, as considered here can sometimes resolve normality problems, but not always. Consequently, model assumption assessments are important to conduct in general regression contexts including the special case of mediation analyses, whether treated as linear or nonlinear. Should data transformation not resolve such problems, then quantile regression methods as described by Yuan and MacKinnon [26] are more appropriate to use. However, if data transformation resolves model assumption problems, then normalitybased methods are optimal [26] when applied to the transformed data and so would likely generate more efficient and powerful estimates.
In the case of mediation analyses, the use of bootstrapped CIs on indirect effects circumvents distributional assumption problems for parametric estimates of those effects. However, as demonstrated by the simulations of [26], bootstrap methods cannot fully address distributional assumption problems for the data. Moreover, bootstrapped CIs are likely to be relatively narrower, and so more precise, when data are transformed to be as close as possible to normal, than when untransformed. The example analyses reported later provide two examples supporting this conclusion. This, of course, assumes that the indirect effect has been consistently estimated so that the true value is in the confidence interval.
For composite model (12), the constant variances assumption can be addressed with a LCV ratio test comparing constant and nonconstant variances models to assess whether variances are reasonably close to constant or are distinctly nonconstant. If a sufficiently broad set of primary predictors for the variances are considered, the associated nonconstant variances model should provide an appropriate depiction of the variances, thereby relaxing the constant variances assumption if necessary. These variances estimates combined with the estimated correlation provide estimates Σ _{ s }(S) of the covariance matrices for observed outcome vectors \( {\boldsymbol{Y}}_{\boldsymbol{s}}^{\mathit{\hbox{'}}} \) for subjects with indexes s in the set S. Associated residual vectors are u _{ s }(S) = \( {\boldsymbol{Y}}_{\boldsymbol{s}}^{\mathit{\hbox{'}}} \) − μ _{ s }(S) where μ _{ s }(S) are estimated mean vectors for subjects \( s \) based on X _{1s }, X _{2s }, M _{s}, and possibly covariates. Associated standardized or scaled residuals are given by stdu _{ s }(S) = (V _{ s } ^{ T } (S))^{− 1} · u _{ s }(S) where V _{ s }(S) is the square root of Σ _{ s }(S) determined by its Cholesky decomposition. The combined standardized residuals over all s can be used to assess the normality assumption by visually checking for linearity in the associated normal (probability) plot and with the ShapiroWilk test for normality of the standardized residuals.
 11.
Assessing natural indirect effects
Once an appropriate choice for composite model (12) with p " = 1 has been identified, possibly including nonconstant variances and/or covariates, the assessment of whether the instantaneous natural indirect effect function \( \frac{dNIE(x)}{dx} \) for this model is nonzero needs to be assessed. This can be addressed with bootstrapped CIs [21, 29, 40, 41], but computed for a grid of possible values x for the predictor X. For models with a constant instantaneous natural indirect effect (i.e., with q ' = 1), only one value for X need be considered. The biascorrected version [42] is recommended by MacKinnon et al. [40], and so is used in example analyses unless otherwise indicated. All reported CIs are based on 1,000 resamples.
The powers p, q, p ', and q ' for a composite model (12) as well as powers for all covariate predictors and variance predictors are held fixed with associated slope parameters estimated for resamples of the composite data. The generated 95% CI for the instantaneous natural indirect effect function at each nonzero value of \( x \) has lower and upper boundsDefine the normalized width W of these intervals as$$ L(x)={b}_L\cdot {a}_L\cdot q\mathit{\hbox{'}}\cdot {x}^{q\mathit{\hbox{'}}1}< U(x)={b}_U\cdot {a}_U\cdot q\mathit{\hbox{'}}\cdot {x}^{q\mathit{\hbox{'}}1}. $$$$ W=\frac{ \max \left(\left L(x)\right,\left U(x)\right\right) \min \left(\left L(x)\right,\left U(x)\right\right)}{ \max \left(\left L(x)\right,\left U(x)\right\right)}=1\frac{ \min \left(\left{b}_L\cdot {a}_L\right,\left{b}_U\cdot {a}_U\right\right)}{ \max \left(\left{b}_L\cdot {a}_L\right,\left{b}_U\cdot {a}_U\right\right)}, $$which is constant in nonzero \( x \) with a value between 0 and 1. Models for the data generating smaller values for W provide more precise predictions of the instantaneous natural indirect effect function.
Moderated monotonic mediation
One of the covariates Z might be considered as a moderator. There are a variety of ways that models (9)(10) can be adjusted to accommodate moderation. For example, Preacher, Rucker, and Hayes [53] propose five alternatives. Under their fifth alternative, Z moderates the effect of X on M, X on Y, and M on Y. Under this alternative, models (9)(10) become possibly with other covariates included where is the set of all possible values z for Z and I(Z = z) the indicator for Z = z (i.e., it equals 1 when Z = z and 0 otherwise). The dependence of M on X and Y on X and M have been defined separately for the values z of Z taking an analysis of variance approach. This formulation allows associated intercepts, slopes, and powers to change with the values z of Z while preserving monotonicity. However, it requires estimation of model parameters for each value z of Z. This requirement is reasonable for moderators Z with discrete numbers of possible values, but problematic for continuous moderators Z with many possible values z but sparse numbers of observations for some values z. In this latter case, the moderator Z could be replaced by a split based on its tertiles, quartiles, etc.
Under (13)(14), the instantaneous natural direct effect \( {c}^{\mathit{\hbox{'}}}( z)\cdot {p}^{\mathit{\hbox{'}}}( z)\cdot {X}^{p^{\mathit{\hbox{'}}}(z)1} \) (assuming for simplicity that p ' (z) ≠ 0) changes with the values z of Z. When p " (z)≡1, the instantaneous natural indirect effect b(z) ⋅ a(z) ⋅ q′(z) ⋅ X ^{(q ' (z) − 1)} (assuming for simplicity that q′(z) ≠ 0) also changes with the values z of Z. The associated normalized widths W(z) change with z as well.
The individual moderation components of model (15) can be assessed by comparing model (15) to the model with each of those moderation component removed using LCV ratio tests. Specifically, moderation of the effect of X on M in model (13) can be assessed by replacing in (15) with β _{3} ⋅ X _{1} ^{ q '} . Moderation of the effect of X on Y in model (14) can be assessed by replacing in (15) with β _{4} ⋅ X _{2} ^{ p '} . Moderation of the effect of M on Y in model (14) can be assessed by replacing in (15) with β _{5} ⋅ M ^{ q }.
It is also possible to test effects for specific values z ' of Z. Specifically, the effect of X on M in model (13) for the value z′ of Z can be assessed by replacing in (15) with . The effect of X on Y in model (14) for the value\( z^{\prime } \) of Z can be assessed by replacing in (15) with . The effect of M on Y in model (14) for the value z′ of Z can be assessed by replacing in (15) with .
Data on family management of childhood chronic conditions
Example analyses are reported later using a subset of data from a crosssectional study on family management of childhood chronic conditions [54] reported by 187 partnered mothers. General family functioning is measured using the General Functioning Scale of the McMaster Family Assessment Device [55], coded so that larger values indicate better family functioning with range 1–4. Difficulty managing the child’s condition is measured by the family life difficulty scale of the Family Management Measure [54], coded so that larger scores mean more difficulty. This scale measures the extent to which having a child with a chronic condition makes family life more difficult. Child adaptation, in terms of the intensity of the child’s conductdisordered behavior, is measured using the Eyberg Child Behavior Inventory [56], coded so that larger values indicate better child adaptation or less conductdisordered behavior.
Example analyses use these family management data to demonstrate nonlinear mediation analyses by considering mediation of the impact of family functioning X on child adaptation Y by difficulty M in managing the condition. The cutoff for a substantial percent decrease for these data using models (12) and (15) with 374 = 2 ⋅ 187 measurements is 0.51%. Example analyses assume p " = 1 or p " (z) = 1 for all values z of a moderator Z unless otherwise stated.
The proposed mediation relationships for these observational data can be justified on the following basis. General family functioning would be developed by a family prior to the diagnosis of the child’s chronic condition, which would affect how difficult that chronic condition makes family life which would then affect the child’s adaption to the condition. However, the purpose of these analyses is to provide example mediation analyses not to establish mediation in this context.
Simulated mediation data
Results
Analyses at the beginning of this section consider mediation of the effect of family functioning on child adaptation to a childhood chronic condition by the difficulty in managing that condition. Intercept parameters for the means and variances are constrained in all analyses not to be removed as part of contractions. In computing LCV scores, Y ^{ p } and M ^{ q } measurements for the same mother are randomly assigned to the same fold. Analyses are also conducted at the end of this section using the simulated mediation data.
Selecting the number of folds
For model (12) applied to the family management data with p = q = 1 and CSH covariance structure, the adaptive model in X _{1} and X _{2} generates a first local maximum at k = 10 with \( 10 \)fold LCV score \( 0.015024 \) and selected powers p ' = −3 and q ' = 2. Using k = 5, the selected powers are p ' = 3 and q ' = 1.5 with \( 10 \)fold LCV \( 0.015018 \) and insubstantial percent decrease in the LCV score compared to the model selected with k = 10 of 0.04% (i.e., less than the cutoff of 0.51% for the data). Using k = 15, the selected powers are p ' =−1 and q ' = 1.5 with \( 10 \)fold LCV \( 0.015005 \) and insubstantial percent decrease in the LCV score of 0.13%. Consequently, the generated model is reasonably robust to the choice of the number of folds. Subsequent analyses all use k = 10 folds for computing standard as well as poweradjusted LCV scores.
Using\( 10 \) folds, the number of measurements per fold ranges from \( 26 \) to \( 56 \) for \( 13 \) to \( 28 \) mothers. Consequently, fold complements contain at least \( 318 \) or \( 85.0\% \) of the \( 374 \) available measurements, and so deleted parameter estimates should be reasonably reliable.
Selecting the powers \( p \) and \( q \) for \( Y \) and \( M \)
For model (12) with CSH covariance structure, the adaptive search for models in \( {\boldsymbol{X}}_1 \) and \( {\boldsymbol{X}}_2 \) with varying choices for \( p \) and \( q \), first selects the powers p = 1 and q = 0 (i.e., the log transform) over changes in these powers of ±0.5 with LCV(1, 0) = 0.015683 and then the powers p = 1.3 and q = 0 over changes in these powers of ±0.1 with LCV(1.3,0) = 0.015776. For untransformed Y and M with p = q = 1, LCV(1, 1) = 0.015024 (the same as its standard LCV score reported above), and so the percent decrease is substantial at \( 4.77\% \). Consequently, the mediation relationships are distinctly nonlinear in \( Y \) and \( M \). Model (12) for the standard linear mediation model with p = q = p ' = q ' = 1 has even smaller LCV(1,1) score 0.014958, and so is substantially improved upon by consideration of monotonicity.
Furthermore, the adaptively generated model with\( Y \) transformed and \( M \) held untransformed, that is, with q = 1, generates the powers p = 1.4, p ' = −5, and q ' = 2 with LCV(1.4,1) score 0.015143 and substantial percent decrease compared to the model with p = 1.3 and q = 0 of 4.01%. Consequently, relationships (9)(10) are distinctly nonlinear in \( M \). Moreover, the adaptively generated model with \( Y \) untransformed, that is, with p = 1, generates the powers q = 0, p ' = −2, and q ' = 2.4 with LCV(1,0) score 0.15683 and substantial percent decrease compared to the model with p = 1.3 and q = 0 of 0.59%. Hence, relationship (10) is distinctly nonlinear in Y.
Assessing ρ = 0
Using p = 1.3, q = 0, p ' = −2.5, and q ' = 3.5 as selected with the CSH covariance structure, the estimated correlation is −0.08. Rerunning this model with ρ = 0, the LCV(1.3,0) score is 0.015804. Using p = 1.3 and q = 0 as selected with CSH covariance structure, but with ρ = 0, the adaptively generated model (12) has somewhat different powers p ' = −3 and q ' = 3 and LCV(1.3,0) score that also rounds to 0.015804. Since this score is larger than the LCV(1.3,0) score under CSH, the omitted variables or errors U _{ M } and U _{ Y } are reasonably treated as independent. Using the values of p and q selected under CSH reduces the computations compared to identifying adaptive values for p and q under ρ = 0. However, in this case, the same powers p = 1.3 and q = 0 are adaptively identified with ρ = 0 starting from p = q = 1. An alternative approach would be to start the search for p and q with ρ = 0 at the values generated for CSH while searching over grids of \( \pm 0.1 \) to reduce the computations. This also generates the same solution \( p=1.3 \) and \( q=0 \).
For the case \( p= q=1 \) with selected powers \( {p}^{\hbox{'}}=3 \) and \( {q}^{\hbox{'}}=2 \), a similar result holds. The estimated correlation is \( 0.05 \) and the model with \( \rho =0 \) has LCV score \( 0.015043 \), larger than the score \( 0.015024 \) reported earlier for the associated model with CSH covariance structure. Subsequent analyses use \( \rho =0 \) since it provides an improvement in these two cases.
Assessing a = 0
The adaptively generated model with \( p=1.3 \), \( q=0 \), \( \rho =0 \), and constrained not to include a transform of \( {\boldsymbol{X}}_1 \) in the model for the means has model for the means depending on \( {\boldsymbol{X}}_2 \) transformed by the power p' = −3 with LCV(1.3,0) score \( 0.015077 \) and substantial percent decrease \( 4.60\% \) compared to the model also including a transform of \( {\boldsymbol{X}}_1. \) Consequently, transformed difficulty \( \log M \) in model (9) is reasonably assumed to depend substantially on family functioning \( X \).
Assessing \( b=0 \)
The adaptively generated model with \( p=1.3 \), \( q=0 \), \( \rho =0 \), and constrained not to include the transform \( {\boldsymbol{M}}^q \) in the model for the means has model for the means depending on \( {\boldsymbol{X}}_1 \) and \( {\boldsymbol{X}}_2 \) transformed by the powers p′ = −0.4 and q′ = 3, respectively, with LCV(1.3,0) and substantial percent decrease \( 1.07\% \) compared to the model also including the transform of M ^{ q }. Consequently, transformed adaptation \( {Y}^{1.3} \) in model (10) is reasonably assumed to depend substantially on transformed difficulty \( \log M \) corresponding to \( q=0 \).
Assessing c ' = 0
The adaptively generated model with \( p=1.3 \), \( q=0 \), \( \rho =0 \), and constrained not to include a transform of \( {\boldsymbol{X}}_2 \) in the model for the means has model for the means depending on \( {\boldsymbol{X}}_1 \) transformed by the power q' = 3 with LCV(1.3,0) and substantial percent decrease \( 0.57\% \) compared to the model also including a transform of \( {\boldsymbol{X}}_2 \). Consequently, in model (10), transformed adaptation \( {Y}^{1.3} \) is reasonably assumed to depend substantially on family functioning \( X \), and so the instantaneous natural direct effect is substantial.
Assessing p " = 1
Using the model (12) selected with \( p=1.3 \), \( q=0 \), and \( \rho =0 \) (i.e., with p' = −3 and q' = 3), an adaptive transformation of this model allowing for p" ≠ 1 (and also possible adjustments to p' and q') is the same model generated with p" = 1. Consequently, the simplifying assumption p" = 1 is reasonable in this case.
Assessing nonconstant variances
As described earlier, adaptive models can be generated allowing for variances of the omitted factors or errors \( {U}_M \) and \( {U}_Y \) to depend on transforms of \( {\boldsymbol{X}}_1 \), \( {\boldsymbol{X}}_2 \), and \( \boldsymbol{M} \). Starting from the adaptively generated model for \( p=1.3 \) and \( q=0 \) with \( \rho =0 \) (i.e., with p' = −3 and q' = 3), the expansion adds in transforms of these predictors to the model for the variances, but the contraction removes all of them, leaving the base constant variances model. This result indicates that the variances are reasonably treated as constant in \( {\boldsymbol{X}}_1 \), \( {\boldsymbol{X}}_2 \), and \( \boldsymbol{M} \).
Also considering a covariate
The family management study enrolled parents of children with a variety of chronic conditions. One childhood chronic condition type was Crohn’s disease or a bowel disorder with \( 54 \) (\( 28.9\% \)) children having this condition. The indicator \( Z \) for having this condition can be considered as a possible covariate for models (9)(10). Adaptive modeling starts with the adaptively generated model for \( p=1.3 \), \( q=0 \), and \( \rho =0 \) (i.e., with p' = −3 and q' = 3) along with constant variances (and so based on \( {\boldsymbol{I}}_1 \) and \( {\boldsymbol{I}}_2 \)). The model for the means is expanded considering the indicators \( {\boldsymbol{Z}}_1 \) and \( {\boldsymbol{Z}}_2 \) (as defined similarly to \( {\boldsymbol{X}}_1 \) and \( {\boldsymbol{X}}_2 \)) while the model for the variances is expanded considering arbitrary transforms of \( {\boldsymbol{X}}_1 \), \( {\boldsymbol{X}}_2 \), and \( \boldsymbol{M} \) along with \( {\boldsymbol{Z}}_1 \) and \( {\boldsymbol{Z}}_2 \). The contraction is constrained so that \( {\boldsymbol{M}}^q \) is not retransformed in the model for the means (so \( {p}^{\hbox{'}\hbox{'}}=1 \)) while \( {\boldsymbol{X}}_1 \) and \( {\boldsymbol{X}}_2 \) are not removed from the model for the means, but associated powers are allowed to be changed.
The generated model has the same powers p' = −3 and q' = 3 as without consideration of \( {\boldsymbol{Z}}_1 \) and \( {\boldsymbol{Z}}_2 \) along with the covariate \( {\boldsymbol{Z}}_2 \) added to both models for the means and the variances and no transforms of \( {\boldsymbol{X}}_1 \), \( {\boldsymbol{X}}_2 \), and \( \boldsymbol{M} \) in the model for the variances. The LCV(1.3,0) score is \( 0.016062 \), which is a substantial improvement on the score \( 0.015804 \) for the associated model not considering covariates with percent decrease \( 1.61\% \). Consequently, the indicator for having Crohn’s disease or a bowel disorder substantially influences the means and variances for model (10), but not for model (9) (since only \( {\boldsymbol{Z}}_2 \) is included in the generated model and not \( {\boldsymbol{Z}}_1 \)).
Model assumptions
Results for the selected model
Estimated instantaneous total, natural direct, natural indirect, and relative natural indirect effects for the monotonic mediation model for child adaptation as a function of family functioning as mediated by difficulty controlling for having Crohn’s disease or a bowel disorder with independent omitted factors or errors
Family functioning  Instantaneous total effect  Instantaneous natural direct effect  Instantaneous natural indirect effect  Relative instantaneous natural indirect effect 

1  3092.9  3087.9  4.9  0.002 
2  212.6  193.0  19.6  0.092 
3  82.3  38.1  44.1  0.536 
4  90.5  12.1  78.5  0.867 
Bootstrapped CIs
The standard linear mediation model (i.e., with p = q = p' = q' = 1 and \( \rho =0 \)) generates the same bootstrapped CI of \( 2.00 \) – \( 11.96 \) for the instantaneous natural indirect effect at each value of family functioning. The normalized width \( W \) for this CI is \( 0.83 \).
Biascorrected bootstrapped \( 95\% \) confidence intervals for the monotonic mediation model for child adaptation as a function of family functioning as mediated by difficulty controlling for having Crohn’s disease or a bowel disorder with independent omitted factors or errors
Family functioning x  Lower bound \( L(x) \) on natural indirect effect^{a}  Upper bound \( U(x) \) on natural indirect effect^{a}  Normalized width \( W \) of the confidence interval^{b} 

1  2.28  8.19  0.72 
2  9.14  32.76  0.72 
3  20.56  73.72  0.72 
4  36.55  130.05  0.72 
The bootstrapped CI of \( 2.00 \) – \( 11.96 \) for the linear moderation model (as reported earlier) overlaps with the bootstrapped CIs for \( x=1 \) and \( x=2 \), but is entirely below the bootstrapped CIs for \( x=3 \) and \( x=4 \), suggesting that the linear moderation model in this case generates biased estimates of natural indirect effects for larger values of family functioning.
Moderated monotonic mediation
Another childhood chronic condition type for the family management study was diabetes with \( 51 \) (\( 27.3\% \)) children having this condition. The indicator \( Z \) for having this condition can be considered as a possible moderator for models (13)(15). As before, the indicator for the child having Crohn’s disease or a bowel disorder is considered as a covariate to include in both the models for the means and the variances as well as the model for the variances to depend on transforms of \( {\boldsymbol{X}}_1 \), \( {\boldsymbol{X}}_2 \), and \( \boldsymbol{M} \).
The generated model for the case of children with a chronic condition other than diabetes (\( Z=0 \)) has powers p'(0) = −1.7 and q'(0) = 6. The generated model for the case of children with diabetes (\( Z=1 \)) has the power q′(1) = 0.5 with no transform of family functioning (so a missing p′(1) power). As before the covariate Crohn’s disease or a bowel disorder is included in the model for the means and variances, but only in the component of model (15) addressing the outcome variable, that is, submodel (14). The LCV(1.3,0) score is \( 0.016215 \), which is a substantial improvement on the score \( 0.016062 \) for the associated model not considering moderation with percent decrease \( 0.94\%. \) Consequently there is distinct moderated mediation.
The linear moderated mediation model (i.e., with p = q = 1, p'(0) = q(0) = 1, p'(1) = q(1) = 1, and \( \rho =0 \)) allowing for effects of the covariate having Crohn’s disease or a bowel disorder on the means and variances as well as possibly nonlinear effects of \( {\boldsymbol{X}}_1 \), \( {\boldsymbol{X}}_2 \), and \( \boldsymbol{M} \) on the variances has means and variances depending on the covariate having Crohn’s disease of a bowel disorder for only the submodel (14) as for the moderated monotonic mediation model. Its LCV(1,1) score is \( 0.015460 \) with substantial percent decrease \( 4.66\% \). Consequently, the moderated mediation is distinctly nonlinear.
Model (15) adjusted to remove moderation of the effect of transformed \( X \) on transformed \( M \) in submodel (13) has LCV(1.3,0) score \( 0.016112 \) with substantial percent decrease \( 0.64\% \) compared to the full model (15). Moreover, model (15) adjusted to remove moderation of the effect of transformed \( X \) on transformed \( Y \) in submodel (14) has LCV(1.3,0) score \( 0.016112 \) (same score as above but a different model) with substantial percent decrease \( 0.64\% \) compared to the full model (15). Consequently, there is distinct moderation of both effects of the predictor \( X \) in model (15).
On the other hand, model (15) adjusted to remove moderation of the effect of transformed M on transformed Y in submodel (14) has LCV(1.3,0) score \( 0.016155 \) with insubstantial percent decrease \( 0.37\% \) compared to the full model (15). Consequently, this is a parsimonious, competitive alternative model, and the effect of transformed \( M \) on transformed \( Y \) is reasonably considered not to be moderated by having diabetes.
Model (15) with the term β _{3}(0) ⋅ X _{1} ^{ q ' (0)} ⋅ I(Z = 0) removed has powers q ' (1) = 0.5 and p ' (0) = −1 with LCV(1.3,0) score \( 0.015846 \) and substantial percent decrease \( 2.28\% \). Thus, there is a distinct effect of transformed \( X \) on transformed \( M \) for children with a chronic condition other than diabetes in submodel (13). With the term β _{5}(0) ⋅ M ^{ q } ⋅ I(Z = 0) removed, the model has powers q'(0) = 6, q ' (1) = 0.5, and p ' (0) = 0.5 with LCV(1.3,0) score \( 0.015949 \) and substantial percent decrease \( 1.64\% \). Thus, there is a distinct effect of transformed \( M \) on transformed \( Y \) for children with a chronic condition other than diabetes in submodel (14).
Model (15) with the term β _{3}(1) ⋅ X _{1} ^{ q ' (1)} ⋅ I(Z = 1) removed has powers q'(0) = 6 and p'(0) = −1 with LCV(1.3,0) score \( 0.015674 \) and substantial percent decrease \( 3.34\% \). Thus, there is a distinct effect of transformed \( X \) on transformed \( M \) for children with diabetes in submodel (13). On the other hand, with the term β _{5}(1) ⋅ M ^{ q } ⋅ I(Z = 1) removed, the model has powers q'(0) = 6, q'(1) = 0.5, and p'(0) = −1 with LCV(1.3,0) score \( 0.016137 \) and insubstantial percent decrease \( 0.48\% \). Thus, the effect of transformed \( M \) on transformed \( Y \) for children with diabetes in submodel (14) is not distinct. This latter result indicates that the effect of transformed \( X \) on transformed \( Y \) is not distinctly mediated by transformed \( M \) for children with diabetes.
With the term \( {\beta}_4(0)\cdot {\boldsymbol{X}}_2^{p^{\hbox{'}}(0)}\cdot I\left( Z=0\right) \) removed, the model has powers q'(0) = 6 and q'(1) = 0.5 with LCV(1.3,0) score \( 0.016058 \) and substantial percent decrease \( 0.97\% \). This result indicates that there is a distinct effect of transformed \( X \) on transformed \( Y \) for children with a chronic condition other than diabetes in submodel (14). On the other hand, the fact that \( {\beta}_4(1)\cdot {\boldsymbol{X}}_2^{p^{\hbox{'}}(1)}\cdot I\left( Z=0\right) \) is not in the generated model indicates that the effect of transformed \( X \) on transformed \( Y \) for children with diabetes in submodel (14) is not distinct.
Estimated instantaneous total, natural direct, natural indirect, and relative natural indirect effects for the monotonic mediation model for child adaptation as a function of family functioning as mediated by difficulty controlling for having Crohn’s disease or a bowel disorder and as moderated by having diabetes with independent omitted factors or errors
Diabetes^{a}  Family functioning  Instantaneous total effect  Instantaneous natural direct effect  Instantaneous natural indirect effect  Relative instantaneous natural indirect effect 

no  1  931.5  931.4  0.1  <0.001 
no  2  235.8  232.8  3.0  0.013 
no  3  126.3  103.5  22.8  0.181 
no  4  154.4  58.2  96.1  0.623 
Biascorrected bootstrapped \( 95\% \) confidence intervals for the monotonic mediation model for child adaptation as a function of family functioning as mediated by difficulty controlling for having Crohn’s disease or a bowel disorder and as moderated by having diabetes with independent omitted factors or errors
Diabetes^{a}  Family functioning x  Lower bound \( L(x) \) on natural indirect effect^{b}  Upper bound \( U(x) \) on natural indirect effect^{b}  Normalized width W of the confidence interval^{c} 

no  1  0.029  0.18  0.84 
no  2  0.93  5.72  0.84 
no  3  7.09  43.43  0.84 
no  4  29.88  183.00  0.84 
In summary, moderated mediation for the family management data is distinctly nonlinearly monotonic. However, mediation only occurs for mothers of children with a chronic condition other than diabetes and not for mothers of children with diabetes.
Example analyses of the simulated mediation data
Using the CSH covariance structure with \( p= q=1 \), the first local maximum in the LCV score occurs at \( k=15 \) with LCV(1,1) \( =0.41542 \), and so \( k=15 \) folds are used to compute subsequent LCV scores for the simulated data. The generated CSH model allowing for arbitrary \( p \) and \( q \) has \( p=0 \) and \( q=0.3 \) with distinctly improved \( \mathrm{L}\mathrm{C}\mathrm{V}\left(0,0.3\right)=0.93577 \) (i.e., the percent decrease for the \( p= q=1 \) model is 55.6%, much larger than the cutoff of 0.95% for the data). Also, the estimated correlation for this model is the substantial value \( 0.92 \), suggesting highly dependent omitted factors or errors \( {U}_M \) and \( {U}_Y. \) However, the generated model allowing for arbitrary \( p \) and \( q \) with uncorrelated omitted factors or errors (i.e.., with \( \rho =0 \)), has \( p=0.4 \) and \( q=1 \) (i.e., the true values for these powers) with distinctly improved LCV(0.4,1) \( =1.08555 \) (i.e., the percent decrease for the model with \( p=0 \) and \( q=0.3 \) is 13.8%), indicating that the omitted factors or errors are reasonably treated as independent as simulated. Under this model, p' = 1.1 and q' = 1.2, close to their simulated values of p' = q' = 1.
The constant instantaneous indirect effect model associated with this latter model (i.e., with q′ = 1) has LCV(0.4,1) \( =1.08074 \) and insubstantial percent decrease \( 0.44\% \). Consequently, the instantaneous natural indirect effect is reasonably considered to be constant as simulated. Also, the true model as simulated (i.e., also setting \( {p}^{\hbox{'}}=1\Big) \), has LCV(0.4,1) \( =1.07928 \) and insubstantial percent decrease \( 0.58\% \), indicating that the true model is a competitive alternative for the simulated data.
Using the constant instantaneous natural indirect effect model (i.e., with \( p=0.4 \), \( q=1 \), p' = 1.1, and q' = 1), the \( 95\% \) bootstrapped CI with bias correction for that effect is \( 0.0358 \) – \( 0.0563 \), and so contains the true constant instantaneous natural indirect effect \( 0.04 \). The associated \( 95\% \) bootstrapped CI without bias correction is \( 0.0363 \) – \( 0.0567 \), and so is quite similar, suggesting that bias correction has not inflated the Type I error in this case.
Using the standard linear mediation model (i.e., with \( p=1 \), \( q=1 \), p' = 1, q' = 1, and \( \rho =0 \)), the \( 95\% \) bootstrapped CI with bias correction for the constant instantaneous indirect effect is \( 0.0480 \) – \( 0.0974 \), and so does not contain the true constant instantaneous natural indirect effect \( 0.04 \). The associated \( 95\% \) bootstrapped CI without bias correction is \( 0.0481 \) – \( 0.0986 \), and so is quite similar and also does not contain the true value.
Discussion
We formulated and demonstrated an approach for conducting possibly moderated monotonic mediation analyses based on adaptively selected fractional polynomial models. This formulation considers transformation of outcomes, predictors, and mediators, not just predictors and mediators as previously considered [29–32]. Results of the example analyses of the family management data indicated that transformation of positive valued continuous outcomes can provide distinct improvements over leaving those outcomes untransformed and can resolve problems with model assumptions.
Other nonlinear regression methods could have been used instead to estimate relationships. An advantage of fractional polynomial models is that they are based on linear regression models and so have no more limitations, assumptions, and requirements than models used in linear mediation analyses. Associated derivatives are also readily computed as needed for estimating monotonic instantaneous natural indirect, natural direct, and total effects.
The example analyses used likelihood crossvalidation (LCV) for evaluating models, both unadjusted and adjusted for transformation of the outcome. By assessing how a model performs on randomly selected subsets, model evaluation using LCV scores is robust to effects of chance variation compared to using likelihoods based on the complete data. LCV ratio tests, generalizing likelihood ratio tests, can be used to assess whether mediation relationships are constant, linear, or nonlinear as well as a variety of other issues.
The example analyses of the family management data demonstrated the need to address nonlinearity in the context of mediation. Relationships considered in mediation analyses can be distinctly nonlinear. Even when mediation relationships are reasonably treated as linear, consideration of nonlinear alternatives is needed to determine that this assumption holds. It is conventional to treat mediation relationships as linear without checking this assumption. However, like any assumption, it should be checked.
While the example analyses of the family management data demonstrated that distinct nonlinear monotonic mediation can be identified using composite model (12), linear mediation also held. However, the normality assumption was questionable for the linear mediation case and also the constant variances assumption. There are likely to be data sets where mediation can be identified only by consideration of monotonicity. However, there are also likely to be data sets where consideration of monotonicity does not resolve problems related to the normality and constant variances assumptions. Quantile regression methods [26] can be used in such cases.
The example data analyses of the family management data also demonstrated that standard linear mediation analyses can provide a misleading impression that the natural indirect effect is constant in the predictor \( X \) when the indirect effect can actually vary quite a bit from this constant value. Moreover, the single \( 95\% \) bootstrapped CI generated by the linear moderation analysis can be less precise than the CIs generated by a monotonic mediation analysis and can even not overlap for some values of the predictor \( X \), suggesting that a linear approach can generate biased indirect effects. Furthermore, the example analyses of the family management data demonstrated that moderated mediation analyses are important to consider because mediation may be weaker for some subpopulations than others and even not hold for some subpopulations (e.g., mothers of children with diabetes compared to mothers of children with other chronic conditions).
The example analyses of the simulated mediation data support the effectiveness of adaptive mediation modeling since the true model was a competitive alternative to the adaptively selected model and since bootstrapped \( 95\% \) CIs for the constant instantaneous indirect effect contained the true value. However, these analyses also demonstrate that allowing for correlated omitted factors or errors when in fact they are independent can generate models quite different from the true model, indicating the importance of conducting analyses of both cases. These analyses also demonstrate that conducting a standard linear mediation analysis when in fact the relationship (9) is nonlinear in the outcome (i.e., \( p\ne 1 \)) but when the true instantaneous natural indirect effect is constant (i.e., \( q^{\prime }=1 \)) can result in a biased \( 95\% \) CI for that constant effect. This is also likely to hold when relationships (8)(9) are nonlinear in the mediator (i.e., \( q\ne 1 \)).
Limitations
The example mediation analyses of the family management data were limited since they were based on crosssectional data, and so the timing of measurements for predictors, mediators, and outcomes could not be controlled to reflect precedence as needed to support causality [11, 44]. These analyses were also limited by the absence of control over the predictor variable given the nonexperimental design of the study [21]. However, the primary purpose of those analyses was to demonstrate nonlinear mediation and the need for such analyses. This purpose was effectively achieved by the example analyses. Further work, though, is needed to address monotonic mediation in situations where the timing of measurement of variables has been controlled and where the predictor is experimentally controlled. An advantage of an experimentally controlled predictor is that it would be categorical and hence not require transformation. However, the outcome \( Y \) and the mediator \( M \) might benefit from transformation. There is also a need to replicate these analyses using a wider variety of data sets.
Nonlinearity was addressed here for mediation involving univariate outcomes. However, mediation is also conducted using repeated measurements, either over clusters or longitudinally over time, analyzed with multilevel modeling, linear mixed modeling, or structural equation modeling [6, 13, 57–61]. However, linear relationships are usually assumed in these analyses. Further research is needed to extend monotonic mediation to these cases. The example analyses used a continuous outcome and mediator, but outcomes and mediators can sometimes be categorical [13]. Further research is needed to extend monotonic mediation to address categorical mediators and/or outcomes. The analyses also only addressed the case with a single mediator. An extension to monotonic mediation is needed that accounts for multiple mediators.
Conclusions
Mediation relationships are commonly assumed to be linear without assessing the validity of this assumption. Reported example analyses demonstrate that mediation relationships can be nonlinear. Moreover, standard linear mediation analyses can generate models that violate model assumptions and generate biased estimates of indirect effects, but this can in some cases be resolved through more general monotonic mediation analyses. Adaptive methods as extended here to the monotonic mediation and moderated monotonic mediation contexts can effectively account for nonlinearity in mediation relationships. The advantage of restricting to monotonic relationships is that no adjustments are needed to underlying theory about directionality between changes in pairs of variables.
Abbreviations
 CI:

Confidence interval
 CSH:

Compound symmetry heterogeneous
 LCV:

Likelihood crossvalidation
Declarations
Acknowledgements
Not Applicable.
Funding
The collection of the family management data used in the example analyses was funded by the National Institute of Nursing Research, National Institutes of Health (R01 NR08048).
Availability of data and materials
The family management data used in example analyses are available from Yale University but restrictions apply to the availability of these data, which were used under a data use agreement for the current study, and so are not publicly available. These data are, however, available from the corresponding author upon reasonable request and with permission of Yale University. The simulated mediation data are available at [62].
SAS macros have been developed to support adaptive (moderated) monotonic mediation analyses. Models for means and variances for fixed values of \( p \) and \( q \) are generated with the genreg (for general regression) macro. Adaptive composite models (12) and (15) are generated using the compmed (for composite mediation) macro. Bootstrapped \( 95\% \) CIs, either biascorrected or not, are also generated using this macro, generalizing the macro of [41], over a grid of possible values for the predictor \( X \) between its lowest and highest observed values and over alternative values for a moderator \( Z \) if included. These macros as well as code for generating example analyses are available at [62].
Authors’ contributions
The family management data used in the example analyses were collected as part of an instrument development project [54]. All authors were involved in the development and conduct of this project including data collection and data analysis. For the current project, GJK developed the methods, generated the simulated data and the results, and wrote the initial description of those methods, data, and results. All authors reviewed drafts of the manuscript, rewrote sections, offered critical comments, and approved the final version.
Competing interests
The authors declare that they have no competing interests.
Consent for publication
Not applicable.
Ethics approval and consent to participate
The Institutional Review Board of the University of North Carolina at Chapel Hill approved secondary analyses of the family management data used in example analyses.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Authors’ Affiliations
References
 Baron RM, Kenny DA. The moderatormediator variable distinction in social psychology research: conceptual, strategic, and statistical considerations. J Pers Soc Psychol. 1986;51:1173–82.View ArticlePubMedGoogle Scholar
 James LR, Brett JM. Mediators, moderators, and tests for mediation. J Appl Psychol. 1984;69:307–21.View ArticleGoogle Scholar
 Judd CM, Kenny DA. Process analysis: estimating mediation in treatment evaluation. Eval Rev. 1981;5:602–19.View ArticleGoogle Scholar
 Biesanz JC, Falk CF, Salavei V. Assessing mediational models: testing and interval estimation for indirect effects. Multivariate Behav Res. 2010;45:661–701.View ArticlePubMedGoogle Scholar
 Cheung GW, Lau RS. Testing mediation and suppression effects of latent variables: bootstrapping with structural equation models. Organ Res Meth. 2008;11:296–325.View ArticleGoogle Scholar
 Cole DA, Maxwell SE. Testing mediational models with longitudinal data: questions and tips on the use of structural equation modeling. J Abnorm Psychol. 2003;112:558–77.View ArticlePubMedGoogle Scholar
 Edwards JR, Lambert LS. Methods for integrating moderation and mediation: a general analytical framework using moderated path analysis. Psychol Methods. 2007;12:1–22.View ArticlePubMedGoogle Scholar
 Frazier PA, Tix AP, Barron KE. Testing moderator and mediator effects in counseling psychology. J Couns Psychol. 2004;51:113–34.Google Scholar
 Gu F, Preacher KJ, Ferrer E. A state space modeling approach to mediation analysis. J Educ Behav Stat. 2014;39:117–43.View ArticleGoogle Scholar
 Kraemer HC. Toward nonparametric and clinically meaningful moderators and mediators. Stat Med. 2008;27:1679–92.View ArticlePubMedGoogle Scholar
 Kraemer HC, Kiernan M, Essex M, Kupfer DJ. How and why criteria defining moderators and mediators differ between the Baron & Kenny and MacArthur approaches. Health Psychol. 2008;27:S101–8.View ArticlePubMedPubMed CentralGoogle Scholar
 Lepage B, Dedieu D, Savy N, Lang T. Estimating controlled direct effects of intermediate confounding of the mediatoroutcome relationship: comparison of five different methods. Stat Methods in Med Res. 2012;21:1–18.Google Scholar
 MacKinnon DP, Fairchild AJ, Fritz MS. Mediation analysis. Annu Rev Psychol. 2007;58:593–614.View ArticlePubMedPubMed CentralGoogle Scholar
 MacKinnon DP, Lockwood CM, Hoffman JM, West SG, Sheets V. A comparison of methods to test mediation and other intervening variable effects. Psychol Methods. 2002;7:83–104.View ArticlePubMedPubMed CentralGoogle Scholar
 Matthieu JE, Taylor SR. Clarifying conditions and decision points for mediational type inferences in organizational behavior. J Organ Behav. 2006;27:1031–56.View ArticleGoogle Scholar
 Muller D, Judd CM, Yzerbyt VY. When moderation is mediated and mediation moderated. J Pers Soc Psychol. 2005;89:852–63.View ArticlePubMedGoogle Scholar
 Muller D, Yzerbyt VY, Judd CM. Adjusting for a mediator with two crossed treatment variables. Organ Res Meth. 2008;11:224–40.View ArticleGoogle Scholar
 Nuitgen MB, Wetzels R, Matzke D, Dolan CV, Wagenmakers EJ. A default Bayesian hypothesis test for mediation. Behavioral Res Meth. 2015;47:85–97.View ArticleGoogle Scholar
 Pratschke J, Haase T, Comber H, Sharp L, de Camrgo CM, Johnson H. BMC Med Res Methodol. 2016;16:27. doi:10.1186/s1287401601306.View ArticlePubMedPubMed CentralGoogle Scholar
 Rochon J, du Bois A, Lange T. Mediation analysis of the relationship between institutional research activity and patient survival. BMC Med Res Methodol. 2014;14:9. doi:10.1186/14712288149.View ArticlePubMedPubMed CentralGoogle Scholar
 Shrout PE, Bolger N. Mediation in experimental and nonexperimental studies: new procedures and recommendations. Psychol Methods. 2002;7:422–45.View ArticlePubMedGoogle Scholar
 Taylor AB, MacKinnon DP, Tein JY. Tests of the threepath mediated effect. Organ Res Meth. 2008;11:241–69.View ArticleGoogle Scholar
 Valeri L, VanderWeele T. Mediation analysis allowing for exposure–mediator interactions and causal interpretation: theoretical assumptions and implementation with SAS and SPSS macros. Psychol Methods. 2013;18:137–50.View ArticlePubMedPubMed CentralGoogle Scholar
 Vanderweele TJ. Mediation analysis with multiple versions of the mediator. Epidemiol. 2012;23:454–63.View ArticleGoogle Scholar
 Yuan Y, MacKinnon DP. Bayesian mediation analysis. Psychol Methods. 2009;14:301–22.View ArticlePubMedPubMed CentralGoogle Scholar
 Yuan Y, MacKinnon DP. Robust mediation analysis based on median regression. Psychol Methods. 2014;19:1–20.View ArticlePubMedGoogle Scholar
 Zhao X, Lynch Jr JG, Chen Q. Reconsidering Baron and Kenny: myths and truths about mediation analysis. J Cons Res. 2011;37:197–206.View ArticleGoogle Scholar
 Zu J, Yuan KH. Local influence and robust procedures for mediation analysis. Multivariate Behav Res. 2010;45:1–44.View ArticlePubMedGoogle Scholar
 Hayes AF, Preacher KJ. Quantifying and testing indirect effects in simple mediation models when the constituent paths are nonlinear. Multivariate Behav Res. 2010;45:627–60.View ArticlePubMedGoogle Scholar
 Pearl J. The foundations of causal inference. Sociol Methodol. 2010;40:75–149.View ArticleGoogle Scholar
 Pearl J. A general approach to causal mediation analysis. Psychol Methods. 2014;15:309–34.Google Scholar
 Stolzenberg RM. The measurement and decomposition of causal effects in nonlinear and nonadditive models. Sociol Methodol. 1980;11:459–88.View ArticleGoogle Scholar
 Royston P, Altman DG. Regression using fractional polynomials of continuous covariates: parsimonious parametric modelling. Appl Stat. 1994;43:429–67.View ArticleGoogle Scholar
 Knafl GJ, Ding K. Adaptive regression for modeling nonlinear relationships. Switzerland: Springer International Publishing; 2016.View ArticleGoogle Scholar
 Knafl GJ, Fennie KP, Bova C, Dieckhaus K, Williams AB. Electronic monitoring device event modelling on an individualsubject basis using adaptive Poisson regression. Stat Med. 2004;23:783–801.View ArticlePubMedGoogle Scholar
 Knafl GJ, Delucchi KL, Bova CA, Fennie KP, Williams AB. A systematic approach for analyzing electronically monitored adherence data. In: Ekwall B, Cronquist M, editors. Micro electro mechanical systems (MEMS) technology, fabrication processes and applications, (Chapter 1, pp. 1–66). Hauppauge: Nova Science Publishers; 2010. https://www.novapublishers.com/catalog/product_info.php?products_id=19133. Accessed 7 Dec 2016.Google Scholar
 Knafl GJ. A SAS macro for adaptive regression modeling. In: Proceedings SAS global forum 2009; 2009. http://support.sas.com/resources/papers/proceedings09/1102009.pdf. Accessed 7 Dec 2016.
 MacKinnon DP, Krull JL, Lockwood CM. Equivalence of mediation, confounding, and suppression. Prev Sci. 2000;1:173–81.View ArticlePubMedPubMed CentralGoogle Scholar
 Sobel ME. Asymptotic confidence intervals for indirect effects in structural equation models. In: Leinhart S, editor. Sociological methodology, San Francisco: JosseyBass. 1982. p. 290–312.Google Scholar
 MacKinnon DP, Lockwood CM, Williams J. Confidence limits for the indirect effect: distribution of the product and resampling methods. Multivariate Behav Res. 2004;39:99–128.View ArticlePubMedPubMed CentralGoogle Scholar
 Preacher KJ, Hayes AF. SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behav Res Methods Instrum Comput. 2004;36:717–31.View ArticlePubMedGoogle Scholar
 Efron B. The jackknife, the bootstrap, and other resampling plans, CBMS 38, SIAMNSF. Philadelphia: Society for Industrial and Applied Mathematics; 1982.View ArticleGoogle Scholar
 Fritz MS, Taylor AB, MacKinnon DP. Explanation of two results in statistical mediation analysis. Multivariate Behav Res. 2012;47:61–87.View ArticlePubMedPubMed CentralGoogle Scholar
 MacKinnon DP, Luecken LJ. How and for whom? mediation and moderation in psychology. Health Psychol. 2008;27:S99–100.View ArticlePubMedPubMed CentralGoogle Scholar
 VanderWeele TJ, Vansteelandt S. Conceptual issues concerning mediation, interventions and composition. Stat Interface. 2009;2:457–68.View ArticleGoogle Scholar
 Box GEP, Tidwell PW. Transformation of the independent variables. Technometrics. 1962;4:531–50.View ArticleGoogle Scholar
 McCullagh P, Nelder JA. Generalized linear models. 2nd ed. Chapman & Hall/CRC: Boca Raton, FL; 1999.Google Scholar
 Knafl GJ, Riegel B. What puts heart failure patients at risk for poor medication adherence? Patient Prefer Adher. 2014;8:1007–18.Google Scholar
 Meghani SH, Knafl GJ. Patterns of analgesic adherence predict health care utilization among outpatients with cancer pain. Patient Prefer Adher. 2016;10:81–98.View ArticleGoogle Scholar
 Riegel B, Knafl GJ. Electronically monitored medication adherence predicts hospitalization in heart failure patients. Patient Prefer Adher. 2014;8:1–13.Google Scholar
 Kohavi R. A study of crossvalidation and bootstrap for accuracy estimation and model selection. In: Mellish CS, editor. Proceedings of the 14^{th} international joint conference on artificial intelligence. San Francisco: Morgan Kaufman; 1995. p. 1137–43.Google Scholar
 Knafl GJ, Grey M. Factor analysis model evaluation through likelihood crossvalidation. Stat Methods in Med Res. 2007;16:77–102.View ArticleGoogle Scholar
 Preacher KJ, Rucker DD, Hayes AF. Addressing moderated mediation hypotheses: theory, methods, and prescriptions. Multivariate Behav Res. 2007;42:185–227.View ArticlePubMedGoogle Scholar
 Knafl K, Deatrick JA, Gallo A, Dixon JK, Grey M, Knafl GJ, O’Malley JP. Development and testing of the Family Management Measure. J Pediatr Psychol. 2011;36:494–505.View ArticlePubMedGoogle Scholar
 Epstein N, Baldwin L, Bishop D. The McMaster Family Assessment Device. J Marital Fam Ther. 1983;9:171–80.View ArticleGoogle Scholar
 Eyberg S, Robinson E. Conduct problem behavior: standardization of a behavior rating scale with adolescents. J Clinl Child Psychol. 1983;12:347–54.Google Scholar
 Judd CM, Kenny DA, McCelland GH. Estimating and testing mediation in withinsubject designs. Psychol Methods. 2001;6:115–34.View ArticlePubMedGoogle Scholar
 Kenny DA, Korchmaros JD, Bolger N. Lower level mediation in multilevel models. Psychol Methods. 2003;8:115–28.View ArticlePubMedGoogle Scholar
 MacKinnon DP. Introduction to statistical mediation analysis. New York: Lawrence Erlbaum; 2008.Google Scholar
 Kenny DA, Kashy DA, Bolger N. Data analysis in social psychology. In: Gilbert S, Fiske T, Lindsay D, editors. Handbook of social psychology. 4th ed. New York: McGraw Hill; 1998. p. 115–28.Google Scholar
 Preacher KJ, Hayes AF. Contemporary approaches to assessing mediation in communication research. In: Hayes A, Slater MD, Snyder LB, editors. The SAGE sourcebook of advanced data analysis methods for communication research. Thousand Oaks, CA: Sage; 2008. p. 13–54.View ArticleGoogle Scholar
 Knafl G. Analyzing mediation data. 2016. http://www.unc.edu/~gknafl/mediation.html. Accessed 7 Dec 2016.