Adjusting for outcome risk factors in immigrant datasets: total or direct effects?
BMC Medical Research Methodology volume 23, Article number: 37 (2023)
Abstract
Background
When quantifying differences in health outcomes between immigrants and nonimmigrants, it is common practice to adjust for observed differences in outcome risk factors between the groups being compared. However, as some of these outcome risk factors may act as mediators on the causal path between the exposure and outcome, adjusting for these may remove effects of factors that characterize the immigrants rather than removing a bias between immigrants and nonimmigrants.
Methods
This study investigates the underlying conditions for which adjusting for outcome risk factors in regression models can lead to the estimation of either total or direct effect for the difference in health outcomes between immigrants and nonimmigrants. For this investigation, we use modern tools in causal inference to construct causal models that we believe are highly relevant in an immigrant dataset. In these models, the outcome risk factor is modeled either as a mediator, a selection factor, or a combined mediator/selection factor. Unlike mediators, selection factors are variables that affect the probability of being in the immigrant dataset and may contribute to a bias when comparing immigrants and nonimmigrants.
Results
When the outcome risk factor acts both as a mediator and selection factor, the adjustment for the risk factor in regression models leads to the estimation of what is known as a “controlled” direct effect. When the outcome risk factor is either a selection factor or a mediator alone, the adjustment for the risk factor in regression models leads to the estimation of a total effect or a controlled direct effect, respectively. In all regression analyses, also adjusting for various confounding paths, including mediator-outcome confounding, may be necessary to obtain valid controlled direct effects or total effects.
Conclusions
Depending on the causal role of the outcome risk factors in immigrant datasets, regression adjustment for these may result in the estimation of either total effects or controlled direct effects for the difference in outcomes between immigrants and nonimmigrants. Because total and controlled direct effects are interpreted differently, we advise researchers to clarify to the readers which types of effects are presented when adjusting for outcome risk factors in immigrant datasets.
Background
In recent years, there has been a significant increase in the number of studies examining differences in health outcomes between immigrants and nonimmigrants [1,2,3,4,5,6,7,8,9,10]. In most of these studies, the differences are quantified by using regression models where the individual’s country or region of birth is the exposure. In addition to the exposure, several other variables associated with the outcome, such as age, education, overweight, and smoking, are included in the regression models [4,5,6,7,8,9,10]. The basic idea for including these is to adjust for potential bias that may arise due to the observed differences in these outcome risk factors between the compared groups.
However, the unequal distribution of an outcome risk factor between compared groups may not always represent a bias. A difference in the outcome risk factor may also arise because the countries of the immigrants and nonimmigrants already differed in the risk factor before immigration took place. For example, individuals from the migrating country may, in general, smoke cigarettes less often than the host population they move to. In this situation, the outcome risk factor becomes a mediator [11], in that the difference in the distribution of smoking between the migrant and host countries represents an association that is of interest for understanding the differences in health outcomes between immigrants and nonimmigrants.
Because outcome risk factors may act as mediators on the causal path between the exposure and outcome, the adjustment for such risk factors in regression models can lead to the estimation of a direct effect rather than a total effect [11,12,13]. In general, the direct effect refers to the association between the exposure and outcome unexplained by mediators lying on the causal path between the exposure and outcome, whereas the total effect refers to the association between the exposure and outcome without considering such mediators [11,12,13]. The total effect is the type of effect authors typically want to estimate when comparing immigrants and nonimmigrants in terms of health outcomes.
In this study, we use modern tools in causal inference to investigate the underlying conditions leading to unequal distributions in the outcome risk factors between immigrants and nonimmigrants. We then show how the adjustment for outcome risk factors in regression models under these conditions can lead to the estimation of either total or direct effects. A summary of our results is presented in a tabulated form to guide researchers that aim to report the correct effect type in their specific studies.
Methods
Causal interpretation
In causal inference, it is presumed that the exposure under study can be manipulated in the same way as a treatment assignment in a randomized controlled trial [14]. For country of birth and similar variables (e.g., sex and race), however, this manipulation cannot be directly performed because such variables do not correspond to clearly defined interventions [14,15,16,17]. To solve this, some authors suggest alternative representations of these exposures by specifying relevant components that are hypothetically manipulable [16, 17]. For example, if skin cancer is our outcome and country of birth is our exposure, we could let country of birth represent the joint effect of skin color and genes. Although interventions that would change these components are generally not feasible, describing such interventions can help clarify the causal interpretation of the effect estimates when contrasting country of birth [16, 17]. In this study, we do not specify the hypothesized components of country of birth, but we assume that they can be defined.
Directed acyclic graphs
To model the association between country of birth and a health outcome, we use directed acyclic graphs (DAGs), a graphical tool for conducting causal inference in epidemiologic research [18]. Although DAGs have been applied in connection with health inequalities across race groups before [16, 19], there is little published information on how DAGs are applied to immigrant data.
In causal DAGs, we refer to the variables as nodes and let arrows between nodes represent causal effects. In brief, an exposure \(E\) has a direct effect on the outcome \(D\) if the two variables are connected with a single directed arrow (\(E\to D\)). A variable can also have an indirect effect on another variable (\(E\to M\to D\)) via a mediator \(M\). When a variable \(C\) points at two other variables (\(E\leftarrow C\to D\)), the variable is defined as a common cause and corresponds to what epidemiologists call a confounding factor [20]. When two variables point at the same variable (\(E\to S\leftarrow D\)), the variable \(S\) is defined as a common effect or “collider” [18, 21, 22].
All above paths are defined as “open” except for the collider path, which is defined as a “closed” path. Open paths represent statistical associations, whereas closed or no paths represent the absence of associations. When controlling for a mediator or a confounding factor, for example, by regression adjustment or conditioning on a single variable value, the paths that were originally open get closed. If we, on the other hand, control for a collider, we open the path that originally was closed by the collider. Importantly, both confounding paths and collider paths should be closed to avoid biased associations between variables. When opening a collider path, the induced bias is usually called collider bias or selection bias [18, 21,22,23].
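The opening of a collider path can be illustrated with a small simulation. The following is a minimal sketch, not taken from the paper: the variable names, the normal distributions, and the selection threshold are all assumptions chosen for illustration. Two independent variables share a common effect \(S\); restricting to a range of \(S\) induces an association between them.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Two independent causes E and D of a common effect S (E -> S <- D).
# All distributions here are illustrative assumptions.
e = rng.normal(size=n)
d = rng.normal(size=n)
s = e + d + rng.normal(size=n)

# In the full population the collider path is closed: E and D are unassociated.
corr_all = np.corrcoef(e, d)[0, 1]

# Conditioning on the collider (here: keeping only high values of S) opens the path.
selected = s > 1.0
corr_selected = np.corrcoef(e[selected], d[selected])[0, 1]

print(f"correlation in full population:   {corr_all:.3f}")
print(f"correlation among selected only:  {corr_selected:.3f}")
```

In this setup the correlation is close to zero in the full population and clearly negative among the selected, which is the pattern usually described as collider or selection bias.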
Immigrant datasets
Figure 1a represents a causal DAG for the association between country of birth and a health outcome before immigration takes place. We later extend this model to also include the more complex collider paths forming the basis of an immigrant dataset. For now, we will use the initial model to introduce variables, causal directions of variables, and effect types.
In Fig. 1a, we have assumed nationally representative data for two or more countries \(E\) (e.g., countries A and B) and that only one risk factor \({R}_{0}\) is present for the health outcome \(D\). The arrows from \(E\) to \(D\) and \({R}_{0}\) to \(D\) indicate that country of birth and the risk factor affect the outcome directly. We also suggest a national difference in the distribution of the outcome risk factor \({R}_{0}\) between the countries being compared, indicated by the arrow from \(E\) to \({R}_{0}\). An example could be that individuals from one country smoke cigarettes more often than individuals from the other country. The outcome risk factor, \({R}_{0}\), could also represent socioeconomic status (SES), body mass index (BMI), body height, or nutritional status.
Note that the direction of the arrow for the national difference in the distribution of \({R}_{0}\) between the countries being compared goes from \(E\) to \({R}_{0}\) (\(E\to {R}_{0}\)), and not the other way around. The reason for this is that no known or unknown factor can influence which country one is born in, except perhaps factors that influence where the parents decide to live. Therefore, any observed difference in the distribution of an outcome risk factor \({R}_{0}\) between two or more countries must be inherent and specific for the compared countries, for example, due to the countries’ culture, tradition, genetic composition, or socioeconomic position. In this regard, it may be advantageous to have some knowledge of demographics in the relevant countries to be compared.
In Fig. 1a, we assumed nationally representative data of the countries being compared. However, datasets comparing immigrants and nonimmigrants often consist of nationally representative data for the receiving country (nonimmigrants) but subsamples of individuals for the migrating countries (immigrants). In addition, there may be a selection on the outcome risk factor \({R}_{0}\) among those who migrated. That is, the distribution of \({R}_{0}\) may differ between those who migrated and those who did not migrate from the migrating country. For example, resourceful individuals with higher SES might find it “easier” to migrate than those less resourceful with lower SES.
In Fig. 1b, we have introduced a binary variable \(S\) to indicate whether an individual is a member of the dataset to be analyzed (the immigrant dataset). If we let country A be the receiving country and country B the migrating country, then the immigrant dataset (\(S=1\)) would contain the representative set of individuals from country A (nonimmigrants) as well as a small subsample of individuals from country B (immigrants). Further, because the probability of being a member of the immigrant dataset naturally differs between country A and country B, we present an arrow from \(E\) to \(S\). Similarly, because the probability of being a member of the immigrant dataset may depend on the outcome risk factor \({R}_{0}\) (e.g., SES or BMI), we also present an arrow from \({R}_{0}\) to \(S\). The outcome risk factor will then also be a “selection factor” for being in the immigrant dataset.
Further, in immigrant datasets, the outcome risk factor is usually measured postmigration, often in connection with a health survey in the receiving country. Therefore, in Fig. 1b, we introduce \(R\) as the measured risk factor postmigration whereas \({R}_{0}\) represents the same unmeasured risk factor before migration took place. Note that, as \({R}_{0}\) is an ancestor of \(R\), the measured risk factor will lie on the causal path between \({R}_{0}\) and the outcome \(D\) (\({R}_{0}\to R\to D\)). However, we will not present an arrow from \(R\) to \(S\) as the risk factor is measured postmigration and can, therefore, not itself be a direct cause for being a member of the immigrant dataset \(S\).
Total effect and controlled direct effect
When comparing immigrants and nonimmigrants in terms of health outcomes, the goal is often to estimate the total effects. That is, we aim to estimate the effect of \(E\) on \(D\) without separating the different paths that could explain the effect of country of birth on the outcome. In general, the paths of the total effect can be separated into direct and indirect effects. The indirect effect refers to the part of the total effect that is explained by a given set of mediators, whereas the direct effect refers to the part of the total effect that is unexplained by the same mediators [11,12,13]. For example, in Fig. 1a, the total effect of \(E\) on \(D\) is composed of the direct effect \(E\to D\) and the indirect effect \(E\to {R}_{0}\to D\), where \({R}_{0}\) is a mediator. Consequently, to estimate the total effects of \(E\) on \(D\) in Fig. 1a, we would not control for the outcome risk factor \({R}_{0}\). If \({R}_{0}\) were to be controlled for, this would close the open path between \(E\) and \(D\) through \({R}_{0}\), leaving only the direct effect of \(E\) on \(D\).
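The decomposition in Fig. 1a can be checked numerically in a linear setting. This is a minimal sketch under assumed values: the path coefficients `a`, `b`, and `c`, the binary exposure, and the continuous outcome are illustrative choices and do not come from the paper. Leaving \({R}_{0}\) uncontrolled recovers the sum of the direct and indirect paths, whereas controlling for it leaves the direct path only.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical path coefficients (illustrative values, not from the paper).
a = 0.8  # E -> R0
b = 0.5  # R0 -> D
c = 0.3  # E -> D (direct path)

e = rng.integers(0, 2, size=n).astype(float)  # country of birth (0 or 1)
r0 = a * e + rng.normal(size=n)               # outcome risk factor (mediator)
d = c * e + b * r0 + rng.normal(size=n)       # continuous health outcome


def ols(y, *covariates):
    """Return OLS coefficients [intercept, covariate_1, covariate_2, ...]."""
    X = np.column_stack([np.ones_like(y), *covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


total_hat = ols(d, e)[1]       # R0 not controlled: direct + indirect paths
direct_hat = ols(d, e, r0)[1]  # R0 controlled: direct path only

print(f"unadjusted estimate:  {total_hat:.3f}  (c + a*b = {c + a * b:.2f})")
print(f"adjusted for R0:      {direct_hat:.3f}  (c = {c:.2f})")
```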
When controlling for a mediator on the path between the exposure and outcome (for example by including it as a factor in a regression model), the resulting effect of the exposure on the outcome corresponds to what is known as a “controlled” direct effect [24]. At the population level, the controlled direct effect is defined as the average contrast between those with and without the exposure for a given value of the mediator [24]. Controlled direct effects can be estimated for both continuous and binary outcomes as well as for various effect measures, including odds ratio [25]. However, to obtain valid controlled direct effects for causal interpretation, at least two assumptions should be met [11, 12, 24,25,26]:

1. There should be no unmeasured exposure-outcome confounding. Although this type of confounding is common in observational research, it is uncommon in an immigrant dataset, given that few factors can influence which country one is born in.

2. There should be no unmeasured mediator-outcome confounding. This type of confounding is often ignored in the literature [27]. If unmeasured mediator-outcome confounding is present, adjusting for the mediator would lead to biased controlled direct effects due to conditioning on a collider.
In addition to these assumptions, any exposure-mediator interaction should be accounted for [11, 12, 24,25,26]. If an interaction is present but ignored in regression modeling, the estimated controlled direct effect would be biased. Further, if an exposure-mediator interaction is present, the controlled direct effect will vary by the levels of the mediator.
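The verbal definition above can be written compactly in counterfactual notation. The notation is an assumption introduced here for illustration: \(D(e,m)\) denotes the outcome that would be observed if the exposure were set to \(e\) and the mediator to \(m\), and \({e}^{*}\) is the reference exposure level. The controlled direct effect at mediator level \(m\) is then

\[
\mathrm{CDE}(m) = \mathbb{E}\left[D(e,m)\right] - \mathbb{E}\left[D({e}^{*},m)\right].
\]

As a sketch, suppose a linear regression model with an exposure-mediator product term and measured confounders \(C\) is correctly specified (the coefficients \(\theta\) are hypothetical):

\[
\mathbb{E}\left[D \mid E=e, M=m, C=c\right] = {\theta}_{0} + {\theta}_{1}e + {\theta}_{2}m + {\theta}_{3}em + {\theta}_{4}^{\top}c .
\]

Under assumptions 1 and 2, this gives

\[
\mathrm{CDE}(m) = \left({\theta}_{1} + {\theta}_{3}m\right)\left(e - {e}^{*}\right),
\]

which depends on \(m\) whenever \({\theta}_{3}\ne 0\) and reduces to \({\theta}_{1}(e-{e}^{*})\) when there is no interaction.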
In this study, we do not discuss exposure-outcome confounding or exposure-mediator interaction further (details on these concepts can be found in Hernan et al. [20] and Rijnhart et al. [26], respectively). However, we will have a closer look at the implications of unmeasured mediator-outcome confounding (assumption 2).
Natural indirect and direct effects
For many research questions, the main goal is to decompose the total effect into direct and indirect effects using mediation analysis. That is, the goal is to assess the extent to which the effect of an exposure on the outcome is explained or is unexplained by a given set of hypothesized mediators. A common approach for this goal is to estimate the so-called natural direct and indirect effects [24]. This approach for estimating effect types differs from controlled direct effects in analysis techniques, conditions, and interpretations [11, 12, 24,25,26,27,28]. Specifically, the estimated natural direct and indirect effects are only valid when assumptions 1 and 2 above are met. In addition, there should be no unmeasured exposure-mediator confounding factors and no mediator-outcome confounding factors affected by the exposure. Note that estimating natural direct effects will not be covered in this article, as we are evaluating the implications of the common practice of regression adjustment in immigrant datasets, and not evaluating the various analysis techniques used for estimating direct or indirect effects. Details on natural direct and indirect effects can be found in VanderWeele [12].
All methods were performed according to relevant guidelines.
Results
In the following, we investigate the underlying conditions leading to unequal distributions in the outcome risk factors between compared groups and show how the adjustment for outcome risk factors in regression models under these conditions can lead to either total effect or controlled direct effect.
Model A: the outcome risk factor is both a selection factor and mediator
In Fig. 1b, we assumed that both the exposure and the outcome risk factor affected the probability of being a member of the immigrant dataset S (E → Ⓢ ← R_{0}). This new structure added to Fig. 1b compared with Fig. 1a is an example of collider bias. In terms of DAGs, collider bias occurs when two variables point to the same variable (the collider), and this collider variable is adjusted for or conditioned on a specific value of its distribution [18, 21, 23]. In our DAG, \(S\) is the collider and it is restricted to only those who are a member of the immigrant dataset (\(S=1\)), indicated by the circle around \(S\). Importantly, conditioning on a single value of the collider \(S\) opens the path which originally was closed (not shown in the DAGs), thus inducing a biased association between \(E\) and \({R}_{0}\) [18, 21, 23].
However, in Fig. 1b, we also assumed that the countries of the immigrants and nonimmigrants already differed in the outcome risk factor \({R}_{0}\) before immigration took place, indicated by the path \(E\to {R}_{0}\). Accordingly, the observed unequal distribution of the outcome risk factor between immigrants and nonimmigrants consists of both a preexisting difference in the outcome risk factor as well as a biased association induced due to the selection process involving the outcome risk factor. This biasing part would further bias the effect of \(E\) on \(D\) due to the path E → Ⓢ ← R_{0} → R → D. Consequently, to obtain valid effect estimates of the exposure on outcome, we need to remove the collider bias part of the observed unequal distribution between the compared groups.
In many situations, collider bias can be removed by simple regression modeling. To accomplish this, one needs to adjust for variables that lie on the biasing path between the exposure and outcome [18, 21, 29, 30]. In Fig. 1b, the outcome risk factor \(R\) is the only measured variable that keeps the biasing path open between \(E\) and \(D\). Indeed, in the migration literature, it is common practice to adjust for \(R\) in regression models due to the observed difference in \(R\) between immigrants and nonimmigrants. However, in doing this, we would not only remove the collider bias but also remove the indirect effect of \(E\) on \(D\) via \(R\) (\(E\to {R}_{0}\to R\to D\)). Consequently, when the outcome risk factor is both a mediator and selection factor in immigrant datasets, the adjustment for \(R\) would yield a controlled direct effect, and not a total effect, of \(E\) on \(D\). We later briefly discuss how total effects can be estimated under these conditions.
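Model A can be sketched with simulated data. All numerical choices below are assumptions made for illustration and are not from the paper: the path coefficients, the logistic selection rule for individuals from country B, and the noise added to obtain the postmigration measurement \(R\). Within the selected dataset (\(S=1\)), the unadjusted estimate mixes the total effect with collider bias, while adjusting for \(R\) returns the direct path only.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Hypothetical path coefficients (illustrative values, not from the paper).
a, b, c = 0.8, 0.5, 0.3  # E -> R0, R -> D, E -> D

e = rng.integers(0, 2, size=n).astype(float)  # 0 = country A, 1 = country B
r0 = a * e + rng.normal(size=n)               # premigration risk factor
r = r0 + 0.3 * rng.normal(size=n)             # measured postmigration
d = c * e + b * r + rng.normal(size=n)        # continuous health outcome

# Selection into the immigrant dataset (assumed rule): everyone from country A
# is included; from country B, inclusion is more likely for higher R0.
p_select = np.where(e == 1, 1.0 / (1.0 + np.exp(-(-2.0 + 1.5 * r0))), 1.0)
s = rng.random(n) < p_select


def ols(y, *covariates):
    """Return OLS coefficients [intercept, covariate_1, covariate_2, ...]."""
    X = np.column_stack([np.ones_like(y), *covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


unadjusted = ols(d[s], e[s])[1]     # collider path through S left open
adjusted = ols(d[s], e[s], r[s])[1]  # closes collider path and indirect path

print(f"total effect in source population (c + a*b): {c + a * b:.2f}")
print(f"unadjusted estimate in dataset:              {unadjusted:.3f}")
print(f"estimate adjusted for R:                     {adjusted:.3f}  (c = {c:.2f})")
```

Because selection here favors high \({R}_{0}\), the unadjusted estimate overshoots the total effect; a different selection rule would change the size and direction of that bias.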
A challenge with adjusting for the outcome risk factor \(R\) arises when there are also confounding factors \(C\) for the relationship between \({R}_{0}/R\) and \(D\) (Fig. 1c). Because \({R}_{0}\) is a close ancestor of \(R\), and adjusting for \(R\) therefore also largely adjusts for \({R}_{0}\), this adjustment may induce another collider bias due to the path \(E\to {R}_{0}/R\leftarrow C\to D\). In this situation, additional adjustment for \(C\) is also needed to close this biasing path. If such mediator-outcome confounding is suspected but not measured and adjusted for, one should evaluate its potential impact on the estimated effects by sensitivity analyses [12].
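The role of the confounder \(C\) can be isolated in a simplified simulation. This sketch makes several assumptions that are not in the paper: it sets \({R}_{0}=R\), omits the selection step so that only the path \(E\to R\leftarrow C\to D\) is at play, and uses hypothetical linear coefficients. Adjusting for \(R\) alone then distorts the direct effect, and adding \(C\) to the model restores it.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

# Hypothetical path coefficients (illustrative values, not from the paper).
a, b, c = 0.8, 0.5, 0.3  # E -> R, R -> D, E -> D
g, h = 1.0, 0.6          # C -> R, C -> D

e = rng.integers(0, 2, size=n).astype(float)  # country of birth (0 or 1)
conf = rng.normal(size=n)                     # mediator-outcome confounder C
r = a * e + g * conf + rng.normal(size=n)     # risk factor (R0 = R assumed)
d = c * e + b * r + h * conf + rng.normal(size=n)


def ols(y, *covariates):
    """Return OLS coefficients [intercept, covariate_1, covariate_2, ...]."""
    X = np.column_stack([np.ones_like(y), *covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


adj_r_only = ols(d, e, r)[1]         # conditions on collider R, C left open
adj_r_and_c = ols(d, e, r, conf)[1]  # also closes the path through C

print(f"adjusted for R only:   {adj_r_only:.3f}")
print(f"adjusted for R and C:  {adj_r_and_c:.3f}  (c = {c:.2f})")
```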
Model B: the outcome risk factor is a selection factor only
In Fig. 1a-c, we assumed that the countries of the immigrants and nonimmigrants already differed in the outcome risk factor \({R}_{0}\) before immigration took place. However, in some cases, this national \({R}_{0}\) distribution could be the same for the compared groups, with no arrow from \(E\) to \({R}_{0}\) (Fig. 1d). This would for example be the case when \({R}_{0}\) represents the variable sex. In that case, the observed difference in the outcome risk factor distribution between groups is only attributed to collider bias due to the path E → Ⓢ ← R_{0}. In other words, the outcome risk factor \({R}_{0}\) no longer acts as a mediator and remains a selection factor for immigration alone. Therefore, adjusting for the measured outcome risk factor \(R\) in a regression model would appropriately close the collider path E → Ⓢ ← R_{0} → R → D, resulting in a total effect, and not a controlled direct effect, of \(E\) on \(D\).
Note that, in Fig. 1d the variable \(C\) is a confounding factor for the association between \({R}_{0}/R\) and \(D\). Once adjusting for the measured \(R\), one may also need to adjust for \(C\) to avoid another collider bias on the path \({R}_{0}\to R\leftarrow C\to D\). However, this bias may in general be small, as \({R}_{0}\) is a close ancestor of \(R\), and adjusting for \(R\) will largely also adjust for \({R}_{0}\) (and thereby close the confounding path \({R}_{0}\leftarrow C\to D\)). Indeed, when \(R\) is time constant (e.g., sex) or is measured at the point of immigration (setting \({R}_{0}=R\) in Fig. 1d), adjusting for \(R\) would be sufficient to close all biasing paths including the path through the confounding factor \(C\). In contrast, this would not suffice when \(R\) is both a mediator and selection factor (setting \({R}_{0}=R\) in Fig. 1c), in which adjustment for \(R\) would introduce collider bias due to the path \(E\to R\leftarrow C\to D\). Then, additional adjustment for \(C\) is needed to close the confounding path.
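The time-constant case of Model B (\({R}_{0}=R\) in Fig. 1d) can be sketched as follows. The coefficients, the continuous stand-in for the risk factor, and the logistic selection rule are assumptions made for illustration only. Since there is no arrow from \(E\) to \(R\), the total effect equals the direct path, and adjusting for \(R\) alone recovers it in the selected dataset even though \(C\) is left unadjusted.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000

# Hypothetical path coefficients (illustrative values, not from the paper).
b, c = 0.5, 0.3  # R -> D, E -> D (no E -> R arrow, so total effect = c)
g, h = 1.0, 0.6  # C -> R, C -> D

e = rng.integers(0, 2, size=n).astype(float)  # 0 = country A, 1 = country B
conf = rng.normal(size=n)                     # confounder C of R and D
r = g * conf + rng.normal(size=n)             # selection factor, R0 = R assumed
d = c * e + b * r + h * conf + rng.normal(size=n)

# Selection (assumed rule): everyone from country A is included; from
# country B, inclusion is more likely for higher R.
p_select = np.where(e == 1, 1.0 / (1.0 + np.exp(-(-1.0 + 1.5 * r))), 1.0)
s = rng.random(n) < p_select


def ols(y, *covariates):
    """Return OLS coefficients [intercept, covariate_1, covariate_2, ...]."""
    X = np.column_stack([np.ones_like(y), *covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


unadjusted = ols(d[s], e[s])[1]     # collider path through S left open
adjusted = ols(d[s], e[s], r[s])[1]  # R alone closes all biasing paths here

print(f"unadjusted estimate in dataset: {unadjusted:.3f}")
print(f"estimate adjusted for R only:   {adjusted:.3f}  (total effect = {c:.2f})")
```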
Model C: the outcome risk factor is a mediator only
If the probability of being a member of the immigrant dataset does not depend on the outcome risk factor (i.e., no selection factor), we no longer have an arrow from \({R}_{0}\) to \(S\) in Fig. 1b, c (not shown in the DAGs). In that case, the observed difference in outcome risk factor distribution between groups is only attributed to a preexisting difference between countries before immigration took place. That is, the outcome risk factor now acts as a mediator alone for the association between country of birth and the health outcome (\(E\to {R}_{0}\to R\to D\)), and no adjustment for collider bias is needed. An example of such a mediator could be body height, which is unlikely to be a direct selection factor for being a member of an immigrant dataset. Consequently, adjusting for such an outcome risk factor would yield a controlled direct effect, like that seen in Fig. 1a. On the other hand, refraining from adjusting for the same outcome risk factor (when it is a mediator alone and not a selection factor) in regression models would lead to the estimation of a total effect. Note that the controlled direct effect is only valid if potential mediator-outcome confounding factors are also adjusted for.
Model D: the outcome risk factor is neither a selection factor nor a mediator
If the outcome risk factor is neither a selection factor for immigration nor a mediator for the association of the exposure and outcome, the distribution of the outcome risk factor should be similar for the compared groups. In this case, adjustment for the outcome risk factor \(R\) would not be needed to remove bias, and the effect estimates would be total.
Note on postmigration change in outcome risk factors
If the outcome risk factors \({R}_{0}\) (before migration) and \(R\) (some years postmigration) take different values, this could indicate that some factor associated with the receiving country may have caused this change. For instance, immigrants in the dataset can have their postmigration smoking status (\(R\)) changed compared to before immigration (\({R}_{0}\)), because the country they move to provides better education for citizens about the adverse effects of smoking, leading these immigrants to stop smoking. Hence, there might be scenarios where \(S\) can influence \(R\), and where both \(S\) and \(R\) are mediators for the association between \(E\) and \(D\) due to the path \(E\to S\to R\to D\). This new path will only have consequences for the estimated effects in Fig. 1d but not in Fig. 1b, c. In Fig. 1d, adjusting for \(R\) and \(C\) after inclusion of the arrow from \(S\) to \(R\) would lead to a controlled direct effect rather than the previous total effect.
Example of controlled direct effect
To illustrate how adjusting for outcome risk factors may lead to the estimation of a controlled direct effect instead of a total effect, we consider the study by Nilsen et al. [31] where the authors compared the risk of preterm preeclampsia (< 37 weeks of gestation) between immigrants and nonimmigrants according to immigrants’ reasons for immigration to Norway. The study included seven immigrant groups, but here we compare only one of the immigrant groups (immigrant refugee women) with the nonimmigrants.
The study reports results from two adjusted regression models. The first model adjusted for calendar year of birth, maternal age at birth, parity, marital status at birth, and chronic diseases (hypertension and diabetes). The second model additionally adjusted for SES, measured as maternal income and education. All variables were considered risk factors for preeclampsia and differed between the compared groups in initial data exploration in the immigrant dataset. Furthermore, they were measured postmigration around the time point of childbirth. For our illustration, we will assume that no other outcome risk factors existed for preeclampsia.
The suggested DAG for the fully adjusted model is shown in Fig. 2. The node \(Immigrant\, (E)\) is the exposure and represents the groups being compared. The node Ⓢ represents the dataset containing both representative nonimmigrant women and the subsample of selected refugees. Further, the gray nodes (\(R01\) to \(R03\)) represent unmeasured outcome risk factors premigration, while the black nodes (\(R1\) to \(R3\)) are measured risk factors postmigration. Also, we assume that the outcome risk factors \(Calendar\, year\), \(Marital\, status\), \(Parity\), and \(Chronic\, disease\) have the same causal roles, and have, therefore, combined these into \(Other \left(R01\right)\) and \(Other\, at\, birth\, (R1)\).
In essence, the DAG in Fig. 2 corresponds to that of Fig. 1b, where adjusting for all measured postmigration outcome risk factors (\(R1\) to \(R3\)) is needed to remove collider bias. However, in doing this, we also close all the indirect paths between exposure \(E\) and outcome \(D\) via these factors. Consequently, adjusting for the measured postmigration outcome risk factors should result in a controlled direct effect, and not a total effect, for the association between \(Immigrant\, (E)\) and \(Preeclampsia\, (D)\). Nilsen et al. [31] did not report the type of effect, and readers might therefore interpret the adjusted association as a total effect rather than a controlled direct effect.
It may be tempting to refrain from adjusting for the outcome risk factor when it is both a mediator and a selection factor, in the belief that the effect would be total. However, if the adjustment for such outcome risk factors is ignored, the estimated total effect will be biased due to uncontrolled collider bias. For example, the models without and with adjustment for SES in Nilsen et al. [31] produce two different odds ratios for preeclampsia, 1.28 and 1.18, respectively. The second estimate (with adjustment for SES) is a valid controlled direct effect, whereas the first estimate (without adjustment for SES) is a biased effect estimate due to ignoring SES as a selection factor.
It is likely that immigrants can have their postmigration \(SES\, (R2)\) changed compared to before immigration (\(R02\)), indicated by the arrows from Ⓢ to \(SES\, (R2)\) in Fig. 2. We further believe that SES postmigration may influence the age at which one decides to have a child, indicated by the arrow from \(SES\, (R2)\) to \(Maternal\, age\, (R3)\). However, because all measured outcome risk factors are mediators on the causal path from \(E\) to \(D\), and at the same time are adjusted for, these additional paths would not affect the effect type or results from the adjusted regression model.
Discussion
When quantifying differences in health outcomes between immigrants and nonimmigrants, it is common practice to adjust for observed differences in outcome risk factors between the groups being compared. In this study, we showed that unequal distributions in the outcome risk factors between immigrants and nonimmigrants arise due to various conditions involving the outcome risk factors. When the outcome risk factor acts as a combined mediator/selection factor or as a mediator alone, the regression adjustment for the risk factor leads to a controlled direct effect. When the outcome risk factor acts as a selection factor and not a mediator, the adjustment for the risk factor leads to a total effect. A summary of our findings is presented in Fig. 3. Notably, in most research problems, several different types of outcome risk factors can be present for the outcome. If at least one of these risk factors is a combined mediator/selection factor or a mediator alone, adjusting for the whole set of outcome risk factors would yield a controlled direct effect.
While the aim of this paper was to investigate how adjustment for outcome risk factors can lead to the estimation of either total or direct effects, we also described what happens if we do not adjust for selection factors or mediators. Specifically, we showed that refraining from adjusting for a combined mediator/selection factor or a selection factor alone can induce collider bias unless adjusting for some other measured factor on the same causal path (models A and B). Conversely, no such bias would be induced when refraining from adjusting for a mediator alone (model C) or a factor that is neither a mediator nor a selection factor (model D). The last column of Fig. 3 summarizes the implications of not adjusting for the relevant risk factor. In all models, also accounting for various confounding paths, including mediator-outcome confounding, may be necessary to obtain valid controlled direct effects or total effects.
If authors are not satisfied with controlled direct effects and want to obtain total effects under model A (the outcome risk factor is both a selection factor and mediator), other statistical techniques could be used for this purpose. One popular analytical approach would be to use inverse probability weighting (IPW) in regression models [22, 29, 30]. To use this method, we first need to calculate the probability (\(p\)) of being a member of the immigrant dataset (\(S = 1\)) for each individual and \(R\) value. Then, we calculate the inverse of these selection probabilities (\(1/p\)) and use these as weights in regression models. Note, however, that to calculate the selection probabilities for IPW, we need data on both the outcome risk factor distribution for the immigrants in the sample of the receiving country (immigrant dataset) and the corresponding distribution in the home population of the immigrants (i.e., the nationally representative data). This requirement is, unfortunately, not always possible to meet.
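The weighting steps described above can be sketched on simulated data. Several assumptions are made for illustration and are not from the paper: \({R}_{0}=R\), hypothetical linear coefficients, a logistic selection rule, and selection probabilities \(p\) that are known exactly (in practice they would have to be estimated from home-population data). With a binary exposure and no other covariates, a weighted regression of \(D\) on \(E\) is equivalent to the weighted difference in group means computed here.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000

# Hypothetical path coefficients (illustrative values, not from the paper).
a, b, c = 0.8, 0.5, 0.3  # E -> R, R -> D, E -> D

e = rng.integers(0, 2, size=n).astype(float)  # 0 = country A, 1 = country B
r = a * e + rng.normal(size=n)                # risk factor (R0 = R assumed)
d = c * e + b * r + rng.normal(size=n)        # continuous health outcome

# Assumed selection rule, treated as known: p = P(S = 1 | E, R).
p_select = np.where(e == 1, 1.0 / (1.0 + np.exp(-(-0.5 + 1.0 * r))), 1.0)
s = rng.random(n) < p_select

# Keep only members of the immigrant dataset and weight each by 1/p.
e_s, d_s = e[s], d[s]
weights = 1.0 / p_select[s]


def weighted_mean_difference(y, x, w):
    """Weighted mean of y where x == 1 minus weighted mean where x == 0."""
    exposed = x == 1
    return (np.average(y[exposed], weights=w[exposed])
            - np.average(y[~exposed], weights=w[~exposed]))


naive = weighted_mean_difference(d_s, e_s, np.ones_like(weights))
ipw = weighted_mean_difference(d_s, e_s, weights)

print(f"total effect in source population (c + a*b): {c + a * b:.2f}")
print(f"unweighted estimate in dataset:              {naive:.3f}")
print(f"IPW estimate in dataset:                     {ipw:.3f}")
```

Individuals with a small selection probability receive large weights, so in real data the weights may need inspection before they are used.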
In our analyses, we considered nationally representative data for the nonimmigrants. However, many researchers may not have access to nationally representative data on nonimmigrants and are instead left with a survey in which both the immigrants and the nonimmigrants select themselves to be members of the immigrant dataset (\(S = 1\)). This would not change the DAGs in Fig. 1b–d, and the approach for estimating total effects and controlled direct effects would be the same as before.
Conclusions
In immigrant datasets, adjusting for outcome risk factors in regression models may result in either total effects or controlled direct effects. Which type of effect is estimated in a given dataset depends on the causal role of the outcome risk factor adjusted for. Because total and direct effects are different quantities and are interpreted differently, we advise researchers to state clearly which type of effect is presented when adjusting for outcome risk factors in immigrant datasets. As shown in this study, this is best accomplished by first examining the plausible model for the research problem using causal graphs and then identifying the effect type obtained by adjustment under that model. Only in this way can readers interpret effects consistently and make consistent comparisons between immigrants and nonimmigrants across immigrant datasets. The current paper focuses on immigrant datasets, but its content may also be relevant to other public health datasets, including datasets on health differences between males and females.
Availability of data and materials
Not applicable.
Abbreviations
BMI: Body mass index
DAG: Directed acyclic graph
IPW: Inverse probability weighting
SES: Socioeconomic status
References
Agyemang C, van der Linden EL, Bennet L. Type 2 diabetes burden among migrants in Europe: unravelling the causal pathways. Diabetologia. 2021;64(12):2665–75.
Indseth T, Grosland M, Arnesen T, Skyrud K, Klovstad H, Lamprini V, et al. COVID-19 among immigrants in Norway, notified infections, related hospitalizations and associated mortality: a register-based study. Scand J Public Health. 2021;49(1):48–56.
Hjerkind KV, Larsen IK, Aaserud S, Moller B, Ursin G. Cancer incidence in nonimmigrants and immigrants in Norway. Acta Oncol. 2020;59(11):1275–83.
Bastola K, Koponen P, Skogberg N, Gissler M, Kinnunen TI. Hypertensive disorders of pregnancy among women of migrant origin in Finland: A population-based study. Acta Obstet Gynecol Scand. 2021;10:127–34.
Vik ES, Aasheim V, Schytt E, Small R, Moster D, Nilsen RM. Stillbirth in relation to maternal country of birth and other migration related factors: a population-based study in Norway. BMC Pregnancy Childbirth. 2019;19(1):5.
Schneeberger AR, Seixas A, Schweinfurth N, Lang UE, Cajochen C, Bux DA, et al. Differences in Insomnia Symptoms between Immigrants and Non-Immigrants in Switzerland attributed to Emotional Distress: Analysis of the Swiss Health Survey. Int J Environ Res Public Health. 2019;16(2):289.
Eskild A, Sommerfelt S, Skau I, Grytten J. Offspring birthweight and placental weight in immigrant women from conflict-zone countries; does length of residence in the host country matter? A population study in Norway. Acta Obstet Gynecol Scand. 2020;99(5):615–22.
Juarez SP, Small R, Hjern A, Schytt E. Caesarean Birth is Associated with Both Maternal and Paternal Origin in Immigrants in Sweden: a Population-Based Study. Paediatr Perinat Epidemiol. 2017;31(6):509–21.
Kragelund Nielsen K, Andersen GS, Damm P, Andersen AN. Gestational Diabetes Risk in Migrants. A Nationwide, Register-Based Study of all Births in Denmark 2004 to 2015. J Clin Endocrinol Metab. 2020;105(3):dgaa024.
Nilsen RM, Daltveit AK, Iversen MM, Sandberg MG, Schytt E, Small R, et al. Preconception Folic Acid Supplement Use in Immigrant Women (1999–2016). Nutrients. 2019;11(10):2300.
Richiardi L, Bellocco R, Zugna D. Mediation analysis in epidemiology: methods, interpretation and bias. Int J Epidemiol. 2013;42(5):1511–9.
VanderWeele TJ. Mediation Analysis: A Practitioner’s Guide. Annu Rev Public Health. 2016;37:17–32.
Hayes AF, Rockwood NJ. Regression-based statistical mediation and moderation analysis in clinical research: Observations, recommendations, and implementation. Behav Res Ther. 2017;98:39–57.
Hernan MA. A definition of causal effect for epidemiological research. J Epidemiol Community Health. 2004;58(4):265–71.
Greiner J, Rubin D. Causal effects of perceived immutable characteristics. Rev Ec Stat. 2011;93:775–85.
VanderWeele TJ, Robinson WR. On the causal interpretation of race in regressions adjusting for confounding and mediating variables. Epidemiology. 2014;25(4):473–84.
Glymour MM, Spiegelman D. Evaluating public health interventions: 5. causal inference in public health research-do sex, race, and biological factors cause health outcomes? Am J Public Health. 2017;107(1):81–5.
Greenland S, Pearl J, Robins JM. Causal diagrams for epidemiologic research. Epidemiology. 1999;10(1):37–48.
Fujishiro K, Hajat A, Landsbergis PA, Meyer JD, Schreiner PJ, Kaufman JD. Explaining racial/ethnic differences in all-cause mortality in the Multi-Ethnic Study of Atherosclerosis (MESA): Substantive complexity and hazardous working conditions as mediating factors. SSM Popul Health. 2017;3:497–505.
Hernan MA, Hernandez-Diaz S, Werler MM, Mitchell AA. Causal knowledge as a prerequisite for confounding evaluation: an application to birth defects epidemiology. Am J Epidemiol. 2002;155(2):176–84.
Greenland S. Quantifying biases in causal models: Classical confounding vs collider-stratification bias. Epidemiology. 2003;14(3):300–6.
Hernan MA, Hernandez-Diaz S, Robins JM. A structural approach to selection bias. Epidemiology. 2004;15(5):615–25.
Cole SR, Platt RW, Schisterman EF, Chu H, Westreich D, Richardson D, et al. Illustrating bias due to conditioning on a collider. Int J Epidemiol. 2010;39(2):417–20.
Pearl J. Direct and indirect effects. Proceedings of the seventeenth conference on uncertainty in artificial intelligence: Morgan Kaufmann Publishers Inc; 2001. p. 411–20.
Vanderweele TJ, Vansteelandt S. Odds ratios for mediation analysis for a dichotomous outcome. Am J Epidemiol. 2010;172(12):1339–48.
Rijnhart JJM, Valente MJ, MacKinnon DP, Twisk JWR, Heymans MW. The Use of Traditional and Causal Estimators for Mediation Models with a Binary Outcome and Exposure-Mediator Interaction. Struct Equ Modeling. 2021;28(3):345–55.
Vo TT, Superchi C, Boutron I, Vansteelandt S. The conduct and reporting of mediation analysis in recently published randomized controlled trials: results from a methodological systematic review. J Clin Epidemiol. 2020;117:78–88.
Doretti M, Raggi M, Stanghellini E. Exact parametric causal mediation analysis for a binary outcome with a binary mediator. Stat Methods Appl. 2022;31(1):87–108.
Biele G, Gustavson K, Czajkowski NO, Nilsen RM, Reichborn-Kjennerud T, Magnus PM, et al. Bias from self selection and loss to follow-up in prospective cohort studies. Eur J Epidemiol. 2019;34(10):927–38.
Nohr EA, Liew Z. How to investigate and adjust for selection bias in cohort studies. Acta Obstet Gynecol Scand. 2018;97(4):407–16.
Nilsen RM, Vik ES, Rasmussen SA, Small R, Moster D, Schytt E, et al. Preeclampsia by maternal reasons for immigration: a population-based study. BMC Pregnancy Childbirth. 2018;18(1):423.
Acknowledgements
Not applicable.
Funding
The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.
Author information
Authors and Affiliations
Contributions
All authors contributed to the study's conception, design, and analysis. The first draft of the manuscript was written by Roy Miodini Nilsen and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Nilsen, R.M., Klungsøyr, K. & Stigum, H. Adjusting for outcome risk factors in immigrant datasets: total or direct effects?. BMC Med Res Methodol 23, 37 (2023). https://doi.org/10.1186/s12874-023-01861-4
DOI: https://doi.org/10.1186/s12874-023-01861-4