We found that graphical models can be used in the study of ICF data in order to develop a more detailed understanding of human functioning.

Graphical models can be used to visualize functioning in patients with spinal cord injury. The associations in Figure 1 have high face validity. Even though no a priori hypotheses were imposed, the model revealed a cluster of variables concerning relationships and interpersonal interactions. Representing the findings as a graph is convenient because groupings of variables are easy to detect. Thus, this method might help to promote an intuitive understanding of human functioning.

The skeleton of a Directed Acyclic Graph (DAG) is a somewhat refined version of the Conditional Independence Graph (CIG) used by [8]. Its edges indicate a stronger kind of dependence than the edges in a CIG: the skeleton has an edge if and only if the endpoints are dependent given every subset of the remaining nodes (including the empty set), whereas the CIG has an edge if and only if the endpoints are dependent given all other nodes, and no smaller conditioning set is tested. Thus, an edge in the CIG might vanish when conditioning on a subset of the remaining variables.
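To make the difference between the two graphs concrete, the following sketch estimates both from simulated data containing a collider X → Z ← Y. It assumes Gaussian variables, measures dependence by sample partial correlations, and uses a fixed cutoff `thr` instead of a proper conditional independence test. These choices and the function names are illustrative assumptions, not the procedure applied to the binary ICF data.

```python
import itertools
import numpy as np

def partial_corr(data, i, j, cond=()):
    """Sample partial correlation of columns i and j given the columns in cond.

    Computed from the inverse sample covariance of (i, j, cond); with an empty
    cond this reduces to the ordinary Pearson correlation.
    """
    cols = [i, j, *cond]
    prec = np.linalg.inv(np.cov(data[:, cols], rowvar=False))
    return -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])

def cig_edges(data, thr):
    """CIG: edge i-j iff i and j look dependent given ALL other variables."""
    p = data.shape[1]
    edges = set()
    for i, j in itertools.combinations(range(p), 2):
        rest = [k for k in range(p) if k not in (i, j)]
        if abs(partial_corr(data, i, j, rest)) > thr:
            edges.add((i, j))
    return edges

def skeleton_edges(data, thr):
    """Skeleton: edge i-j iff i and j look dependent given EVERY subset of the others."""
    p = data.shape[1]
    edges = set()
    for i, j in itertools.combinations(range(p), 2):
        rest = [k for k in range(p) if k not in (i, j)]
        subsets = (s for r in range(len(rest) + 1) for s in itertools.combinations(rest, r))
        if all(abs(partial_corr(data, i, j, s)) > thr for s in subsets):
            edges.add((i, j))
    return edges

# Collider X -> Z <- Y (columns 0, 1, 2): X and Y are marginally independent
# but become dependent once we condition on Z.
rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(size=n)
y = rng.normal(size=n)
z = x + y + rng.normal(size=n)
data = np.column_stack([x, y, z])

print("CIG:     ", sorted(cig_edges(data, thr=0.1)))       # [(0, 1), (0, 2), (1, 2)]
print("Skeleton:", sorted(skeleton_edges(data, thr=0.1)))  # [(0, 2), (1, 2)]
```

The CIG keeps the X-Y edge because X and Y are dependent given Z, whereas the skeleton drops it because the empty conditioning set already separates them.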

After estimating the skeleton, we found several connected components, which can be used for dimension reduction. Dimension reduction allows the analysis to focus only on the variables of interest. The connected components can be seen as distinct constructs and might be treated as single variables in further analyses. This is useful because many statistical methods are best suited to settings with many observations and few variables. Since dimension reduction splits one large analysis task into one or more smaller ones, such methods might only become feasible on the smaller groups. This is particularly useful for ICF data. For example, when constructing a unidimensional scale for the difficulty of the tasks corresponding to the nodes in the graph, fitting a Rasch model to a set of 200 variables seems daunting, whereas fitting it to several independent groups of five to twenty variables is much more feasible. Moreover, the structure within each connected component might yield additional information on the internal structure of the construct, from which further analyses might benefit.

We found that the differences in dependence structures between subpopulations were relevant and could be systematically analyzed using graphical models. Context appears to matter, since the graphical models estimated for different geographical regions differed considerably. Since the data used in this example are hardly representative, these differences cannot be discussed at the content level. Nevertheless, stratification and comparison of structures between strata can be very useful when developing theories about human functioning. Since graphical models represent complex structures in a well-defined, mathematical way, they provide an ideal foundation for further methodological development on the systematic comparison of structures.
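The connected-component step used for dimension reduction can be sketched as a breadth-first search over the estimated skeleton. This is a minimal sketch; the node list mixes real ICF category codes, but the edge list is hypothetical and does not reproduce the study's graph.

```python
from collections import defaultdict, deque

def connected_components(nodes, edges):
    """Return the connected components of an undirected graph as sorted lists."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from `start`.
        queue, comp = deque([start]), []
        seen.add(start)
        while queue:
            v = queue.popleft()
            comp.append(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        components.append(sorted(comp))
    return components

# Hypothetical estimated skeleton (illustrative edges only).
nodes = ["d640", "d450", "d430", "d920", "d710", "d720", "d750", "b152"]
edges = [("d640", "d450"), ("d450", "d430"), ("d430", "d920"),
         ("d710", "d720"), ("d720", "d750")]
for comp in connected_components(nodes, edges):
    print(comp)
# ['d430', 'd450', 'd640', 'd920']
# ['d710', 'd720', 'd750']
# ['b152']
```

Each returned component could then be analyzed on its own, for example by fitting a separate Rasch model per component instead of one model to all variables.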

When estimating bounds on the causal effects of ICF categories on general health perceptions, we found that the five ICF categories with the strongest effects were plausible. Four of the five categories (*d640, d450, d430, d920*) are closely related to physical activity in daily life, and it is plausible that increased physical activity improves general health perception. Moreover, all categories we found are addressed in at least one of the currently most widely used health status measures [24]. The findings are in line with a previous study on the same data set (see [10]) that used regression analysis (RA) instead of intervention analysis (IA). Four of the top five variables (*d640*, *b280*, *d450* and *d920*) occur in both RA and IA. Moreover, *Emotional Functions (b152)*, which appears important in RA, ranks seventh in IA and is thus also found to be very influential. Hence, content-wise, there was a large overlap between the concepts found with RA and IA, which is encouraging. In conclusion, IA seems to extract different but broadly related information from the data. From a conceptual point of view, we consider the intervention calculus approach, which estimates causal effects, preferable to the associational approach (i.e., regression analysis), since the effect of a therapy is a matter of causation, not association. Thus, from a therapeutic perspective, a regression analysis might be misleading in the sense that it suggests variables for intervention that are not causally related to the outcome. In principle, estimating the intervention effect directly overcomes this problem and identifies promising candidates for successful therapeutic intervention more efficiently. Understanding the impact of therapeutic interventions is valuable both for practical therapeutic recommendations and for the understanding of human functioning in general. It is important to note that the results shown are meant as proofs of principle.
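The point that regression can highlight variables without causal influence on the outcome can be illustrated with a toy linear structural model X → Y → W, in which W is a consequence of the outcome Y. The model, its coefficients and the helper `simulate` are hypothetical; the sketch only shows how a regression coefficient and an interventional effect can disagree.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

def simulate(do_x=None, do_w=None):
    """Draw (X, Y, W) from the toy model X -> Y -> W, optionally under do(X=...) or do(W=...)."""
    xs = rng.normal(size=n) if do_x is None else np.full(n, float(do_x))
    ys = xs + rng.normal(size=n)  # true causal effect of X on Y is 1.0
    ws = ys + rng.normal(size=n) if do_w is None else np.full(n, float(do_w))
    return xs, ys, ws

# Associational approach: regress Y on both X and W (with intercept).
xs, ys, ws = simulate()
design = np.column_stack([np.ones(n), xs, ws])
coef, *_ = np.linalg.lstsq(design, ys, rcond=None)
print("regression coefficients of X and W:", coef[1:].round(2))

# Interventional approach: compare the mean of Y under two settings of each variable.
effect_x = simulate(do_x=1)[1].mean() - simulate(do_x=0)[1].mean()
effect_w = simulate(do_w=1)[1].mean() - simulate(do_w=0)[1].mean()
print(f"do(X) effect: {effect_x:.2f}, do(W) effect: {effect_w:.2f}")
```

Regressing Y on X and W gives W a coefficient of about 0.5, although setting W by intervention leaves Y unchanged; conversely, the regression coefficient of X (about 0.5) understates its interventional effect of 1.0.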
Medical conclusions or interpretations should not be drawn, since the data sets used here were convenience samples that are not representative of the underlying population. Specifically, the countries representing the Asian and European strata are by no means representative; they were chosen to obtain subgroups of reasonable sample size. Additionally, there was no information on whether missing values were informative; they were imputed under the assumption of noninformative missingness. Variables were dichotomized because this was necessary for the proposed methods, which entails a loss of information. A generalization of the methodology to categorical or even ordinal data would be desirable.
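The information loss caused by dichotomization can be quantified in a simple case. For two standard normal variables with correlation ρ, splitting both at the median yields a phi coefficient of (2/π)·arcsin(ρ), which follows from Sheppard's orthant-probability formula; for ρ = 0.5 the association drops from 0.5 to 1/3. The sketch below checks this by simulation; the Gaussian setting is an assumption for illustration, not a model of the ICF data.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
rho = 0.5
# Bivariate standard normal sample with correlation rho.
u, v = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n).T

r_continuous = np.corrcoef(u, v)[0, 1]
# Dichotomize both variables at their median, which is 0 for a standard normal.
r_binary = np.corrcoef((u > 0).astype(float), (v > 0).astype(float))[0, 1]
# Theoretical phi coefficient after median splits: (2 / pi) * arcsin(rho).
r_theory = 2 / np.pi * np.arcsin(rho)

print(f"continuous: {r_continuous:.3f}, dichotomized: {r_binary:.3f}, theory: {r_theory:.3f}")
```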

As with all statistical methods, sampling variability can cause errors: some edges might be missing and others might be superfluous. We addressed this problem using the bootstrap. However, more rigorous methods for assessing the reliability of the estimated graph still need to be developed.
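A bootstrap assessment of edge stability can be sketched as follows: resample the observations with replacement, re-estimate the graph on each resample, and record how often each edge appears. The estimator `correlation_graph` is a hypothetical stand-in (a thresholded correlation graph) for the structure-learning algorithm actually used, and the binary toy data are simulated.

```python
import itertools
import numpy as np

def correlation_graph(data, thr=0.2):
    """Hypothetical stand-in estimator: edge i-j iff |Pearson correlation| > thr."""
    r = np.corrcoef(data, rowvar=False)
    p = data.shape[1]
    return {(i, j) for i, j in itertools.combinations(range(p), 2) if abs(r[i, j]) > thr}

def bootstrap_edge_frequencies(data, estimator, n_boot=200, seed=0):
    """Fraction of bootstrap resamples in which each edge is estimated."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    counts = {}
    for _ in range(n_boot):
        sample = data[rng.integers(0, n, size=n)]  # resample rows with replacement
        for edge in estimator(sample):
            counts[edge] = counts.get(edge, 0) + 1
    return {edge: c / n_boot for edge, c in counts.items()}

# Toy binary data: column 1 copies column 0 with about 10% of entries flipped,
# column 2 is independent of both.
rng = np.random.default_rng(3)
n = 2_000
a = rng.integers(0, 2, size=n)
b = np.where(rng.random(n) < 0.1, 1 - a, a)
c = rng.integers(0, 2, size=n)
data = np.column_stack([a, b, c]).astype(float)

freqs = bootstrap_edge_frequencies(data, correlation_graph)
for edge in [(0, 1), (0, 2), (1, 2)]:
    print(edge, freqs.get(edge, 0.0))
```

Edges with low bootstrap frequencies are candidates for being superfluous; in practice the stand-in estimator would be replaced by the skeleton estimation used in the analysis.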

As with many other statistical methods, graphical modeling relies on assumptions whose validity is hard to check in practice. For all of our applications, we assumed the absence of hidden and selection variables. Furthermore, we assumed that the conditional independence statements holding in the true distribution are exactly those encoded by the graph, so that the graphical model represents them without error (this is often called "faithfulness" or "stability"). It is reasonable to assume that this is often the case (see [17]).
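A violation of faithfulness arises, for instance, when two causal paths cancel exactly. In the hypothetical linear model below, X affects Y both directly and through Z with coefficients that cancel, so X and Y are marginally uncorrelated although X is a parent of Y. A constraint-based algorithm that tests X and Y for marginal independence would therefore remove the X-Y edge. The sketch assumes Gaussian variables and defines its own partial-correlation helper.

```python
import numpy as np

def partial_corr(data, i, j, cond=()):
    """Sample partial correlation of columns i and j given the columns in cond."""
    cols = [i, j, *cond]
    prec = np.linalg.inv(np.cov(data[:, cols], rowvar=False))
    return -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])

rng = np.random.default_rng(4)
n = 20_000
x = rng.normal(size=n)
z = x + rng.normal(size=n)      # X -> Z
y = x - z + rng.normal(size=n)  # X -> Y and Z -> Y; the two paths from X cancel
data = np.column_stack([x, y, z])

pc_marginal = partial_corr(data, 0, 1)       # close to 0: X looks independent of Y
pc_given_z = partial_corr(data, 0, 1, [2])   # clearly non-zero (about 0.58 in the population)
print(round(pc_marginal, 3), round(pc_given_z, 3))
```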

While graphical models contribute to dimension reduction, other methods might be superior in particular applications. In many applications of graphical models, the erroneous estimation of a single edge does not affect the global result. However, when graphical models are used for dimension reduction, one misplaced edge might change the result completely. For example, if an edge were erroneously inserted between d660 and s630 in Figure 2, the two large groups would merge into one larger group, and we would wrongly conclude that the two groups are dependent. Thus, when used for dimension reduction, graphical models are sensitive to errors.
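The sensitivity of the component structure to a single edge can be checked directly. The sketch below counts connected components with a union-find structure before and after inserting one edge between d660 and s630; all other nodes and edges are hypothetical and do not reproduce Figure 2.

```python
def count_components(nodes, edges):
    """Number of connected components, computed with union-find (path halving)."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving keeps trees shallow
            v = parent[v]
        return v

    for a, b in edges:
        parent[find(a)] = find(b)  # merge the two sets
    return len({find(v) for v in nodes})

# Hypothetical two-group graph (illustrative edges only).
group_a = [("d660", "d640"), ("d640", "d450")]
group_b = [("s630", "b620"), ("b620", "b525")]
base_edges = group_a + group_b
nodes = {v for e in base_edges for v in e}

print(count_components(nodes, base_edges))                        # 2
print(count_components(nodes, base_edges + [("d660", "s630")]))   # 1
```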

To compare the structures of different regions, we fitted one graphical model per region and compared the models using the structural Hamming distance (SHD). This comparison was heuristic. Further research is needed to provide systematic and computationally feasible methods for detecting significant differences between graphs.
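For reference, one common definition of the SHD counts the node pairs whose edge status differs between two graphs (edge missing, extra, or oriented differently). The sketch below implements that definition for graphs stored as 0/1 adjacency matrices; the encoding and the two example graphs are assumptions, and other software may use slightly different conventions (for example, counting a reversed edge as two errors).

```python
import numpy as np

def shd(g1, g2):
    """Structural Hamming distance between two graphs given as 0/1 adjacency matrices.

    g[i, j] == 1 and g[j, i] == 0 encodes i -> j; g[i, j] == g[j, i] == 1 encodes i - j.
    Each node pair whose edge status differs contributes 1.
    """
    g1, g2 = np.asarray(g1), np.asarray(g2)
    p = g1.shape[0]
    return sum(
        (int(g1[i, j]), int(g1[j, i])) != (int(g2[i, j]), int(g2[j, i]))
        for i in range(p)
        for j in range(i + 1, p)
    )

# Hypothetical graphs for two regions over three variables.
region_a = [[0, 1, 0],   # 0 -> 1
            [0, 0, 1],   # 1 - 2 (undirected, together with row 2)
            [0, 1, 0]]
region_b = [[0, 0, 1],   # 0 -> 2
            [1, 0, 0],   # 1 -> 0
            [0, 0, 0]]
print(shd(region_a, region_b))  # 3: one reversed, one extra and one missing edge
```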

For estimating intervention effects, we additionally assumed that the true causal mechanism can be represented by a Directed Acyclic Graph, i.e., that there are no feedback loops. Furthermore, we restricted the dependence structure of the individual random variables by assuming that there are no interactions among the explanatory variables. Without assumptions, no information on causal effects can be obtained. Under our assumptions, we can find sets of possible causal effects. Even with an infinite amount of data, our assumptions generally do not determine a unique causal effect, only a set of possible causal effects. The development of suitable methods for aggregating ambiguous causal effects is desirable. Ideally, as a strong test of the underlying assumptions, one would compare the performance of our proposed method for causal inference with the outcomes of randomized experiments.
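The idea of obtaining a set of possible causal effects rather than a single value can be sketched in the linear Gaussian case, in the spirit of the IDA approach: for each parent set of X that is compatible with the estimated equivalence class, the effect of X on Y is estimated as the coefficient of X in the regression of Y on X and that parent set. Here the candidate parent sets are assumed to be given: we simulate a hypothetical graph Z → X, Z → Y, X → Y, V → Y, whose equivalence class leaves only the Z-X edge unoriented, so the candidates for X are ∅ and {Z}. Deriving these sets from an estimated graph is omitted.

```python
import numpy as np

def possible_effects(data, cause, outcome, candidate_parent_sets):
    """Return one estimated causal effect of column `cause` on column `outcome`
    per candidate parent set of `cause` (linear Gaussian model assumed)."""
    n = data.shape[0]
    effects = []
    for pa in candidate_parent_sets:
        # Adjust for the assumed parents: regress outcome on intercept, cause and pa.
        design = np.column_stack([np.ones(n), data[:, [cause, *pa]]])
        coef, *_ = np.linalg.lstsq(design, data[:, outcome], rcond=None)
        effects.append(float(coef[1]))  # coefficient of `cause`
    return effects

rng = np.random.default_rng(5)
n = 20_000
z = rng.normal(size=n)
v = rng.normal(size=n)
x = z + rng.normal(size=n)                  # Z -> X
y = 2 * x + 3 * z + v + rng.normal(size=n)  # X -> Y <- Z, V -> Y
data = np.column_stack([x, y, z, v])

# Hypothetical candidate parent sets of X (column indices): none, or {Z}.
effects = possible_effects(data, cause=0, outcome=1, candidate_parent_sets=[(), (2,)])
lower_bound = min(abs(e) for e in effects)
print([round(e, 2) for e in effects], round(lower_bound, 2))  # approximately [3.5, 2.0] and 2.0
```

The minimum absolute value over the set is one conservative summary, i.e., a lower bound on the size of the effect; it is only one possible way of aggregating such ambiguous causal effects.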