The effect of disease co-occurrence measurement on multimorbidity networks: a population-based study

Background Network analysis, a technique for describing relationships, can provide insights into patterns of co-occurring chronic health conditions. The effect that co-occurrence measurement has on disease network structure and resulting inferences has not been well studied. The purpose of the study was to compare structural differences among multimorbidity networks constructed using different co-occurrence measures. Methods A retrospective cohort study was conducted using four fiscal years of administrative health data (2015/16 – 2018/19) from the province of Manitoba, Canada (population 1.5 million). Chronic conditions were identified using diagnosis codes from electronic records of physician visits, surgeries, and inpatient hospitalizations, and grouped into categories using the Johns Hopkins Adjusted Clinical Group (ACG) System. Pairwise disease networks were separately constructed using each of seven co-occurrence measures: lift, relative risk, phi, Jaccard, cosine, Kulczynski, and joint prevalence. Centrality analysis was limited to the top 20 central nodes, with degree centrality used to identify potentially influential chronic conditions. Community detection was used to identify disease clusters. Similarities in community structure between networks was measured using the adjusted Rand index (ARI). Network edges were described using disease prevalence categorized as low (< 1%), moderate (1 to < 7%), and high (≥7%). Network complexity was measured using network density and frequencies of nodes and edges. Results Relative risk and lift highlighted co-occurrences between pairs of low prevalence health conditions. Kulczynski emphasized relationships between high and low prevalence conditions. Joint prevalence focused on highly-prevalent conditions. Phi, Jaccard, and cosine emphasized associations involving moderately prevalent conditions. Co-occurrence measurement differences significantly affected the number and structure of identified disease clusters. When limiting the number of edges to produce visually interpretable graphs, networks had significant dissimilarity in the percentage of co-occurrence relationships in common, and in their selection of the highest-degree nodes. Conclusions Multimorbidity network analyses are sensitive to disease co-occurrence measurement. Co-occurrence measures should be selected considering their intrinsic properties, research objectives, and the health condition prevalence relationships of greatest interest. Researchers should consider conducting sensitivity analyses using different co-occurrence measures. Supplementary Information The online version contains supplementary material available at 10.1186/s12874-022-01607-8.

Background Multimorbidity, the co-existence of two or more chronic health conditions within an individual, where none are considered more central than the others, is becoming increasingly common in populations worldwide [1][2][3]. Those living with multiple chronic conditions tend to experience poorer quality of life, have increased disability and mortality, and face many challenges accessing healthcare services including receiving conflicting medical advice and duplicative testing, experiencing drug interactions, and managing a heavy treatment burden [3][4][5][6]. Multimorbidity also places a strain on healthcare systems since individuals with multiple chronic conditions have higher healthcare utilization and costs [7][8][9]. Network analysis, the study of relationships amongst connected entities, is commonly used to examine social relationships but has only recently been proposed as a method to shed light on population-level multimorbidity patterns. Network analysis models disease co-occurrence using graph structures characterized by nodes (e.g., diseases) and connecting edges (i.e., relationships or interactions). Several recent studies applied network analysis to electronic health data, to examine associations among co-occurring health conditions at the population level [10][11][12][13][14][15][16][17][18][19][20][21][22][23][24][25]. Network analysis is appealing for multimorbidity research, in part because of its reliance on graphical techniques to present disease associations, which can efficiently convey information in a non-technical manner to clinicians, patients, and healthcare system decision makers. Along with data visualization, network analysis also supports 1) centrality analysis for the detection of important nodes or hubs, that is, diseases that may be influential in a population or among a set of other diseases; 2) the identification of community structure, that is, clusters of highly-connected diseases; and 3) comparisons between population subgroups by contrasting network properties such as density and complexity measures. Although the practical implications of multimorbidity network analyses have not been fully explored, data-driven patient profiles built using network analysis can be used by healthcare providers and policy makers to organize healthcare delivery services to reduce the treatment burden of multimorbidity, reduce healthcare costs, and improve patient outcomes. Community structure can be used to identify sets of co-occurring conditions that may benefit from a coordinated and multidisciplinary approach to healthcare delivery, while centrality analysis can be used to target conditions that may benefit from interventions aimed at prevention.
Measuring disease association, or co-occurrence, is foundational for constructing the links that form the structure of disease networks. There are several cooccurrence measures available and studies conducted to date have used a variety of different measures to construct disease networks [10,11,[18][19][20][21][22]. While the intrinsic properties of co-occurrence measures have been examined [26,27], the effect that choice of cooccurrence measure has on disease network structure and any resulting inferences has not been well studied. Substantial variation in research methods has been observed among multimorbidity studies, which challenges the comparability of research findings [28,29]. Research comparing different methodological approaches, for studying patterns of multimorbidity, has been recommended to improve study validity and generalizability [29]. Comparing techniques for constructing multimorbidity networks could aid in determining how different techniques affect our understanding of population-level chronic disease patterns. Therefore, the purpose of this study was to assess the effect that choice of co-occurrence measure has on network analyses of co-occurring chronic conditions. Using administrative health data with excellent population coverage, separate chronic disease networks were constructed using seven co-occurrence measures. Descriptive methods were used to compare networks in terms of node centrality, community structure, and density; network edges were compared using categorized prevalence of co-occurring disease pairs.

Data sources
Study data sources included the Manitoba Health Services Insurance Plan Registry (Population Registry), the Hospital Abstracts Database, and the Medical Services Database. These data sources were linked using a unique personal health identification number. The Population Registry captures information on healthcare coverage for all insured Manitobans, and was used to determine eligibility for study inclusion. The Registry also includes demographic information used to characterize the study cohort. The Registry does not provide information about an individual's gender, which is a sociological construct; it only provided information about biological sex. Individuals may change their biological sex designation.
Chronic condition diagnoses were obtained from the Hospital Abstracts Database, which contains information on discharges from hospitals, and the Medical Services Database, which records information on ambulatory services provided in physician offices. Diagnoses within hospital discharge abstracts are coded using ICD-10-CA (International Statistical Classification of Diseases and Related Health Problems, 10th Revision with Canadian Enhancements), while Medical Services diagnoses are coded with up to five digits using ICD-9-CM (International Statistical Classification of Diseases and Related Health Problems, 9th Revision, Clinical Modification).

Cohort development
The study cohort included all Manitoba residents with complete or partial Manitoba Health insurance coverage during the study observation period. Individuals entered the study on April 1, 2015 or the date that coverage started, and were followed until the end of the study period or until their insurance coverage ceased (e.g., due to death, migration out of province). Based on information about biological sex recorded in the Population Registry, males with female-specific conditions and females with male-specific conditions were excluded (n = 15); the presence of these potential inconsistencies could suggest errors in the recording of diagnostic or demographic information, and it was not possible to verify the reasons for these potential inconsistencies. Since disease networks were formed from disease co-occurrence relationships, the network analysis was limited to individuals with diagnoses recorded for at least two chronic health conditions during the study observation period.

Disease ascertainment
Chronic conditions were ascertained using diagnoses identified from inpatient discharge records in the Hospital Abstracts Database and from physician visit records in the Medical Services Database. Surgeries recorded in both data sources were also included. Prenatal and pregnancy-related records were excluded to minimize overstating disease co-occurrence among females.
A single diagnosis code was used to ascertain whether an individual was considered as having a specified condition in the study observation period. Individual diagnosis codes were grouped into 201 Expanded Diagnostic Clusters (EDC) and 27 higher-level Major Expanded Diagnostic Clusters (MEDC) using the Johns Hopkins Adjusted Clinical Group (ACG) System [30]. Diagnoses were loaded into the Johns Hopkins ACG System as World Health Organization (WHO) ICD-9 or ICD-10 codes. Five-digit ICD-10-CA codes from the Hospital Abstracts Database were truncated to the first four digits to improve compatibility with the Johns Hopkins System, which supports the WHO ICD system but not the Canadian revision. There were 49 unique Canadian-specific ICD-10-CA codes relevant to chronic disease status that were not captured by the Johns Hopkins System. These 49 Canadian-specific diagnosis codes were first translated to WHO ICD-10 codes for inclusion. An additional 17 Canadian-specific ICD-10-CA codes were not captured; however, they were irrelevant to disease status since they indicated location of occurrence or activity engaged in during occurrence. Chronic conditions classified as separate EDC categories based on severity or presence of complications were combined into single disease categories including asthma with or without asthmaticus, hypertension with or without complications, type 1 diabetes with or without complications, and type 2 diabetes with or without complications. As well, 25 EDC categories that were non-descriptive, or referred to non-chronic medical conditions or to the neonatal period were removed. Two categories indicating severity of malignant neoplasms, already classified elsewhere, were also excluded. Since co-occurrences with frequencies less than 15 were excluded from the association analysis to minimize statistical errors, seven EDC categories with low frequencies were excluded: heart murmur, lymphadenopathy, thrombophlebitis, tuberculosis infection, sinusitis, other inflammatory conditions of skin, and other female gynecologic conditions. After 34 EDC categories were excluded, 167 EDC categories remained for the network analysis.

Disease co-occurrence measures
Disease co-occurrence was defined as two or more chronic health conditions recorded at any time during the four-year study observation period (April 1, 2015 -March 31, 2019) for the same individual. The four-year study period was selected because it was the available time frame in which diagnoses in the Medical Services dataset were coded to five digits; this detailed level of diagnostic coding was important for identifying distinct health conditions. Disease association was measured using seven different co-occurrence measures: lift (Eq. 1), relative risk (Eq. 2), phi (Eq. 3), Jaccard (Eq. 4), cosine (Eq. 5), Kulczynski (Eq. 6), and joint prevalence (Eq. 7) [31][32][33][34][35][36][37]. Phi and relative risk are two of the most commonly used measures in disease network analysis, while lift is commonly used in conjunction with association rule mining. Jaccard, cosine, and Kulczynski are null-invariant measures that have been suggested for use with sparse data such as disease status datasets [26]. Joint prevalence was included due to its ease of interpretation. These measures are defined for two health conditions (i.e., x and y; see Fig. 1) as: Relative risk [32] a(c+d) c(a+b) Jaccard [34] a (a+b+c) Joint prevalence [37] a n where a, b, c, d, and n are defined from the elements of a two-way contingency table.
Statistical significance of disease associations was assessed using the chi-square test when expected frequencies were greater than five, while Fisher's exact test was used when the chi-square assumption did not hold. Associations that were not statistically significant using α = 0.01 were excluded. Since the focus of this study was on co-occurring disease, the analysis was limited to positive associations. Given that RR is an asymmetric measure of association, the maximum of the two RR measures was used.
The association analysis was limited to disease dyads. Disease association was computed using association rule mining by applying the widely-used Apriori algorithm [38]. Minimum joint frequency (called support in association rule mining) was limited to 15 to minimize statistical errors, and the minimum confidence parameter of association rule mining was left unbounded. Data preprocessing was conducted using SAS (v9.4), while R and the arules package (v1.6-7) was used to perform the association analysis [39].

Covariates
The study cohort was characterized using age, biological sex, number of chronic conditions, residence location (urban or rural), and income quintile, an area-level measure of socioeconomic status based on average household income from the most recently-available (i.e., 2016) Statistics Canada Census [40]. Birthdate and biological sex were extracted from the most recent insurance coverage period, while socioeconomic and urban/rural status were based on the latest residence recorded during the study period. Age was calculated at exit date (i.e., the study index date) and categorized as < 20, 20-39, 40-59, 60 +.

Network analysis
Weighted, undirected pairwise disease networks were separately constructed using each of the seven co-occurrence measures. Disease networks were stratified by the number of associations (i.e., edges) included: (a) all associations, (b) strongest 50% of associations, and (c) strongest 200 associations. Networks based on the strongest 200 associations were used to examine differences in networks that have higher visual interpretability, while networks based on the strongest 50% cut-off were used to examine how network similarity changes when a larger number of associations are included. Effect size estimates were used as edge weights and were bounded between 0 and 1 for networks measured using phi, Jaccard, cosine, and Kulczynski association measures, and unbounded for lift, relative risk, and joint prevalence.
Community structure was identified using a weighted and non-overlapping community detection algorithm developed by Blondel et al. [41]. This algorithm was chosen due to its computational efficiency on large networks and because it is widely used in applied network studies. Central nodes were identified using degree centrality and the centrality analysis was limited to the top 20 central nodes. Degree centrality was chosen because it has a clear interpretation in the context of disease co-occurrence networks (i.e., number of co-occurrence relationships) and the appropriateness of other centrality measures for use with disease co-occurrence networks is uncertain. Disease networks were visualized using the Fruchterman-Reingold force-directed layout algorithm constrained to the strongest 200 associations to improve interpretability  [42]. Networks were visualized using Gephi (v0.9.2) and analyzed with Java and Gephi Toolkit (v0.9.2).

Evaluating and comparing disease networks
Disease networks constructed using different co-occurrence measures were compared on their community structure, central nodes, common edges, network complexity, and in terms of the joint prevalence and prevalence difference distributions of their network edges. Community structure similarity was calculated using the adjusted Rand index (ARI) with the R package aricode (v1.0.0) [43]. Community structure was also characterized by the number of communities identified and modularity, where higher modularity values indicate more distinctive communities. Important nodes, identified using degree centrality, were compared across networks by calculating their agreement, as a percent, on the top 20 central nodes. Edge similarity was calculated using the percent of edges in common between network pairs. Across all networks, overall similarity of community structure, and central node and edge agreement, was quantified using the median and the 25th (Q1) and 75th (Q3) percentiles.
Network edges were compared using categorized prevalence of co-occurring disease pairs. Based on the prevalence distribution across all 167 Johns Hopkins disease categories, disease prevalence was categorized as low (< 1%), moderate (1 to < 7%), and high (≥7%). The percent of edges in each category was used to describe the tendencies of the co-occurrence measures in estimating association strength. Joint prevalence and prevalence difference distributions were described using the median and Q1-Q3 range.
Network complexity was characterized using density (i.e., the ratio of the number of edges present in a network to the number of possible edges between all node pairs), and frequencies of nodes and edges.  Fig. 2). The median age of individuals with multimorbidity was considerably higher (57 years, Q1-Q3: 41-70) than individuals with one chronic condition (33 years, Q1-Q3: 18-49) or without any chronic disease (24, Q1-Q3: 11-37). There were a higher percentage of females (54.1%) and urban residents (64.1%) with multimorbidity than without (47.1% female, 61.3% urban). There were only minor differences in the distribution of socioeconomic status (income quintile) between those with and without multimorbidity.

Disease association analysis
Out of 9138 pairwise disease co-occurrences with joint frequencies ≥15, 1293 (14.1%) were excluded due to being either non-significant (i.e., p-value > 0.01) or nonpositive (i.e., phi < 0). After exclusions, 7845 associations remained for the analysis. Since network density is not modified by edge weight, network density was constant (0.57) for all seven networks constructed with different co-occurrence measures when all edges (n = 7845) were included (N nodes = 166). Networks constructed by limiting the number of associations differed in their density and number of nodes (Table 2). For pairwise networks constructed using the strongest 200 associations, the network with the least number of nodes (n = 56 for joint prevalence) had the highest network density (0.13), while the two networks with the greatest number of nodes (n = 114 for relative risk; n = 123 for Kulczynski) had the lowest network density at 0.03. As more associations were included, variation in the number of nodes and network density decreased between the networks. For the pairwise networks constructed with the strongest 50% of associations (i.e., n = 3922 associations), the number of nodes ranged from 150 to 166 and network density varied between 0.29 and 0.35.

Network visualization
Including all statistically significant pairwise associations produced dense network visualizations that were difficult to interpret ( Supplementary Fig 1, Additional File 1

Co-occurrence relationships characterized by disease prevalence
Different co-occurrence measures estimate higher association strengths for different types of relationships, in terms of the prevalence difference between disease pairs. These preferences by association measures result in certain pairwise chronic disease relationships being emphasized more than other combinations, when limiting networks to the strongest associations. Differences based on disease prevalence were more pronounced between networks when using a smaller number of the strongest associations and decreased when including 50% (n = 3922) of all statistically significant associations (Table 3 Networks based on lift and relative risk accentuated cooccurrence relationships between pairs of low prevalent (< 1%) conditions, at 72.5 and 59.0% respectively (Table 3, Fig. 5). The percentage of edges highlighting co-occurrences between two low prevalent conditions in the other five networks ranged from 0% (joint prevalence) to 9.5% (phi). Lift and relative risk also highlighted a higher proportion of relationships between moderately prevalent (1 to < 7%) and low prevalent conditions, compared with the other co-occurrence measures.
Relationships between two moderately prevalent conditions were emphasized by phi, Jaccard, and cosine based networks: 36.5, 46.0, and 30.0%, respectively. Phi, Jaccard, and cosine also emphasized relationships between highly and moderately prevalent diseases: 27.5, 28.5, and 39.5%, respectively. The majority of the edges in the Kulczynski-based network represented relationships between conditions of high and low prevalence (40.0%), and between highly prevalent and moderately prevalent conditions (28.5%). Relationships between conditions of high and low prevalence only constituted up to 4.0% of all edges in the other six networks. Measuring co-occurrence using joint prevalence resulted in the highest percentage of edges connecting highly prevalent and moderately prevalent disease nodes (69.5%). Joint prevalence and Jaccard produced networks with the most connections between two highly prevalent conditions (7.5%). Correspondingly, the joint prevalence network had the highest median joint prevalence (0.7%, Q1-Q3: 0.6-1.2%). Lift and relative risk based networks did not contain any edges between two highly prevalent disease nodes, while associations between pairs of highly prevalent conditions accounted for 3.0 to 6.5% of the edges in networks built using phi, cosine, and Kulczynski.

Network edge similarity
Disease networks constructed using different co-occurrence measures were dissimilar in terms of the edges included in the top 200 associations (Fig. 6), with edge agreement (percentage of co-occurrence relationships in common) ranging from 1.5% for lift and joint prevalence to 86.5% for lift and relative risk. Phi-and Jaccard-based networks had moderate agreement with the cosine-based network (83.0 and 79.5%). Phi and Jaccard had moderate agreement (78.0%), while the remaining network pairs had lower agreement with a range from 5.0 to 63.5%. Median agreement (37.0%, Q1-Q3: 20.0-53.5%) among the network pairs was much lower when limited to the 200 strongest associations, than when the top 50% of all statistically significant associations were used to construct the networks (68.5%, Q1-Q3: 58.7-83.9%; Supplementary Fig. 11, Additional File 1).

Fig. 4
Multimorbidity network with associations measured using phi, limited to the 200 strongest associations. Node diameter and font size are proportional to prevalence, edge weight (thickness) is proportional to effect size, and node and edge color indicate community structure (i.e., disease clusters). COPD = chronic obstructive pulmonary disease, ESRD = end-stage renal disease  (Table 2). However, networks also had considerable dissimilarity in the number of communities detected when all statistically significant associations were included (3 to 7) and when limited to the strongest 50% of all statistically significant associations (2 to 6). Modularity, a measure of how well a network separates into communities, also widely varied between networks limited to the 200 strongest associations (0.08 to 0.72) but variation decreased when  networks included the top 50% of all significant associations (0.07 to 0.43). When all statistically significant associations were included, modularity ranged from 0.07 (joint prevalence) to 0.36 (relative risk). Community structure similarity, as measured using the ARI, was strongest between phi and cosine in networks limited to the top 200 associations (ARI = 0.68) (Fig. 7), and between relative risk and lift (ARI = 0.49) and phi and cosine (ARI = 0.48) among networks limited to the top 50% of all significant associations ( Supplementary Fig. 12, Additional File 1). Overall, cooccurrence measurement differences resulted in poor community structure similarity: the median ARI was 0.08 (Q1-Q3: 0.06-0.24) for networks including the top 200 associations and the median ARI was 0.26 (Q1-Q3: 0.24-0.32) for networks consisting of the top 50% of associations.

Nodes of importance
Since degree centrality is a non-weighted measure, networks that included all statistically significant edges without limiting inclusion by effect size had identical degree distributions. When including all statistically significant associations, the five chronic condition categories with the highest degree centrality were: other endocrine disorders, depression, major depression, sleep apnea, and asthma.
The selection of the top 20 disease categories with the highest degree centrality varied amongst networks constructed using different co-occurrence measures. Networks had a median agreement of 55.0% (Q1-Q3: 25.0-75.0%) when limited to the top 200 co-occurrence relationships ( Fig. 8; Supplementary Table 1, Additional File 1) and a median agreement of 55.0% (Q1-Q3: 30.0-75.0%) when limited to the strongest 50% of all significant associations ( Supplementary  Fig. 13, Additional File 1). When limited to the top 200 co-occurrences, agreement ranged from 5% between lift and joint prevalence to 95% between Jaccard and cosine. Agreement between two of the most commonly used measures among disease network studies, relative risk and phi, agreed on only 30% of the top 20 central nodes. When 50% of all statistically significant associations were included, agreement was strongest between Kulczynski and joint prevalence (95% agreement), and weakest between lift and Kulczynski (20%) and between lift and joint prevalence (20%).

Discussion
Measuring disease co-occurrence is essential when constructing multimorbidity networks to determine the connecting links between disease nodes and the strengths of these co-occurrence relationships, which serve as edge weights. Different association measures highlight different co-occurrence relationships, in terms of disease prevalence, based on which relationships are assigned higher association estimates. In weighted disease networks where effect size estimates are used as edge weights, differences in co-occurrence measurement influence community detection algorithms and node centrality measures that use edge weights in their calculations. Unweighted measures such as network density and degree centrality are not affected by the choice of cooccurrence measures unless network links are excluded based on effect size cut-offs. When limiting the number of edges in a network by effect size, to produce a visually interpretable diagram, choice of co-occurrence measure can have a significant impact on network structure and network analysis inferences. Evaluating the accuracy or validity of a network requires a ground truth against which to compare network structure. Since there is no ground truth for a chronic disease co-occurrence network, this study used descriptive methods to highlight the impact that co-occurrence measurement has on network analysis. This study showed the majority of the highest associations measured using lift and relative risk pertained to co-occurrence relationships between pairs of low prevalence conditions. In contrast, the strongest associations in the joint prevalence network included highly-prevalent conditions, while the Kulczynski measure emphasized relationships between high and low prevalence conditions. Phi, Jaccard, and cosine emphasized associations with moderately-prevalent conditions. Comparing Jaccard and cosine, Jaccard tended to prefer co-occurrence relationships between diseases of similar prevalence, while cosine assigned slightly less emphasis to events of similar frequency.
The results from the current study concur with Hidalgo et al., who compared disease co-occurrence networks constructed using RR and φ and found the network constructed with RR to have a greater number of low prevalence conditions and the φ-based network to be characterized by more prevalent conditions [10]. In addition to describing network edges by disease prevalence, the current study also showed the impact that disease co-occurrence measurement has on community structure and node centrality-items not discussed previously in literature. Along with contrasting RR and φ, this study also compared disease networks constructed using lift, a measure commonly used in conjunction with association rule mining, and null-invariant measures (cosine, Jaccard, and Kulczynski), which have been suggested for use with sparse data such as disease status matrices [26]. The differences amongst the null-invariant measures observed in the current study agree with Wu et al., who described the preference of Jaccard for relationships between events of similar frequency, Kulczynski for relationships between frequent and rare events, and cosine as being situated between these two in terms of the relationships that receive the highest association estimates [27].
Although there has been only limited research about the effect that the choice of association measure has on disease network structure, some studies have examined the intrinsic properties of association measures including the inversion invariance and null invariance properties, which are important considerations when choosing an appropriate measure of association [26,27]. The inversion invariance property refers to stable association estimates when events (i.e., presence of disease) and non-events (i.e., values indicating absence of disease) are flipped [26]. Association measures that are inversion invariant, such as φ, assign equal importance to copresence and co-absence and their association estimates remain constant when disease status is inversed. Association measures that are not inversion invariant may be better suited for asymmetric binary data, such as disease co-occurrence data, where absence of disease outweighs the number of positive cases [26]. Null invariance refers to constant effect estimates when there is an increase in the total number of records with neither event of interest (i.e., an increase in the number of disease-free individuals). Cosine, Jaccard, and Kulczynski are null-invariant measures of association, while φ violates the null invariance property. Lift, RR, and joint prevalence are neither inversion invariant nor null invariant. Like inversion invariance, null invariance is an important consideration for disease co-occurrence analysis since disease status matrices typically contain a large proportion of null transactions (observations that do not contain any of the events of interest) [27]. However, assessing the appropriateness of an association measure is still difficult even after an examination of its intrinsic properties. By outlining the tendencies of association measures to emphasize certain types of co-occurrence relationships, our study provides an additional empirical basis to aid researchers in selecting an appropriate co-occurrence measure.
The current study has a number of strengths. Extracting diagnoses from both hospital and physician data aids in providing a comprehensive picture of chronic disease patterns in the Manitoba population [44]. Furthermore, the administrative health data used in this study had excellent population coverage since the data are based on a single public insurer that effectively captures healthcare system encounters for all Manitoba residents, with few exceptions-resulting in excellent generalizability of the observed chronic disease patterns at the population level. Utilizing precise ICD diagnostic codes (i.e., up to five digits) minimized misclassification errors and allowed for the definition of certain disease categories that cannot be distinguished from one another when only using 3-digit codes. Finally, the large number of chronic condition categories under analysis facilitated the examination of many potentially interesting disease patterns that are obscured when using a more limited number of categories based on a comorbidity index.
Despite the strengths of this study, there are some limitations. The true distribution of chronic disease in the underlying population can differ significantly from disease patterns observed within administrative claims data. Since chronic conditions were defined through contact with the healthcare system, disease information may have been inadequately captured for individuals with limited access to healthcare services or for conditions which individuals are less likely to seek treatment. Consequently, there will be missing links or underestimated edge weights for relationships involving underreported health conditions within the structure of the disease co-occurrence networks. To increase diagnostic precision, this study was constrained to the 4-year period when diagnoses in physician billing claims were coded with up to five digits; but in doing so this study did not capture diagnoses that were only recorded in earlier time periods. This reduced observation period may have resulted in understating co-occurrence for less prevalent conditions or conditions that are infrequently documented in billing claims. All diagnoses observed during the 4-year study period for a specific individual were treated as persisting during the entire time period, which may have resulted in overstating certain cooccurrence relationships since diseases that may have been in remission were still considered as co-occurring with other conditions after the point of remission. Due to the relatively large number of chronic condition categories under consideration, it was not feasible to use complex case definitions to ascertain disease status based on diagnosis code counts. Simplified case definitions based on single diagnosis codes were used to mark disease status and misclassification may have occurred due to diagnostic coding errors. Finally, the constructed disease networks can be useful for generating hypotheses and visualizing disease patterns; however, disease progression was not considered in this study and network properties, such as node centrality, should not be used for causal inference [45].
Researchers must make several methodological choices when seeking to conduct a multimorbidity network analysis. In addition to choosing a measure of association, researchers must choose from many different community detection techniques, and node centrality and network complexity measures. While this study discusses approaches to choosing an association measure, researchers seeking to conduct a disease co-occurrence network analysis will also benefit from future studies that develop guidelines on choosing from these other network methods. Administrative health data are available in many jurisdictions and the methodology