Do differences in the administrative structure of populations confound comparisons of geographic health inequalities?
© Jackson et al; licensee BioMed Central Ltd. 2010
Received: 12 March 2010
Accepted: 18 August 2010
Published: 18 August 2010
Geographical health inequalities are naturally described by the variation in health outcomes between areas (e.g. mortality rates). However, comparisons made between countries are hampered by our lack of understanding of the effect of the size of administrative units, and in particular the modifiable areal unit problem. Our objective was to assess how differences in geographic and administrative units used for disseminating data affect the description of health inequalities.
Retrospective study of standard populations and deaths aggregated by administrative regions within 20 European countries, 1990-1991. Estimated populations and deaths in males aged 0-64 were in 5 year age bands. Poisson multilevel modelling was conducted of deaths as standardised mortality ratios. The variation between regions within countries was tested for relationships with the mean region population size and the unequal distribution of populations within each country measured using Gini coefficients.
There is evidence that countries whose regions vary more in population size show greater variation and hence greater apparent inequalities in mortality counts. The Gini coefficient, measuring inequalities in population size, ranged from 0.1 to 0.5 between countries; an increase of 0.1 was accompanied by a 12-14% increase in the standard deviation of the mortality rates between regions within a country.
Apparently differing health inequalities between two countries may be due to differences in geographical structure per se, rather than having any underlying epidemiological cause. Inequalities may be inherently greater in countries whose regions are more unequally populated.
Inequalities in health exist at many levels: between individuals, neighbourhoods, socio-economic groups, regions, countries and entire continents. Attempts to reduce social inequalities in health are often focused on geographical disparities since policy is most easily directed at administrative units such as local government [1, 2]. Geographic clusters of people can also be used as a proxy for underlying socio-economic or genetic factors, since individuals nearby in space and time may be more similar than individuals separated by large distances. Accurate monitoring and description of geographic health inequalities is essential if we are to reduce them successfully. Moreover, because comparing countries allows hypotheses to be developed about their social, cultural, behavioural, political or health care differences, studies of inter-country variability in the magnitude of health inequalities are of key importance in identifying opportunities to achieve these reductions. Modelling the variance may be a means of gaining insight into health inequalities and developing hypotheses regarding contexts.
By exploiting the inherent hierarchical structures present in populations (people live within neighbourhoods which are nested in regions nested in countries), multilevel models provide an appropriate statistical method for describing and explaining geographic health inequalities on a range of spatial scales [6, 7]. Variation in health statistics between geographic units derived from variance components models can be used as a direct measure of inequalities. A larger variance or standard deviation implies greater variation, because the difference between fixed quantiles (e.g. the 5th and 95th centiles) is greater. Patterns or trends of health inequalities can be assessed by comparing this variation across countries, or within a country over time.
However, one may rightly question whether it is fair to make direct comparisons of health inequalities between countries with different internal administrative unit sizes. When analysing health data which have been aggregated into pre-defined spatial units we are likely to encounter the modifiable areal unit problem (MAUP), whereby statistical bias or variation can occur due to the arbitrary nature of the aggregation of individuals into areas [9–11]. More specifically, the MAUP consists of two interrelated components: the scale effect, whereby statistical bias can occur when the information is grouped at different levels of spatial resolution (i.e. the bias arises from the differing number of areas used in the analysis); and the zoning effect, whereby bias results from the various ways in which areas can be aggregated at a given scale and is not due to variation in area size. Routinely collected data, which are frequently used in public health, are often restricted by the boundaries of the units for which the data have been provided. Such boundaries, for example in census data, are generally not designed to delineate communities or reflect homogeneity in terms of health. Further, although these boundaries are often designed to meet constraints on population thresholds, much variation in population size remains between such areas. Therefore, when comparing health outcomes such as mortality rates, or inequalities in mortality rates, across regions we are faced with a variation of the MAUP scale effect and must seek potential solutions.
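The scale effect can be illustrated with a small simulation. The sketch below is purely hypothetical: it assumes equally populated small units whose mortality rates are independent draws from a normal distribution (no spatial autocorrelation), and the function names and parameter values are illustrative rather than taken from the study. It merges the same units into progressively larger areas and reports how the between-area standard deviation changes with the level of aggregation.

```python
import random
import statistics


def simulate_unit_rates(n_units, mean=3.0, sd=0.5, seed=0):
    """Draw hypothetical mortality rates (per 1,000) for equally populated small units."""
    rng = random.Random(seed)
    return [rng.gauss(mean, sd) for _ in range(n_units)]


def aggregate(rates, block_size):
    """Merge consecutive blocks of equally populated units.

    Because the units are assumed to have equal populations, the rate of a
    merged area is simply the mean of its constituent unit rates.
    """
    return [
        statistics.fmean(rates[i:i + block_size])
        for i in range(0, len(rates), block_size)
    ]


def sd_by_scale(rates, block_sizes):
    """Between-area standard deviation of rates at each level of aggregation."""
    return {k: statistics.pstdev(aggregate(rates, k)) for k in block_sizes}


unit_rates = simulate_unit_rates(1200)
sd_at_scale = sd_by_scale(unit_rates, [1, 4, 16])
for block_size, sd in sd_at_scale.items():
    print(f"{1200 // block_size:5d} areas: SD of rates = {sd:.3f}")
```

Under these assumptions the apparent inequality between areas shrinks as areas grow, even though the underlying population is unchanged. Real populations are spatially clustered, so the size of the effect would differ, and this sketch does not address unequal region sizes, which are the focus of this paper.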
In this paper we examine whether spatial inequalities in mortality between countries are associated with the distribution of regional population sizes and in so doing potentially provide a statistical method of adjusting for the variation in administrative unit sizes within countries. Addressing this issue will also provide clues as to whether it is fair to compare health inequalities over time when existing boundaries are changed within a country (such as during Local Government reorganisation in Great Britain 1995-1998). For example, historical differences in the formation of administrative regions, whether within or between countries, may exacerbate or occlude existing health inequalities, while the apparent increase or decrease in inequalities over time may in part be due to geographic boundary changes rather than any underlying epidemiological reasons. This latter point has been recognised as a potential confounding factor, especially in the UK, where methods for creating consistent boundaries or adjusting denominator populations have been developed to tackle this specifically [13, 14].
Furthermore, region boundaries can be defined for several reasons, many of which may be at odds with the statistical goal of grouping common ecological factors; this is certainly true for electoral areas in the UK. Although we do not directly examine the issue of boundary changes over time and their relationship with health inequalities, exploring the effect of differing population structures at one time point on such health inequalities will provide clues to both situations, since in each we are essentially asking the same question: does comparing differently structured administrative units influence our interpretation of health inequalities?
We use a European dataset of 20 countries comprising administrative areas mainly at the NUTS II level or an equivalent level below the country level (e.g. Regions in France, Counties in the UK) to explore whether differences in the structure of geographic units used for reporting data affect regional inequalities in mortality rates. Despite mortality and population being measured at the standard NUTS II level in our data, the number of regions and the distribution of region population sizes within countries vary across Europe. As discussed, the MAUP, and in particular the scale effect, needs to be addressed as the differing levels of spatial resolution are likely to be influencing our interpretation of within-country health inequalities. Assuming that countries with larger variation in mortality rates between their internal regions have greater inequalities than countries with smaller variation, we model this variation by relating it to the distribution of region populations within each country. This is logical as it is likely that geographical regions with larger populations are more diverse in, for example, socio-economic or cultural characteristics than areas with smaller populations.
Our hypothesis can be summarised as follows: countries with larger variation between mortality rates in their internal regions have greater geographic inequalities in this health measure than those with lower variation; and we expect this variation to be related to the distribution of internal region sizes as ecological determinants of health will be more strongly clustered in small regions than large ones. We extend a basic variance components model and explicitly model the variance to describe how inter-regional variation (within a given country) relates to the mean region population size and to inequality in region population size of that country.
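For orientation, the model structure described above can be written out as follows. This is a sketch reconstructed from the prose and from the within-country equations summarised in Table 2, not a reproduction of the authors' original equations. The symbols O_ij and E_ij (observed and expected deaths in region i of country j), v_j (country effect) and u_ij (region effect) are assumed notation; G_j and R_j are the Gini coefficient and mean region population size of country j, and the log form shown is only one of the variants listed in Table 2.

```latex
\begin{align}
O_{ij} &\sim \mathrm{Poisson}(\mu_{ij}), \\
\log \mu_{ij} &= \log E_{ij} + \beta_0 + v_j + u_{ij}, \\
v_j &\sim N(0, \sigma_v^2), \qquad u_{ij} \sim N(0, \sigma_{u(j)}^2), \\
\log \sigma_{u(j)} &= \beta_1 + \beta_2 G_j + \beta_3 R_j .
\end{align}
```

In the baseline variance components model each country has its own unstructured σ u(j); the extended models replace this with a linear or log-linear function of G_j, R_j or both, either near-deterministically or with additional variation around the fitted line.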
Table 1. Number of administrative regions, mean, minimum and maximum region population size and Gini coefficient for each country
Columns: Number of regions | Mean region population | Minimum population region | Maximum population region | σ u(j) (model 1)
630930 Lower Austria
4796173 Île de France
7519917 North Rhine-Westphalia
79930 Ionian Islands
50272 Valle d'Aosta
373730 Oslo og Akershus
3686473 Moscow (city)
57000 Ceuta y Melilla
6046 Appenzell-Inner Rhoden
48617 Isle of Wight
2970235 Greater London
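As an illustration of how inequality in region population size can be summarised, the sketch below computes a Gini coefficient from a list of region populations using the standard rank-based formula. The exact estimator used in the study is not stated in this text, so this formula is an assumption, and the example populations are hypothetical rather than taken from the table above.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = all equal, near 1 = highly unequal).

    Uses the rank-based form G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    where x_i are the values sorted in ascending order and i runs from 1 to n.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0 or xs[0] < 0:
        raise ValueError("need at least one value, all non-negative, with a positive sum")
    rank_weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * rank_weighted / (n * total) - (n + 1.0) / n


# Hypothetical region populations for two countries with the same total population.
evenly_sized = [500_000, 520_000, 480_000, 500_000]
unevenly_sized = [50_000, 150_000, 300_000, 1_500_000]

print(f"evenly sized regions:   G = {gini(evenly_sized):.2f}")
print(f"unevenly sized regions: G = {gini(unevenly_sized):.2f}")
```

A country whose regions all have the same population has a coefficient of zero; the more the population is concentrated in a few large regions, the closer the coefficient moves towards one.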
Table 2. Summary of the Gini coefficient and mean region size models and results
Columns: Model | Equation for within country variation | β 2 (95% CrI) | β 3 (95% CrI)
Baseline random model
1 | σ u(j) ~ U(0,1) | - | -
Gini coefficient models
2 | [equation missing] | 1.14 (-0.2, 2.37) | -
3 | log (σ u(j)) ~ N(β 1 + β 2 G j , 0.0001) | 1.28* (0.44, 2.11) | -
4 | [equation missing], σ u(j) > 0.0001 | 0.11 (-0.02, 0.22) | -
5 | σ u(j) ~ N(β 1 + β 2 G j , 0.0001), σ u(j) > 0.0001 | 0.12* (0.02, 0.22) | -
Mean region size models
6 | [equation missing] | - | 0.19 (-0.04, 0.42)
7 | log (σ u(j)) ~ N(β 1 + β 3 R j , 0.0001) | - | 0.18* (0.05, 0.32)
8 | [equation missing], σ u(j) > 0.0001 | - | 0.02 (-0.00, 0.05)
9 | σ u(j) ~ N(β 1 + β 3 R j , 0.0001), σ u(j) > 0.0001 | - | 0.02* (0.00, 0.04)
Combined Gini coefficient and mean region size models
10 | [equation missing] | 0.85 (-0.57, 2.17) | 0.14 (-0.10, 0.38)
11 | [equation missing], σ u(j) > 0.0001 | 0.09 (-0.06, 0.21) | 0.01 (-0.01, 0.04)
All models were fitted using WinBUGS. A burn-in period of 50,000 iterations was used, during which convergence was achieved for all models (as assessed by the Gelman-Rubin statistic and visual inspection of trace plots of multiple chains). Two chains were then monitored for a further 100,000 iterations, from which results were obtained. These lengthy burn-in and sampling periods were required to ensure convergence of the constant β 0 (eqn 2), which was prone to small short-term fluctuations but larger long-term fluctuations. The following priors were assigned: β 0-3 ~ dflat(); σ v ~ dunif(0,1); model 1 σ u(j) ~ dunif(0,1); all other models σ v ~ dunif(0,1). Parameter estimates appeared insensitive to alternative prior distributions for σ v or σ u.
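To make the convergence check concrete, the following is a minimal sketch of the classic Gelman-Rubin potential scale reduction factor for several chains of equal length. It is an illustration only: the diagnostic reported by WinBUGS is a modified version and may differ in detail, and the simulated chains here are hypothetical stand-ins for real MCMC output.

```python
import math
import random
import statistics


def gelman_rubin(chains):
    """Potential scale reduction factor for m >= 2 chains of equal length n >= 2."""
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    chain_means = [statistics.fmean(c) for c in chains]
    # W: mean of the within-chain sample variances.
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # B: n times the sample variance of the chain means.
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


def fake_chain(mean, n, seed):
    """Hypothetical stand-in for MCMC draws: independent normal samples."""
    rng = random.Random(seed)
    return [rng.gauss(mean, 1.0) for _ in range(n)]


mixed = [fake_chain(0.0, 2000, seed=1), fake_chain(0.0, 2000, seed=2)]
stuck = [fake_chain(0.0, 2000, seed=1), fake_chain(5.0, 2000, seed=2)]

print(f"chains sampling the same distribution: R-hat = {gelman_rubin(mixed):.3f}")
print(f"chains stuck in different places:      R-hat = {gelman_rubin(stuck):.3f}")
```

Values close to 1 suggest that the chains are sampling the same distribution; values well above 1 indicate that further burn-in is needed. A parameter with slow long-term drift, as described for β 0, can look converged over short runs, which is one reason to combine this statistic with trace plots.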
It is worth considering how these results affect the interpretation of the models. For the linear models, a change in the Gini coefficient (ΔG) or mean region size (ΔR) for a given country is associated with an absolute increase in the inter-regional standard deviation (σ u(j)) of β 2ΔG or β 3ΔR respectively. This relationship is complicated for the log models by the exponential function. In these models it is easier to think of an absolute increase in either population size inequality (G) or mean region size (R) as being associated with a proportional increase in the regional variation. An increase in the Gini coefficient of 0.1 is associated with a 12-14% increase (on average) in regional standard deviation (depending on whether the relationship is deterministic or fitted) in the log models (Table 2, models 2 and 3). Similarly, an increase in mean population size of the regions of 500,000 is associated with a 10% increase (on average) in regional standard deviation (Table 2, models 6 and 7). However, what actually constitutes a low or a high inter-regional level of variation is still somewhat unclear. Country-level variation dominated region-level variation in all the models: σ v had a median value of 0.36 (95% credible interval 0.27, 0.52), compared to σ u(j) which consistently took an average value of 0.10 in all models. The null model had a range of within country standard deviations from 0.05 to 0.18, whereas the more complex models typically had a range about half this, between approximately 0.07 and 0.14. These within country standard deviations describe the extent of the geographic inequalities between regions within a country that cannot be explained by either mean region size or inequality in region size.
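The proportional effects quoted for the log models follow from exponentiating the coefficient multiplied by the change in the covariate. The sketch below reproduces that arithmetic from the Table 2 estimates. It assumes that mean region size R is measured in millions of people, which is inferred from the reported 10% figure rather than stated in the text, and the function name is illustrative.

```python
import math


def percent_increase_in_sd(beta, delta):
    """Percentage change in sigma_u(j) when a covariate rises by `delta`
    in a model of the form log(sigma_u(j)) = beta_1 + beta * covariate."""
    return (math.exp(beta * delta) - 1.0) * 100.0


# Gini coefficient log models (Table 2, models 2 and 3): increase of 0.1 in G.
gini_effects = [percent_increase_in_sd(beta, 0.1) for beta in (1.14, 1.28)]

# Mean region size log models (Table 2, models 6 and 7): increase of 500,000,
# i.e. 0.5 if R is in millions (an assumption, see above).
size_effects = [percent_increase_in_sd(beta, 0.5) for beta in (0.19, 0.18)]

print("Gini +0.1:          " + ", ".join(f"{e:.1f}%" for e in gini_effects))
print("Mean size +500,000: " + ", ".join(f"{e:.1f}%" for e in size_effects))
```

For the linear models no such transformation is needed: the change in standard deviation is simply the coefficient multiplied by the change in the covariate.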
To put these values into context, a within-country standard deviation of 0.07 (the country with the lowest degree of inequality between regions) is roughly equivalent to the standardised mortality ratio of a notional geographic unit lying on the 95th centile being 26% higher than that of a region on the 5th centile. For the country with the highest inequalities the standard deviation of 0.14 equates to a 59% excess mortality rate for a region at the 95th centile over a region at the 5th.
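These centile comparisons can be checked with a short calculation. The sketch assumes that region effects are normally distributed on the log standardised mortality ratio scale, so that the ratio between the 95th and 5th centiles is the exponential of the distance between the corresponding normal quantiles multiplied by the standard deviation. The function name is illustrative.

```python
import math
from statistics import NormalDist


def centile_ratio(sigma, lower=0.05, upper=0.95):
    """Ratio of the SMR at the upper centile to the SMR at the lower centile,
    assuming normally distributed region effects with SD `sigma` on the log scale."""
    z_lower = NormalDist().inv_cdf(lower)
    z_upper = NormalDist().inv_cdf(upper)
    return math.exp((z_upper - z_lower) * sigma)


for sigma in (0.07, 0.14):
    excess = (centile_ratio(sigma) - 1.0) * 100.0
    print(f"sigma = {sigma:.2f}: 95th centile exceeds 5th by {excess:.1f}%")
```

With the rounded standard deviations this gives roughly 26% for 0.07 and between 58% and 59% for 0.14, which is consistent with the figures quoted above if the higher value was slightly above 0.14 before rounding.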
Several recent epidemiological studies have addressed the issue of variation in geographical units from the point of view of choosing the hierarchical level most appropriate for the observed data. The general finding is that the smaller the geographical unit used, the better the models explain the data, with greater clustering of ecological factors [12, 16, 26, 27]. Although the focus of these studies differs, all are concerned with the size of geographic unit. Whereas the previous studies focused on improving the accuracy with which health attributes can be explained by selecting explanatory factors at different spatial scales, we were concerned with whether differences in geographic structure may complicate direct comparison of health inequalities.
We used Gini coefficients to describe inequalities in regional population sizes within a country as a measure of geographic structure. Inequalities in regional population size were positively correlated with inter-regional variation (σ u(j)) and including this relationship offered an improvement in model fit compared with the random effects model, although this relationship was not significant at conventional statistical levels. Restricting the variation around these relationships to be near-deterministic did yield significant coefficients (as we might expect). One possible motivation for assuming a deterministic relationship would be if there were firm theoretical reasoning to suggest a quantifiable and definite effect of population structure within the hierarchy; we are unaware of any such information in the current literature but suspect that further work in this area may provide clues as to what shape such a relationship may take. However, it would be impossible to generalise our findings to all datasets, whether or not our models were statistically significant; hence these ideas need to be tested case by case in further datasets at different geographical scales. Further studies including socio-economic factors as explanatory variables would also be valuable, as the patterns we observe may result from underlying causes other than variations in region size alone. Undoubtedly international comparisons of socio-economic inequalities in health or mortality would be better assessed using individual data including a harmonised measure of socio-economic position, such as education, where these are available [4, 28].
We also found some evidence, but to a lesser degree than for the Gini coefficient, supporting a positive relationship between the mean region size in a country and inter-regional variation, and including this relationship offered a similar improvement in model fit to the Gini coefficient. However, the lack of countries in our dataset with mean region sizes lying between those in Germany (2.3 million) and Italy (1.3 million) makes us wary of drawing a firm conclusion on the validity of the link between health inequalities and mean region size at this scale.
When using ecologic or aggregate data we must avoid committing the ecological fallacy [29, 30]; this is closely related to the MAUP discussed earlier inasmuch as it is a bias caused by the aggregation of individual level data. The fallacy is an error in the interpretation of such data and assumes that relationships found at the group level also hold at the individual level. We therefore stress that, in this study, we are not drawing conclusions about individuals but are comparing average rates and inequalities at the population level. It should also be noted that there are various other methods of comparing health inequalities between countries, or indeed between smaller areas or the same areas at different time points. For example, Mackenbach et al compared health inequalities between socioeconomic groups across 22 European countries using two regression-based measures: the relative index of inequality and the slope index of inequality. Leclerc et al compared inequalities in mortality between England and Wales, France, and Finland using the Gini coefficient as a measure of health inequality. Leyland et al used various measures to compare inequalities in mortality within Scotland over time, including comparing absolute differences in standardised rates between regions and socioeconomic groups at different time points, comparing socioeconomic rate ratios and examining the slope index of inequality. No matter what method is used to explore such health inequalities, one must be aware that the administrative structure of the populations under examination may be influencing the interpretation of variations in health.
This study suggests that countries or regions comprising unequally sized geographic units may have a tendency to show greater health inequalities simply because of inherent differences in geographic structure. When examining health inequalities between areas it is important to be aware of the potential for such biases, and we recommend that those presenting these health inequalities also report a simple measure of inequalities in population structure, such as the Gini coefficient, alongside their results. This may be directly applicable to the UK, where historical events have led to fundamental differences in the size distribution of administrative regions in the present day that may themselves exacerbate or occlude existing health inequalities. Furthermore, a move towards smaller, more uniformly sized geographic units during Local Government Reorganisation in Great Britain between 1995 and 1998 may similarly affect accurate description and comparison of geographic health inequalities across this timeline, if there are effects similar to those reported herein. Further work in this area may provide useful information when seeking compromise between statistical, administrative and cultural conflicts over the definition of population boundaries. Whilst it may be possible to achieve consistency in region size within a country either spatially or over time [13, 14], it is unlikely that consensus will be achieved between countries and comparisons should be made with care. We expect widespread differences between countries in terms of the geographical units used for reporting data and hence the description of health inequalities may be more affected by the hierarchical structure than considered previously.
A better understanding of how differences in geographic structure of a population affect the description and interpretation of health inequalities will improve spatio-temporal monitoring of health inequalities and better inform the evaluation of interventions.
The Social and Public Health Sciences Unit is jointly funded by the Medical Research Council and the Chief Scientist Office (CSO) of the Scottish Government Health Directorates (wbs U.1300.00.001.).
- Krieger N: Women and Social-Class - a Methodological Study Comparing Individual, Household, and Census Measures as Predictors of Black-White Differences in Reproductive History. Journal of Epidemiology and Community Health. 1991, 45 (1): 35-42. 10.1136/jech.45.1.35.
- Macintyre S, Maciver S, Sooman A: Area, Class and Health - Should We Be Focusing on Places or People. Journal of Social Policy. 1993, 22: 213-234. 10.1017/S0047279400019310.
- Martin D: Output areas for 2001. The Census Data System. 2002, Chichester: John Wiley & Sons, 37-46.
- Mackenbach JP, Stirbu I, Roskam AJR, Schaap MM, Menvielle G, Leinsalu M, Kunst AE: Socioeconomic inequalities in health in 22 European countries. New England Journal of Medicine. 2008, 358 (23): 2468-2481. 10.1056/NEJMsa0707519.
- Merlo J, Ohlsson H, Lynch KF, Chaix B, Subramanian SV: Individual and collective bodies: using measures of variance and association in contextual epidemiology. J Epidemiol Community Health. 2009, 63 (12): 1043-1048. 10.1136/jech.2009.088310.
- Langford IH, Leyland AH, Rasbash J, Goldstein H: Multilevel modelling of the geographical distributions of diseases. J R Stat Soc Ser C-Appl Stat. 1999, 48: 253-268. 10.1111/1467-9876.00153.
- Snijders TAB, Bosker R: Multilevel Analysis. 1999, London: Sage Publications Ltd
- Leyland AH: Increasing inequalities in premature mortality in Great Britain. Journal of Epidemiology and Community Health. 2004, 58 (4): 296-302. 10.1136/jech.2003.007278.
- Oliver L: Shifting Boundaries, Shifting Results: The Modifiable Areal Unit Problem. 2001, Accessed 20 June 2010, [http://www.geog.ubc.ca/courses/geog570/talks_2001/scale_maup.html]
- Openshaw S: The modifiable areal unit problem. Geo Books. 1984, Norwich, 38:
- Schuurman N, Bell N, Dunn JR, Oliver L: Deprivation indices, population health and geography: An evaluation of the spatial effectiveness of indices at multiple scales. Journal of Urban Health-Bulletin of the New York Academy of Medicine. 2007, 84 (4): 591-603.
- Krieger N, Chen JT, Waterman PD, Soobader MJ, Subramanian SV, Carson R: Geocoding and monitoring of US socioeconomic inequalities in mortality and cancer incidence: Does the choice of area-based measure and geographic level matter? The Public Health Disparities Geocoding Project. American Journal of Epidemiology. 2002, 156 (5): 471-482. 10.1093/aje/kwf068.
- Norman P, Rees P, Boyle P: Achieving data compatability over space and time: create consistent geographical zones. International Journal of Population Geography. 2003, 9 (5): 365-386. 10.1002/ijpg.294.
- Rees P, Brown D, Norman P, Dorling D: Are socioeconomic inequalities in mortality decreasing or increasing within some British regions? An observational study, 1990-1998. Journal of Public Health Medicine. 2003, 25 (3): 208-214. 10.1093/pubmed/fdg055.
- Stafford M, Duke-Williams O, Shelton N: Small area inequalities in health: Are we underestimating them?. Soc Sci Med. 2008, 67 (6): 891-899. 10.1016/j.socscimed.2008.05.028.
- Reijneveld SA, Verheij RA, de Bakker DH: The impact of area deprivation on differences in health: does the choice of the geographical classification matter?. Journal of Epidemiology and Community Health. 2000, 54 (4): 306-313. 10.1136/jech.54.4.306.
- Norman P, Purdam K, Tajar A, Simpson L: Representation and local democracy: Geographical variations in elector to councillor ratios. Political Geography. 2007, 26 (1): 57-77. 10.1016/j.polgeo.2006.10.013.
- World Health Organisation: Atlas of Mortality in Europe: Subnational patterns, 1980/1981 and 1990/1991. 1997, Scientific publication No 75
- Borras JA, Fernandez E, Gonzalez JR, Negri E, Lucchini F, La Vecchia C, Levi F: Lung cancer mortality in European regions (1955-1997). Ann Oncol. 2003, 14 (1): 159-161. 10.1093/annonc/mdg016.
- Bland M: An Introduction to Medical Statistics. 1995, Oxford: Oxford Medical Publications
- Pan American Health Organization: Measuring Health Inequalities: Gini Coefficient and Concentration Index. Epidemiological Bulletin. 2001, 22 (1): 3-4.
- Wagstaff A, Paci P, Vandoorslaer E: On the Measurement of Inequalities in Health. Social Science & Medicine. 1991, 33 (5): 545-557. 10.1016/0277-9536(91)90212-U.
- Brown MC: Using Gini-Style Indexes to Evaluate the Spatial Patterns of Health Practitioners - Theoretical Considerations and an Application Based on Alberta Data. Social Science & Medicine. 1994, 38 (9): 1243-1256. 10.1016/0277-9536(94)90189-9.
- Spiegelhalter DJ, Thomas A, Best NG, Lunn D: WinBUGS Version 1.4.1 User Manual. 2004, Cambridge: Medical Research Council Biostatistics Unit
- Spiegelhalter DJ, Best NG, Carlin BR, van der Linde A: Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society Series B-Statistical Methodology. 2002, 64: 583-616. 10.1111/1467-9868.00353.
- Krieger N, Chen JT, Waterman PD, Soobader MJ, Subramanian SV, Carson R: Choosing area based socioeconomic measures to monitor social inequalities in low birth weight and childhood lead poisoning: The Public Health Disparities Geocoding Project (US). Journal of Epidemiology and Community Health. 2003, 57 (3): 186-199. 10.1136/jech.57.3.186.
- Woods LM, Rachet B, Coleman MP: Choice of geographic unit influences socioeconomic inequalities in breast cancer survival. British Journal of Cancer. 2005, 92 (7): 1279-1282. 10.1038/sj.bjc.6602506.
- Stirbu I, Kunst AE, Bopp M, Leinsalu M, Regidor E, Esnaola S, Costa G, Martikainen P, Borrell C, Kalediene R, et al: Educational inequalities in avoidable mortality in Europe. J Epidemiol Community Health. 2010
- Robinson WS: Ecological Correlations and the Behavior of Individuals. American Sociological Review. 1950, 15: 351-357. 10.2307/2087176.
- Subramanian SV, Jones K, Kaddour A, Krieger N: Revisiting Robinson: The perils of individualistic and ecologic fallacy. International Journal of Epidemiology. 2009, 38 (2): 342-360. 10.1093/ije/dyn359.
- Leclerc A, Lert F, Fabien C: Differential mortality - Some comparisons between England and Wales, Finland and France, based on inequality measures. International Journal of Epidemiology. 1990, 19 (4): 1001-1010. 10.1093/ije/19.4.1001.
- Leyland A, Dundas R, McLoone P, Boddy FA: Cause-specific inequalities in mortality in Scotland: two decades of change. A population-based study. BMC Public Health. 2007, 7 (1): 172. 10.1186/1471-2458-7-172.
- Office for National Statistics: Gazetteer of the old and new geographies of the United Kingdom. 1999, London: Office for National Statistics
- Flowerdew R, Graham E, Feng Z: The production of an updated set of data zones to incorporate 2001 census geography and data. 2004, University of St Andrews
- The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2288/10/74/prepub