What is the value of social values? The uselessness of assessing health-related quality of life through preference measures
© Prieto and Sacristán; licensee BioMed Central Ltd. 2004
Received: 18 December 2003
Accepted: 29 April 2004
Published: 29 April 2004
The use of preference-based measures in the evaluation of health outcomes has extended considerably over the last decade. Their alleged advantage over other types of general instruments in the evaluation of health-related quality of life (HRQOL) supposedly lies in the fact that preference measures incorporate values or utilities that reflect the value of social preferences for health states. The objective of this study was to determine whether the use of social preference weights or utilities makes any real difference when calculating scores for the EuroQol (EQ-5D) questionnaire, a preference-based HRQOL measure.
Responses to the EQ-5D from a sample of 10,972 patients from 10 countries enrolled in an observational study of the treatment of schizophrenia in Europe were used for this purpose. Two different methods of scoring the EQ-5D were compared: 'weighting the items' of the questionnaire through the UK official weight coefficients, and 'non-weighting the items'. Pearson's, Spearman's, and two-way mixed parametric intraclass correlation coefficients were used to estimate the association of the scores obtained in both ways.
The association between weighted and unweighted EuroQol scores was extremely high (Pearson's r = 0.91), as was the association between their ranks (Spearman's ρ = 0.93). The intraclass correlation coefficient obtained (0.89) also suggested that the concordance between the score distributions was prominent.
A non-weighted approach to scoring the EQ-5D is enough to explain a high proportion of variance in scores obtained through the use of utilities. The differential contribution of weights based on population preference values is therefore minimal and, in our opinion, negligible.
The use of preference-based measures in the evaluation of health outcomes has extended considerably over the last decade [1–4]. Their advantage over other types of general instruments in the evaluation of health-related quality of life (HRQOL), such as the SF-36, the Sickness Impact Profile, and the Nottingham Health Profile, supposedly lies in the fact that preference measures result in a single numerical index that reflects the value of social preferences for health states. These numerical indices are finally used to calculate the Quality Adjusted Life Years (QALYs) required to carry out cost-utility studies (cost-utility analysis is a form of economic evaluation that focuses particular attention on the quality of the health outcomes produced by health programmes or treatments).
Broadly speaking, the preference-based approach assumes that the social value or 'utility' of a health state is the same as the value of the quality of life of those individuals who are in it. The value or utility of a health state is expressed on a scale from 0 to 1, where 0 is the utility of the state 'dead' and 1 is the utility of the state 'perfect health'. The lower the quality of life associated with a health state, the lower its utility score on this scale.
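As a rough illustration of how a utility on the 0 to 1 scale feeds into a QALY calculation, the sketch below multiplies each utility by the time spent in the corresponding health state and sums the products. The function name and the example figures are hypothetical, and discounting of future years is deliberately left out.

```python
def qalys(health_periods):
    """Sum utility x duration over (utility, years) pairs.

    Hypothetical helper: no discounting, no half-cycle correction.
    """
    return sum(utility * years for utility, years in health_periods)


# Two years in 'perfect health' (utility 1.0) followed by four years
# in a state valued at 0.5 (illustrative numbers only).
example_total = qalys([(1.0, 2), (0.5, 4)])
```

Under these assumptions the example path yields 2 + 2 = 4 QALYs, the same as four years lived in perfect health.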
Preference-based measures (or 'utility measures', as they are normally called) may take the form of multi-attribute questionnaires. Currently there are three main multi-attribute questionnaires available: the Quality of Well-Being (QWB), the Health Utilities Index (HUI), and the EuroQol (EQ-5D). These questionnaires can be considered as simple classification systems based on the degree of limitation that individuals indicate for different health dimensions, such as mobility, pain, emotional aspects and social functioning. The answers provided by the patients to each of these dimensions allow the analyst to transform the scores – often referred to as "health profiles" – into a single utility number. The transformation algorithms are based on theoretical foundations and on previous research in which one or more valuation techniques (i.e. Standard Gamble (SG), Time Trade-Off (TTO), and Rating Scale (RS)) were used to measure directly the preferences of individuals (usually from community surveys) for different "health profiles".
Transformation algorithms basically consist of weighting each of the answers provided to the items or classification system domains, by means of a coefficient determined by the social preferences obtained empirically by means of a population sample. The final utility score is obtained from a more or less complex combination of the resulting weighted values. The description of the health state of a subject may be summarised, in theory at least, as a health index that reflects the social preferences of the population.
The alleged characteristics of the utility approach make it quite attractive as a measure of HRQOL independent of its usefulness in economic evaluation studies. Utility measures also provide a mechanism for making broad comparisons across an array of clinical settings as well as in the context of assessing population health quality. Many studies have already used utility measures to quantify the impact of interventions on HRQOL or to characterise the severity profile of a health problem. Even the SF-36, which may be the paradigm of measures developed under a non-preference based approach, has been tested for use as a utility measure.
Despite the relative merits of the theoretical arguments made by proponents of the utility approach, in our opinion the empirical issues concerning the assessment of HRQOL through preference based measures remain unresolved. The assumed advantage of weighting each item individually is that valuations for all possible combinations of single health descriptors, and thus all possible health states, are elicited. Nevertheless, the crucial question in seeking social preference weights for items is how much difference it makes to use these differential weights to calculate the composite utility score for a given health state. As some authors have theorised, it would make a difference if the weighted and unweighted scores on the multi-attribute questionnaire did not correlate highly. On the evidence so far, the use of differential weights seldom makes an important difference [20, 21].
The aim of this study is to determine whether the use of social preference weights or utilities makes any real difference when calculating scores for preference-based measures used in the evaluation of HRQOL.
In order to carry out this study, we chose one of the more popular preference-weighted health state classification questionnaires in Europe, the EuroQol. The EuroQol Group has devoted considerable attention and effort to the weighting of its items and has investigated a broad range of modelling approaches for this purpose [22, 23].
In order to test the usefulness of utilities in scoring the EuroQol Descriptive system (EQ-5D), we developed a parallel unweighted scoring rule based solely on patients' answers to the EQ-5D; in this way it is possible to determine the value of any health state defined by the questionnaire without having to use any type of social preference for the items. We then compared the EQ-5D scores obtained using the unweighted scoring rule with 'official' weighted scores. These weights were determined on a random sample of the non-institutionalised adult population of the United Kingdom.
The EQ-5D descriptive system
EuroQol Descriptive system
Mobility
1. No problems walking
2. Some problems walking about
3. Confined to bed
Self-care
1. No problems with self-care
2. Some problems washing or dressing self
3. Unable to wash or dress self
Usual activities
1. No problems with performing usual activities (e.g. work, study, housework, family or leisure activities)
2. Some problems with performing usual activities
3. Unable to perform usual activities
Pain/discomfort
1. No pain or discomfort
2. Moderate pain or discomfort
3. Extreme pain or discomfort
Anxiety/depression
1. Not anxious or depressed
2. Moderately anxious or depressed
3. Extremely anxious or depressed
Scoring weights for the EQ-5D descriptive system
Health states defined by the EQ-5D may be converted to a single summary or composite index by applying scores from a standard set of values (or preferences) derived from general population samples. Over the past few years, the EuroQol Group has been engaged in several research projects exploring this issue. Values have been elicited for different subsets of EQ-5D health states from respondents in Canada, Denmark, Finland, Germany, Japan, the Netherlands, New Zealand, Slovenia, Spain, Sweden, the UK, the US and Zimbabwe.
EuroQol Scoring Formula based on UK Coefficients (Weights)
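The coefficient table itself is not reproduced in this text. The following is a minimal sketch of a weighted scoring function, assuming the commonly published UK time trade-off tariff for the three-level EQ-5D (the coefficients below are an assumption taken from that published tariff, not from this article, and should be checked against the original source before use). The names `UK_DECREMENTS`, `CONSTANT`, `N3` and `weighted_score` are illustrative.

```python
# Sketch of a weighted EQ-5D index, assuming the commonly published UK
# time trade-off coefficients (an assumption; not reproduced in this text).

DIMENSIONS = ("mobility", "self_care", "usual_activities",
              "pain_discomfort", "anxiety_depression")

# Decrement subtracted from 1.0 for level 2 and level 3 on each dimension.
UK_DECREMENTS = {
    "mobility":           {2: 0.069, 3: 0.314},
    "self_care":          {2: 0.104, 3: 0.214},
    "usual_activities":   {2: 0.036, 3: 0.094},
    "pain_discomfort":    {2: 0.123, 3: 0.386},
    "anxiety_depression": {2: 0.071, 3: 0.236},
}
CONSTANT = 0.081  # subtracted once if any dimension is above level 1
N3 = 0.269        # subtracted once if any dimension is at level 3


def weighted_score(profile):
    """Return the weighted index for five levels, each 1, 2 or 3."""
    if len(profile) != len(DIMENSIONS) or any(l not in (1, 2, 3) for l in profile):
        raise ValueError("profile must contain five levels in {1, 2, 3}")
    if all(level == 1 for level in profile):
        return 1.0  # full health: no decrement at all
    score = 1.0 - CONSTANT
    for dim, level in zip(DIMENSIONS, profile):
        score -= UK_DECREMENTS[dim].get(level, 0.0)  # level 1 costs nothing
    if 3 in profile:
        score -= N3
    return round(score, 3)
```

With these assumed coefficients the worst state, (3, 3, 3, 3, 3), scores -0.594, which agrees with the value of -0.59 quoted in the discussion for the UK weights.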
Unweighted scoring rule
EuroQol Scoring Formula based on Unweighted Coefficients
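The coefficients of the unweighted rule are not reproduced in this text. As a purely hypothetical illustration of what a rule based solely on patients' answers could look like, the sketch below gives every step away from 'no problems' the same arbitrary decrement. The function name and the rescaling to the 0 to 1 range are assumptions, and the sketch should not be read as the rule actually used in the study.

```python
def unweighted_score(profile):
    """Hypothetical equal-weight score for five levels, each 1, 2 or 3.

    Every step above level 1 costs the same arbitrary amount (0.1), so the
    result runs from 1.0 (no problems) to 0.0 (level 3 on all dimensions).
    """
    if len(profile) != 5 or any(level not in (1, 2, 3) for level in profile):
        raise ValueError("profile must contain five levels in {1, 2, 3}")
    steps = sum(level - 1 for level in profile)  # 0 (best) .. 10 (worst)
    return 1.0 - steps / 10.0
```

No population preference enters this calculation: two profiles with the same number of steps receive the same score regardless of which dimensions are affected.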
The study sample
Considering that the objective of the study was to compare the weighted and unweighted scores of the EQ-5D, the analysis described below could be performed on any sample of subjects answering the EQ-5D (results would be invariant). In our case, and fundamentally for availability reasons, we used answers to the EQ-5D from patients included in an ongoing 3-year, prospective, observational study of the treatment of schizophrenia in Europe. The primary objective of that study is to assess the costs and outcomes of antipsychotic treatment of schizophrenia. It is being conducted in 10 European countries (Denmark, France, Germany, Greece, Ireland, Italy, the Netherlands, Portugal, Spain and the UK). A total of 10,972 patients were enrolled. Baseline data collection was conducted via a core data collection form that included a self-administered version of the EQ-5D Descriptive system. Details of the design of the study have been presented elsewhere.
Comparison of preference-weighted and unweighted EQ-5D scores
To assess the relevance of the social preference weights (utilities) when analysing EQ-5D scores, the two different methods of scoring the questionnaire, weighted and unweighted, were compared.
Basic descriptive statistics of both score distributions were provided. The comparison of the two scoring alternatives was also performed through a paired design involving weighted and unweighted scores: Spearman's (ρ), Pearson's (r) and two-way mixed parametric intraclass (ICC) correlation coefficients were used to estimate the association of the scores obtained in both ways. A graphical comparison approach (scatterplot) was additionally used to illustrate the degree of association between the scores obtained by the two methods. Analyses were done with SPSS® for Windows®, v. 10.1.3.
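The three coefficients named above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the SPSS procedure itself: the intraclass coefficient is implemented as the single-measure, consistency form of the two-way mixed model, and the text does not say whether the consistency or the absolute-agreement form was used. All function names are illustrative.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def icc_consistency(x, y):
    """Two-way mixed, single-measure, consistency ICC for two scorings.

    Assumption: consistency definition, (MSR - MSE) / (MSR + (k - 1) * MSE).
    """
    n, k = len(x), 2
    grand = (sum(x) + sum(y)) / (n * k)
    row_means = [(a + b) / k for a, b in zip(x, y)]
    col_means = [sum(x) / n, sum(y) / n]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for v in list(x) + list(y))
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
```

Here `x` and `y` would hold the weighted and unweighted scores of the same patients, in the same order. Under the consistency definition a constant shift between the two scorings does not lower the coefficient, whereas an absolute-agreement definition would penalise it.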
In the study, valid answers were obtained for all the items in the questionnaire in a total of 9,991 patients.
Table 4. Descriptives and Measures of Association of Weighted and Unweighted EuroQol Scores (columns: Mean (95% CI); ICC (95% CI)).
The correlation coefficient estimates were excellent (Table 4). The association between weighted and unweighted EuroQol scores was extremely high (Pearson's r = 0.91), as was the association between their ranks (Spearman's ρ = 0.93). The intraclass correlation coefficient obtained (0.89) also suggests that, apart from a high association, the concordance between the weighted and unweighted score distributions was prominent.
The results of this study reveal that a simple combination of arbitrary values assigned to the items of the EQ-5D Descriptive system is enough to explain a high proportion of variance in scores obtained through the use of utilities. The differential contribution of weights based on population preference values is therefore minimal and, in our opinion, negligible.
The supposed advantages obtained from the use of the utility approach to measure HRQOL no longer stand if it is possible to generalise these results. The EQ-5D, and therefore, all preference based multi-attribute questionnaires supported by analogous scoring rules, would provide information that is conceptually comparable with information from any non-preference based HRQOL measure (such as the SF-36 or the Nottingham Health Profile (NHP)). As in the case of the non-preference based measures, scores obtained from evaluation of HRQOL through preference-based measures are fundamentally a direct reflection of the answers provided by the individuals to the items in the questionnaire. The results presented here demonstrate, yet again, that weighting answers to the items in the instruments does not imply a significant difference in the final score.
Although this fact seriously questions the conceptual foundation of evaluating HRQOL through preference measures, it does not necessarily jeopardise the validity or reliability of results obtained through the use of such instruments. In any event, it is still necessary to subject results to the scrutiny of their basic psychometric properties through standard methods.
Contrary to the argument put forward by Brazier and Deverill in 1999, our view is that the psychometric (read 'non-preference based') and economic (read 'preference based') approaches are not different in relation to conventional measurement criteria, because they seek to measure the same concepts. In preference based and non-preference based measures alike, primary interest lies in locating the responding individuals at different points on a theoretical linear continuum representing possible levels of HRQOL. For this purpose, total scores are computed by assuming that point values assigned to each possible response to the items form a numerical scale with the properties of order and equal units. Item scoring weights might be assigned by an arbitrary decision of the scale developer, but, as we have already mentioned, this action seldom makes an important difference. Thus, the sole purpose of preference and non-preference based measures is to "scale" the subjects based on their responses (weighted or not) to the items. Torgerson called this scaling approach the "Subject-centred approach", where the systematic variation in the reactions of the subjects to the items is attributed to individual differences in the subjects. The items, also called "stimuli" in psychometric jargon, are considered as replications: adding or deleting stimuli from the same stimulus-population at random would have no effect on procedure or results other than those due to the usual sampling fluctuations.
What is indeed true is that the weights used in preference based measures are obtained through a different scaling approach. Torgerson called this the "Stimulus-centred" or "Judgement approach". In the "Stimulus-centred" approach the immediate purpose of the assessment is to scale the stimuli, which alone are assigned scale values. Valuation techniques like SG, TTO and RS form part of this modus operandi. The systematic variation in the reactions of the subjects to the items or stimuli is attributed to differences in the stimuli with respect to a designated attribute. Adding subjects chosen at random from the same population, or deleting subjects at random, would have no effect on either the procedure or the results other than the usual sampling fluctuations.
Although it is obvious that the weights used in preference measures are taken from a Stimulus-centred scaling approach, their action mechanism and the results they provide are clearly defined as "Subject-centred".
In the light of the results presented in this study, we believe that it is time to review the conceptual foundations of the evaluation of HRQOL through preference measures. Do these measures really differ from traditional non-preference based measures? Does it make any sense to go on using them? As Feeny argued in another context, one reason for going on using preference based measures is that these instruments provide a single summary score of outcomes, which facilitates their interpretation and their integration into the formulae used to calculate the cost-effectiveness ratio in economic evaluations of health interventions. Although this argument may be attractive at first sight, it should not be accepted without further thought. The single summary score produced by preference based measures is the result of a simple combination of the different dimensions that are contained in the instrument. In the case of the EQ-5D, this implies the integration of dimensions such as anxiety/depression and physical mobility, which may not initially appear to be closely related, but may be combined for the purpose of a common, second degree dimensionality that could be called 'General Health'. But if we permit the integration of disparate dimensions in the EQ-5D, for example, then why should we not permit it in other non-preference based profile measures such as the SF-36? In fact, the authors of the SF-36 have already empirically explored this possibility.
Another reason for going on using preference based measures, also put forward by Feeny, is the integration of the concepts of mortality and morbidity in the scores for these scales (in conventional utility scales the state of being dead is assigned a score of 0 and perfect health is assigned a score of 1). On this matter our view is more radical: the scale produced by HRQOL preference based measures is an interval scale, not a ratio scale; this means that the numerical values assigned by the scale are totally arbitrary, and 0 does not imply an 'absolute lack' of HRQOL. The problem with the argument that HRQOL has a natural zero at death is that there can be states worse than death, and these states require a score as well. In fact, to respond to this need, the scoring algorithm of the EQ-5D assigns the value -0.59 to the worst possible state of health when it uses the UK weights. The integration of the concept of mortality is therefore a fallacy, which is even more untenable considering the marginal contribution of social preferences to the scores of preference based measures like the EQ-5D.
Regardless of which theoretical arguments or personal preferences are used for a given type of measure (preference based versus non-preference based), the final verdict on the utility of preference based measures must rest on an evaluation of the quality of the information provided by the scores. The determination of the reliability and validity of scores from this type of scale using standardised methods is therefore absolutely essential.
The results of this study also question the multiple efforts dedicated to obtaining specific national weights for instruments such as the EQ-5D. Given the uselessness of utilities in scoring the EQ-5D, the only point in carrying out this type of activity is to obtain rankings or 'league tables' that permit trans-cultural comparison of different health states. In any event, we doubt that the high cost of such effort is really worth it.
It is unquestionable that the concept of utility applied to HRQOL evaluation has played a crucial role in the development of a new discipline linked to the standardised evaluation of the impact of health on the subjective perception of individuals. The evaluation of HRQOL through preference measures has permitted the concept of the Quality-Adjusted Life Year (QALY) to be extended as a measure of the value of health outcomes. The QALY, in turn, has allowed the numerical representation of the value of health through a single index combining individuals' quantity and quality of life. This has been a landmark in the definition of certain types of economic analysis (i.e. Cost-Utility).
However, the findings in this study imply a new starting point. The facts show that social preferences do not substantially modify scores on scales that are simply calculated from the combination of the answers provided to their items. The supposed advantages of preference based measures over other, less sophisticated measures of health states do not hold, and it is yet to be determined what their differential use is, and whether it really exists.
The debate on the convenience or otherwise of using social preferences in the evaluation of health states is far from being solved. In theory, in government-financed health systems, social decisions are responsible for allocation of resources. However, the supposed objectivity of social preference measures should not neglect the fact that many conceptual, ethical and methodological problems have yet to be solved, and the majority of instruments used have not been designed for planning or allocation of resources.
The patient is becoming the core of the health system. In medicine there is now a concern for the measurement of variables that interest the patient, and it is increasingly important to have a good knowledge of the patient's characteristics in order to be able to individualise interventions. Optimised allocation of resources should probably include the identification of all the patient's peculiarities, including his/her own perception of health. In this context, and with the limitations described earlier, we should be asking ourselves what the true value is of society deciding on the health states of individuals (a decision that is not based on conventional clinical variables, for example). Furthermore, if, as this study shows, social preferences do not make any real difference to the scores provided by the individuals themselves, then maybe this is the right moment to leave the "stimulus-centred approach" to one side, and to focus on the "subject-centred approach".
1. Neumann PJ, Goldie SJ, Weinstein MC: Preference-based measures in economic evaluation in health care. Annu Rev Public Health. 2000, 21: 587-611. 10.1146/annurev.publhealth.21.1.587.
2. Garratt A, Schmidt L, Mackintosh A, Fitzpatrick R: Quality of life measurement: bibliographic study of patient assessed health outcomes. BMJ. 2002, 324: 1417. 10.1136/bmj.324.7351.1417.
3. Sanders C, Egger M, Donovan J, Tallon D, Frankel S: Reporting on quality of life in randomized controlled trials: bibliographic study. BMJ. 1998, 317: 1191-1194.
4. Kind P, Dolan P, Gudex C, Williams A: Variations in population health status: results from a United Kingdom national questionnaire survey. BMJ. 1998, 316: 736-41.
5. Ware JE, Gandek B: Overview of the SF-36 Health Survey and the International Quality of Life Assessment (IQOLA) Project. J Clin Epidemiol. 1998, 51: 903-12. 10.1016/S0895-4356(98)00081-X.
6. Bergner M, Bobbitt RA, Carter WB, Gilson BS: The Sickness Impact Profile: development and final revision of a health status measure. Med Care. 1981, 19: 787-805.
7. Hunt SM, McKenna SP, McEwen J, Williams J, Papp E: The Nottingham Health Profile: subjective health status and medical consultations. Soc Sci Med. 1981, 15: 221-229. 10.1016/0271-7123(81)90005-5.
8. Feeny D: A utility approach to the assessment of health-related quality of life. Med Care. 2000, 38 (9 Suppl): II151-II154.
9. Rosser R: From health indicators to quality adjusted life years: technical and ethical issues. In Measuring the outcomes of medical care. Edited by: Hopkins A, Costain D. 1990, London: Royal College of Physicians of London, 1-17.
10. Greenberg D, Pliskin JS: Preference-based outcome measures in cost-utility analyses. A 20-year overview. Int J Technol Assess Health Care. 2002, 18: 461-6.
11. Torgerson D, Raftery J: Economics notes: measuring outcomes in economic evaluations. BMJ. 1999, 318: 1413.
12. Nord E: Cost-value analysis in health care: making sense out of QALYs. 1999, Cambridge: Cambridge University Press.
13. Drummond MF, O'Brien B, Stoddart GL, Torrance GW: Methods for the economic evaluation of health care programs. 1997, Oxford: Oxford University Press, 2nd edition.
14. Kaplan RM, Ganiats TG, Sieber WJ, Anderson JP: The Quality of Well-Being Scale: critical similarities and differences with SF-36. Int J Qual Health Care. 1998, 10: 509-20. 10.1093/intqhc/10.6.509.
15. Torrance GW, Furlong W, Feeny D, Boyle M: Multi-attribute preference functions. Health Utilities Index. Pharmacoeconomics. 1995, 7: 503-20.
16. Rabin R, de Charro F: EQ-5D: a measure of health status from the EuroQol Group. Ann Med. 2001, 33: 337-43.
17. Keeney R, Raiffa H: Decisions with multiple objectives: preferences and value tradeoffs. 1993, Cambridge: Cambridge University Press, 2nd edition.
18. Brazier J, Roberts J, Deverill M: The estimation of a preference-based measure of health from the SF-36. J Health Econ. 2002, 21: 271-92. 10.1016/S0167-6296(01)00130-8.
19. Nunnally JC, Bernstein IH: Psychometric Theory. 1994, New York: McGraw-Hill.
20. Jenkinson C: Why are we weighting? A critical examination of the use of item weights in a health status measure. Soc Sci Med. 1991, 32: 1413-6. 10.1016/0277-9536(91)90202-N.
21. Prieto L, Alonso J, Viladrich MC, Anto JM: Scaling the Spanish version of the Nottingham Health Profile: evidence of limited value of item weights. J Clin Epidemiol. 1996, 49: 31-8. 10.1016/0895-4356(95)00064-X.
22. Dolan P: Modeling valuations for EuroQol health states. Med Care. 1997, 35: 1095-108. 10.1097/00005650-199711000-00002.
23. Dolan P, Roberts J: Modelling valuations for EQ-5D health states: an alternative model using differences in valuations. Med Care. 2002, 40: 442-6. 10.1097/00005650-200205000-00009.
24. Prieto L, Novick D, Sacristan JA, Edgell ET, Alonso J: A Rasch model analysis to test the cross-cultural validity of the EuroQoL-5D in the Schizophrenia Outpatient Health Outcomes Study. Acta Psychiatr Scand Suppl. 2003, 416: 24-9. 10.1034/j.1600-0447.107.s416.6.x.
25. Brazier J, Deverill M: A checklist for judging preference-based measures of health related quality of life: learning from psychometrics. Health Econ. 1999, 8: 41-51. 10.1002/(SICI)1099-1050(199902)8:1<41::AID-HEC395>3.3.CO;2-R.
26. Torgerson WS: Theory and Methods of Scaling. 1958, New York: John Wiley and Sons.
27. Keller SD, Ware JE, Bentler PM, Aaronson NK, Alonso J, Apolone G, Bjorner JB, Brazier J, Bullinger M, Kaasa S, Leplege A, Sullivan M, Gandek B: Use of structural equation modeling to test the construct validity of the SF-36 Health Survey in ten countries: results from the IQOLA Project. International Quality of Life Assessment. J Clin Epidemiol. 1998, 51: 1179-88. 10.1016/S0895-4356(98)00110-3.
28. Macran S, Kind P: "Death" and the valuation of health-related quality of life. Med Care. 2001, 39: 217-27. 10.1097/00005650-200103000-00003.
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2288/4/10/prepub
This article is published under license to BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.