This study compared methodological, procedural and analytical characteristics of the twelve EQ-5D-3L TTO valuation studies. Differences existed in sample size, the number of health states valued and exclusion criteria. All except the Hungarian and Romanian valuation studies were based on the MVH protocol. All studies used the additive 10-parameter model, which represents levels 2 and 3 for each dimension, except for the Slovenian study, which used a constrained 6-parameter model that assumed the relative severity of level 2, “moderate problems”, to be similar across dimensions. This method was used in the Slovenian value set due to concerns about the relatively small sample size and limited number of valued health states. Furthermore, in the Polish, Dutch and Italian studies, the translations chosen to describe the levels of severity in health states may cause differences in comparison with other value sets. For instance, in the Polish value set, mobility level 3, “confined to bed”, implied being bedridden; the Polish values may therefore be lower for health states that include level 3 of mobility. However, these differences in valuation techniques and methodologies did not hinder us from pooling the utility values that each country uses for its HTA. Based on the published coefficients, we were able to simulate a dataset on which we could estimate the ‘pan-European’ value set. The resulting coefficients can be applied when national values are absent. The pan-European value set would also be an optimal choice when decisions need to consider a European perspective, for instance, for reimbursement decisions at the European level. This contributes towards cross-country harmonization of outcome measures for economic evaluations [31].
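The pooling step described above can be illustrated with a minimal sketch. It assumes a purely additive 10-parameter model without a constant or interaction terms, and the two sets of decrements ("country_a", "country_b") are hypothetical numbers chosen for illustration, not published coefficients. The function and variable names are likewise our own. Values for all 243 health states are simulated from each set of coefficients, stacked, and refitted with unweighted OLS.

```python
from itertools import product

import numpy as np

# All 243 EQ-5D-3L health states, one row per state, one column per dimension.
STATES = np.array(list(product((1, 2, 3), repeat=5)))

# Design matrix: five level-2 dummies followed by five level-3 dummies.
X = np.hstack([STATES == 2, STATES == 3]).astype(float)

# HYPOTHETICAL decrements (not taken from any published value set).
COUNTRY_COEFS = {
    "country_a": np.array([0.05, 0.04, 0.03, 0.06, 0.05, 0.30, 0.20, 0.10, 0.35, 0.25]),
    "country_b": np.array([0.07, 0.06, 0.05, 0.08, 0.07, 0.40, 0.24, 0.14, 0.31, 0.21]),
}


def simulate_values(coefs):
    """Predicted value of every health state: 1 minus the summed decrements."""
    return 1.0 - X @ coefs


def pooled_ols(country_coefs):
    """Stack the simulated datasets and regress disutility on the 10 dummies."""
    x_pooled = np.vstack([X for _ in country_coefs])
    y_pooled = np.concatenate(
        [1.0 - simulate_values(c) for c in country_coefs.values()]
    )
    beta, *_ = np.linalg.lstsq(x_pooled, y_pooled, rcond=None)
    return beta


pooled = pooled_ols(COUNTRY_COEFS)
print(np.round(pooled, 3))
```

Under these simplifying assumptions the pooled coefficients equal the unweighted mean of the country coefficients. Published value sets that include a constant or interaction terms would need the corresponding extra columns in the design matrix.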
As this study aims to provide a means of standardizing multi-country evaluations by combining valuation tariffs from different countries in a particular region (e.g. Europe), one obvious factor to consider is the varying population size of the European countries included in this analysis. To account for these differences, we applied population size weights, adjusted for clustering at the country level. We found that including these weights complicated the modelling. This may be related to the coefficients of the German value set, which is known to have the highest values and is weighted with the largest population size [9, 32]. This weighting therefore introduces considerable variance, while it is unclear whether these high values truly represent higher values of the German population or whether they are an artefact of the sampling technique employed in the study. Indeed, when values were derived for the new EQ-5D-5L a decade later, the German values converged with values from other European value sets, which suggests that the first attempt with the EQ-5D-3L had methodological issues [33].
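How population weights can shift a pooled estimate may be sketched as follows. This is an illustration under stated assumptions, not the analysis reported here: it uses weighted least squares on simulated additive data, omits the adjustment for clustering at the country level, and all decrements and population figures are hypothetical.

```python
from itertools import product

import numpy as np

# 243 health states; columns are five level-2 dummies, then five level-3 dummies.
STATES = np.array(list(product((1, 2, 3), repeat=5)))
X = np.hstack([STATES == 2, STATES == 3]).astype(float)

# HYPOTHETICAL decrements and population sizes (in millions), for illustration only.
COEFS = {
    "large_country": np.array([0.02, 0.02, 0.02, 0.03, 0.02, 0.15, 0.10, 0.08, 0.20, 0.12]),
    "small_country": np.array([0.08, 0.07, 0.05, 0.09, 0.08, 0.40, 0.28, 0.16, 0.38, 0.26]),
}
POPULATION = {"large_country": 80.0, "small_country": 2.0}


def weighted_pool(coefs_by_country, weight_by_country):
    """Weighted least squares on the stacked, simulated disutilities."""
    rows, targets, weights = [], [], []
    for name, coefs in coefs_by_country.items():
        rows.append(X)
        targets.append(X @ coefs)  # disutility of each state
        weights.append(np.full(len(X), weight_by_country[name]))
    x_pooled = np.vstack(rows)
    y_pooled = np.concatenate(targets)
    sqrt_w = np.sqrt(np.concatenate(weights))
    # Scaling rows by sqrt(weight) turns WLS into an ordinary least-squares problem.
    beta, *_ = np.linalg.lstsq(x_pooled * sqrt_w[:, None], y_pooled * sqrt_w, rcond=None)
    return beta


unweighted = weighted_pool(COEFS, {name: 1.0 for name in COEFS})
weighted = weighted_pool(COEFS, POPULATION)
print(np.round(unweighted, 3))
print(np.round(weighted, 3))
```

In this noiseless setting the weighted coefficients are the population-weighted mean of the country coefficients, so a single populous country with small decrements (that is, high values) dominates the pooled result.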
Given the reasoning above, the application of population size weights in our study should be considered an illustrative example. When weighting the value sets of different countries, other factors, such as socio-demographic, societal, religious, economic and linguistic factors, can be included as weights, as they may further explain inter-country differences [34]. The need for a flexible modelling technique that can easily incorporate such weights helped guide our choice of the OLS model to predict the pan-European value set for EQ-5D-3L. Nevertheless, given the small changes in the coefficients found in this study, it needs to be investigated whether incorporating weights for background variables increases validity or rather complicates interpretation. Therefore, the application of weights in the analyses and their interpretation should be treated with caution.
We used the EQ-5D-3L as an illustrative example in the exercise to estimate a pan-European value set because of its widespread application in Europe. The same methodology can be applied to the new five-level version of the EQ-5D, or to any other utility questionnaire that uses regression techniques to estimate a value set.
Several previous studies have compared different EQ-5D valuations in an attempt to unify EQ-5D data and generate preference weights for regional general populations. Greiner et al. were among the first to derive European weights, using EQ-5D Visual Analogue Scale (VAS) data from 11 European countries [35]. Time trade-off data are preferred over VAS data because this valuation method asks respondents to make a trade-off between time and HRQoL, much in the same way as a QALY can be interpreted. Olsen et al. compared time trade-off valuations in four Western countries and three non-Western countries. They concluded that there is less variance among the value sets of the four European countries than between Western and non-Western value sets [36]. Another study compared three EQ-5D valuations in Central and Eastern European countries and further estimated a population norm for this region [37]. These studies thus suggest that a pooled value set depicting averaged European health state values may indeed be a feasible and sensible way forward in health economics research.
Some strengths and limitations merit consideration. Despite the differences among the included valuation studies, we present a flexible approach using published coefficients, which can accommodate more value sets as soon as they become available. This pragmatic approach suggests that coefficients from existing published valuation studies could be combined to generate health state preferences for any specific region, whether Europe, another geographical area or a subset of countries.
One can argue that a starting point for deriving a pooled value set should be the raw data of each country’s national valuation study [14]. However, the major disadvantage of this approach is that data collection would depend on the willingness of authors and institutes to share their data. Moreover, data sharing could be limited by constraints imposed by the informed consent: the data would be used for purposes other than those described in the informed consent and transferred to parties other than the original research team, which may infringe on privacy.
In this study we applied and compared OLS regression, gamma regression and FMM to find the best fit for the pooled saturated data. We present the pan-European value set using OLS, which was the most pragmatic choice according to goodness of fit, prediction error and model convergence. Even though the FMM performed slightly better than the OLS model on the penalized likelihood criterion (AIC), it did not achieve convergence after the application of population weights. Therefore, further research into advanced analytic techniques is needed to test various FMM specifications, which is beyond the scope of the current paper. Future research could also test different hypotheses, for instance regarding the probability of belonging to a particular group (class).
We included a sensitivity analysis in which the interaction terms identified in the existing valuation studies were added to the OLS model. Various interaction terms, such as N3, I2, I32 and D1, are used in some of the older EQ-5D-3L valuation studies. However, including such interaction terms is not recommended in recent valuation studies, as they could increase misprediction errors [38]. Furthermore, the use of the D1 interaction term has been heavily criticized, as it may complicate the model [39].
The UK is no longer part of the European Union (EU). This also entails that the UK is no longer subject to the regulations regarding therapeutic products, interventions and evaluations of their effectiveness within the European Economic Area. Therefore, taking Brexit into consideration, we re-ran the OLS model as a scenario analysis excluding the UK value set. The resulting pan-EU value set can be used for economic evaluations of drugs within the EU context (see Additional file: Table 7).
A limitation of this study is that the samples included in each valuation study were not entirely representative of the general population of the corresponding country [35]. Since some of the value sets are quite old, it is also questionable whether they still represent the values of the general population, as the population structure in the respective countries has changed over the years. Furthermore, societal differences such as educational status, culture, norms and wealth on the one hand, and methodological differences such as elicitation methods, modelling and quality of data on the other, may have influenced the health state valuations at the individual country level. We identified that each valuation study had its unique characteristics, its own methodological framework and its own reasons for inclusions and exclusions. We also recognize that the quality of some value sets may be questionable. For instance, there are inconsistencies within the Portuguese value set, where the value of health state 33331 (-0.536) is lower than the value of 33333 (-0.496). One approach to account for such differences would be to derive a quality score and adjust the analyses for it. However, we argue against this approach because the identified differences between studies may more often be properties that reflect the respective country’s preferences than differences in quality.
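The kind of inconsistency noted for health states 33331 and 33333 can be detected with a simple dominance check. This is a minimal sketch: a state that is no worse on every dimension and strictly better on at least one should not receive a lower value. The function names are our own, and only the two values quoted above are used as input.

```python
from itertools import combinations


def dominates(better, worse):
    """True if `better` has no higher level on any dimension and differs from `worse`."""
    return better != worse and all(b <= w for b, w in zip(better, worse))


def find_inconsistencies(values):
    """Return (dominating_state, dominated_state) pairs whose values are reversed."""
    problems = []
    for first, second in combinations(values, 2):
        for better, worse in ((first, second), (second, first)):
            if dominates(better, worse) and values[better] < values[worse]:
                problems.append((better, worse))
    return problems


# The two values quoted in the text for the Portuguese value set.
reported = {"33331": -0.536, "33333": -0.496}
print(find_inconsistencies(reported))  # [('33331', '33333')]
```

Applied to a full value set of 243 states, the same function would list every logically inconsistent pair, which could inform a judgement about a value set without requiring a formal quality score.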