
Should policy-makers and managers trust PSI? An empirical validation study of five patient safety indicators in a national health service



Abstract

Patient Safety Indicators (PSI) are used only modestly in Spain, partly because of concerns about their empirical properties. This paper provides evidence by answering three questions: a) Are PSI differences across hospitals systematic rather than random? b) Do PSI measure differences among hospital-providers, as opposed to differences among patients? and c) Are measurements able to detect hospitals with a higher than "expected" number of cases?


An empirical validation study on administrative data was carried out. All 2005 and 2006 publicly-funded hospital discharges were used to retrieve eligible cases of five PSI: death in low-mortality DRGs (MLM); decubitus ulcer (DU); postoperative pulmonary embolism or deep-vein thrombosis (PE-DVT); catheter-related infections (CRI); and postoperative sepsis (PS). The Empirical Bayes statistic (EB) was used to estimate whether the variation was systematic; multilevel logistic modelling determined what proportion of the variation was explained by the hospital; and shrunken residuals, as provided by multilevel modelling, were plotted to flag hospitals performing worse than expected.


Variation across hospitals was observed to be systematic in all indicators, with EB values ranging from 0.19 (CI95%: 0.12 to 0.28) in PE-DVT to 0.34 (CI95%: 0.25 to 0.45) in DU. A significant proportion of the variance was explained by the hospital once patient case-mix was adjusted: from 6% in MLM (CI95%: 3% to 11%) to 24% (CI95%: 20% to 30%) in CRI. All PSI were able to flag hospitals with rates above the expected, although this capacity decreased when only the largest hospitals were analysed.


Five PSI showed reasonable empirical properties to screen healthcare performance in Spanish hospitals, particularly in the largest ones.



Background

The Spanish National Health Service, like others, has been influenced by the Patient Safety movement. Evidence from two reports on Spanish hospitals, following other international works on adverse events [1-7], inspired the debate. The first showed an in-patient incidence of adverse events ranging from 5.6% to 16.1%, of which between 17% and 41% were avoidable [8]. The second found an incidence of adverse events amenable to health care of up to 10.1% [9]. These findings contributed to the inclusion of Patient Safety Indicators (PSI) within the sets of National and Regional Quality Indicators, although health care authorities have used them only modestly to assess health care performance.

The Spanish National Health Service (NHS) experience builds on the insight from the Healthcare Cost and Utilization Project by the US Agency for Healthcare Research and Quality [10] and the requirements of the OECD [11]. In spite of the efforts made to build a valid tool, concerns remain about whether PSI are appropriate to inform on hospital performance. Beyond the need for local adaptation [12, 13], most of the caveats have pointed to flaws in their capacity to attribute excess cases to hospitals by detecting true incident adverse events [14-23]. Less has been written on their empirical properties, mainly because of their local nature; in particular, on the extent to which PSI show systematic (as opposed to random) variation in adjusted incidence, and on their ability to provide precise estimates and therefore to be sensitive enough to detect providers above the expected. Several works on similar topics have partially addressed some of these issues [24-26].

This paper aims to test the empirical properties of five PSI, as well as their ability to answer questions relevant to concerned users: a) Are differences in PSI rates across hospitals systematic? b) Do PSI measure differences among hospital-providers, as opposed to differences among patients? and c) Are measurements precise enough and able to detect providers with a higher (lower) than expected number of cases?


Methods

Study design, population and setting

An empirical validation study, based on administrative data, was carried out. All 2005 and 2006 publicly-funded hospital discharges were used to retrieve eligible patients. In order to reduce random noise in the estimates, hospitals with fewer than 30 eligible cases were excluded.

Five PSI were analyzed for the purposes of this study: death in low-mortality DRGs (MLM); decubitus ulcer (DU); postoperative pulmonary embolism or deep vein thrombosis (PE-DVT); infections due to medical care, including catheter-related infections (CRI); and postoperative sepsis (PS). The number of cases (numerators) and eligible admissions (denominators) are shown in Table 1. The selection of these five indicators was based on a previous report on the validity of the AHRQ PSI for the Spanish case [23].

Table 1 PSI adjusted-incidence and variation across hospitals

For the purpose of this study, a Spanish version of the AHRQ PSI algorithms was used. The AHRQ PSI definitions (version 4.1) were subject to a local validation process, accounting for differences with respect to the US healthcare system (i.e., the ICD 9th version and DRG version in use, as well as some coding characteristics), with a view to improving face validity for the Spanish context. Although the process has been described elsewhere [23, 27], it is worth highlighting that a dedicated consultation group involving clinicians and coders examined and adapted, as needed, both numerators and denominators for each indicator. In the particular case of MLM, an empirically built indicator, the list of low-mortality DRGs was redefined for the Spanish case. The overall correlation between the original AHRQ and the Spanish PSI definitions in flagging events in the hospitals under study was high across the five indicators, ranging from 0.75 in PE-DVT to 0.95 in PS.

Main endpoints

Three main endpoints were studied: a) systematic variation, defined as an Empirical Bayes value different from zero; b) cluster effect, defined as a rho statistic value different from zero; and c) sensitivity, defined as a statistically significant difference between the observed and the expected, as provided by the residual analyses in a multilevel approach.


Adjusted incidence (I) was calculated for each PSI (except MLM) and hospital. Crude incidence was used in the case of MLM because of its quasi-sentinel event nature. Variation in incidence was calculated using the ratio of variation between the hospitals at percentile 95 and percentile 5 (RV95-5), and the ratio of variation between the hospitals at percentile 75 and percentile 25 (RV75-25).
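The two ratios of variation can be illustrated with a minimal sketch. It assumes percentiles are obtained by linear interpolation between closest ranks (the paper does not state the interpolation rule), and the hospital rates are hypothetical values chosen only for illustration.

```python
import math


def percentile(values, p):
    """Percentile p (0-100), linear interpolation between closest ranks (assumed rule)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    pos = (len(ordered) - 1) * p / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def ratio_of_variation(rates, upper, lower):
    """Rate of the hospital at percentile `upper` divided by the rate at percentile `lower`."""
    return percentile(rates, upper) / percentile(rates, lower)


# Hypothetical adjusted incidences (events per 1,000 eligible admissions), one per hospital
rates = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0]

rv_95_5 = ratio_of_variation(rates, 95, 5)    # RV95-5
rv_75_25 = ratio_of_variation(rates, 75, 25)  # RV75-25
print(f"RV95-5 = {rv_95_5:.2f}, RV75-25 = {rv_75_25:.2f}")
```

Because RV95-5 discards the most extreme 5% of hospitals at each end, it is less sensitive to single outliers than a plain maximum-to-minimum ratio.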

Once variation in the incidence of adverse events had been calculated, the methods intended to answer the aforementioned questions on PSI empirical properties were applied. These methods are described in the following sections.

Are differences across hospitals systematic rather than random?

We used an observed-to-expected approach, where the observed were the counts of adverse events in each hospital under study, and the expected were the cases predicted by a logistic regression with the recorded age, sex and comorbidities of each patient as covariates. (An adaptation of the AHRQ version [28] was used to retrieve comorbidities.)
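As a minimal sketch of the observed-to-expected step, the expected count of a hospital can be taken as the sum of the predicted probabilities of its patients. The predicted probabilities below are hypothetical and are assumed to come from an already fitted patient-level logistic regression; fitting the model itself is outside this sketch.

```python
from collections import defaultdict


def observed_expected(records):
    """Aggregate patient-level records into hospital-level observed and expected counts.

    records: iterable of (hospital_id, event, predicted_probability), where `event`
    is 1 if the adverse event occurred and 0 otherwise.
    Returns {hospital_id: (observed, expected, observed/expected)}.
    """
    observed = defaultdict(int)
    expected = defaultdict(float)
    for hospital, event, probability in records:
        observed[hospital] += event
        expected[hospital] += probability
    result = {}
    for hospital in observed:
        # The ratio is undefined when no cases are expected
        ratio = observed[hospital] / expected[hospital] if expected[hospital] > 0 else float("nan")
        result[hospital] = (observed[hospital], expected[hospital], ratio)
    return result


# Hypothetical patients: (hospital, event indicator, predicted probability)
records = [
    ("A", 1, 0.02), ("A", 0, 0.01), ("A", 1, 0.07),
    ("B", 0, 0.05), ("B", 0, 0.05),
]
summary = observed_expected(records)
for hospital, (obs, exp, ratio) in sorted(summary.items()):
    print(hospital, obs, round(exp, 3), round(ratio, 2))
```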

Given both observed and expected counts, the Empirical Bayes statistic (EB) was estimated following a two-step hierarchical model. The first step assumes that, conditional on the risk $r_i$, the number of counts $y_i$ follows a Poisson distribution, $y_i \mid r_i \sim \text{Poisson}(e_i r_i)$, where $e_i$ is the expected count; in the second step, heterogeneity in rates is modelled by adopting a common distribution $\pi$ for the risk $r_i$ (or for its logarithm), $r_i \sim \pi(r \mid \theta)$, with $\theta$ the vector of parameters of the density function. The EB statistic is based on the assumption that the log-relative risks are normally and identically distributed, $\log(r_i) \sim N(\mu, \sigma^2)$.

In order to assess the alternative hypothesis, confidence intervals for the observed statistics were derived. To avoid parametric assumptions on the distribution of observed cases, we used a non-parametric re-sampling method, with 2,000 re-samples for each one of the simulated samples. Credibility intervals were obtained from percentiles 2.5 and 97.5 [29].
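A percentile interval of this kind can be sketched as follows, reading the re-sampling as a non-parametric bootstrap over hospitals (an assumption; the exact scheme is described in [29]). The statistic used here, the sample variance of log(observed/expected), is only a simple stand-in and is not the EB estimator; the hospital counts are hypothetical.

```python
import math
import random


def percentile(values, p):
    """Percentile p (0-100), linear interpolation between closest ranks."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * p / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def log_ratio_variance(units):
    """Stand-in statistic (NOT the EB estimator): sample variance of log(observed/expected)."""
    logs = [math.log(obs / exp) for obs, exp in units]
    mean = sum(logs) / len(logs)
    return sum((x - mean) ** 2 for x in logs) / (len(logs) - 1)


def bootstrap_interval(units, statistic, n_resamples=2000, seed=0):
    """Resample hospitals with replacement; return the 2.5th and 97.5th percentiles."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        sample = [rng.choice(units) for _ in units]
        estimates.append(statistic(sample))
    return percentile(estimates, 2.5), percentile(estimates, 97.5)


# Hypothetical (observed, expected) counts per hospital
hospitals = [(12, 8.0), (5, 9.5), (20, 11.0), (7, 7.2), (3, 6.1), (15, 9.0), (9, 10.4), (11, 5.5)]
lower, upper = bootstrap_interval(hospitals, log_ratio_variance)
print(f"point estimate = {log_ratio_variance(hospitals):.3f}, interval = ({lower:.3f}, {upper:.3f})")
```

An interval whose lower limit stays clearly above zero would be read, as in the text, as evidence of variation beyond chance.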

Do PSI measure differences among hospital-providers as opposed to differences among patients?

Classically, risk adjustment has been used to compare providers on the assumption that all patients have a homogeneous propensity to have the outcome of interest, wherever they are treated. We could instead hypothesize that this propensity is more similar among patients within a hospital than among patients from different hospitals; this is the so-called cluster effect. If it exists, classical methods ignore it and yield misleading estimates of variation. The multilevel approach, by contrast, considers the cluster effect (heterogeneity across hospitals) in the variance estimation, producing sounder estimates and a better understanding of how context (i.e., hospital of treatment) affects event rates [26].

In our study, to answer the above-mentioned question, the existence of a cluster effect (hospital effect) was tested using 2-level logistic modelling, where patients were nested within hospitals. The outcome variable was the PSI of interest, and the covariates were age, sex and Elixhauser's comorbidities (EC) [28]. A model was tailored for each PSI (except MLM, which is considered a quasi-sentinel event), testing EC as covariates while taking into consideration clinical reasoning (i.e., not all EC were used in all PSI) and the magnitude of the association (OR ≥ 2), to avoid spurious findings due to the massive samples used in the study. The multilevel model was an extension of the previously estimated individual logistic models (the c statistic was used to assess their goodness of fit) [30].

The degree of similarity of PSI events among providers was tested using the rho statistic and its confidence intervals (type 1 error of 5%). The unobserved individual error followed a logistic distribution with individual variance equal to π²/3 [26].
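Under the usual latent-variable formulation of a random-intercept logistic model, which appears to be the one implied by the π²/3 individual variance, rho is the share of total variance located at hospital level. The sketch below assumes that formulation; the hospital-level variance passed in is a hypothetical value.

```python
import math

# Variance of the standard logistic distribution, used as the patient-level variance
LOGISTIC_RESIDUAL_VARIANCE = math.pi ** 2 / 3


def rho(hospital_variance):
    """Intraclass correlation: hospital-level variance over total (hospital + patient) variance."""
    if hospital_variance < 0:
        raise ValueError("a variance cannot be negative")
    return hospital_variance / (hospital_variance + LOGISTIC_RESIDUAL_VARIANCE)


# Hypothetical between-hospital variance on the log-odds scale
example_variance = 0.5
print(f"rho = {rho(example_variance):.3f}")
```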

Finally, the Median Odds Ratio (MOR) statistic (and its confidence intervals), a measure of the variation among clusters (hospitals in our study) was estimated by comparing pairs of patients with the same covariates from two, randomly chosen, different clusters [31]. MOR provides information on how heterogeneity across hospitals increases the individual odds of experiencing the outcome of interest.
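For a random-intercept logistic model, the MOR has a closed form in terms of the hospital-level variance: the exponential of the square root of twice the variance, multiplied by the 75th percentile of the standard normal distribution. The sketch below uses that closed form and, as an assumption, recovers the variance from rho through the latent-variable relation with patient-level variance π²/3; the paper does not detail how its MOR values were computed.

```python
import math
from statistics import NormalDist

Z75 = NormalDist().inv_cdf(0.75)  # 75th percentile of the standard normal, about 0.6745


def median_odds_ratio(hospital_variance):
    """MOR for a random-intercept logistic model with the given hospital-level variance."""
    return math.exp(math.sqrt(2 * hospital_variance) * Z75)


def variance_from_rho(rho):
    """Hospital-level variance implied by rho when patient-level variance is pi^2/3 (assumed)."""
    if not 0 <= rho < 1:
        raise ValueError("rho must lie in [0, 1)")
    return rho * (math.pi ** 2 / 3) / (1 - rho)


# rho = 0.24 is the value reported for catheter-related infection
mor_cri = median_odds_ratio(variance_from_rho(0.24))
print(f"MOR implied by rho = 0.24: {mor_cri:.2f}")
```

With rho = 0.24 the implied MOR is roughly 2.64, in line with the reported increase of "more than 2.6 times" in the odds of a catheter-related infection; a MOR of 1 corresponds to no variation between hospitals.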

Are measurements able to detect hospitals with a higher than expected number of cases?

This is a key question in the study as PSI are infrequent events, and imprecise measures and poor sensitivity are expected.

Given the existence of a cluster effect, the natural way to assess the statistical significance of the difference between each hospital's PSI rate and the expected rate is to compute (and plot) the shrunken residuals derived from the multilevel model. Shrunken residuals disentangle the true hospital variation from that due to chance [32].

For the purposes of this study, the residual in each hospital and its standard error were estimated. The residual ($\mu_j$) represents the difference between the observed rate and the expected rate ($\mu_{oj}$), the expected being the estimated average PSI rate for all the hospitals under study. Residual graphs exhibiting each hospital effect (and its confidence interval) around the average value (a constant value for all hospitals, taken as the expected one) were plotted. Residuals were assumed to follow a standard Gaussian distribution, N(0, 1).
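The flagging rule implied by these residual plots can be sketched as follows. It assumes a normal-approximation interval of ±1.96 standard errors around each shrunken residual, with zero representing the expected (average) value; the residuals and standard errors are hypothetical, and in practice they would be taken from the fitted multilevel model.

```python
def flag_hospitals(residuals, z=1.96):
    """Classify hospitals by whether the interval around their residual excludes zero.

    residuals: {hospital_id: (shrunken_residual, standard_error)}
    z: normal quantile for the interval (1.96 for 95%, an assumed choice)
    """
    flags = {}
    for hospital, (residual, standard_error) in residuals.items():
        lower = residual - z * standard_error
        upper = residual + z * standard_error
        if lower > 0:
            flags[hospital] = "above expected"
        elif upper < 0:
            flags[hospital] = "below expected"
        else:
            flags[hospital] = "as expected"
    return flags


# Hypothetical shrunken residuals (log-odds scale) and their standard errors
residuals = {"H1": (0.50, 0.10), "H2": (-0.40, 0.12), "H3": (0.05, 0.10)}
flags = flag_hospitals(residuals)
for hospital in sorted(flags):
    print(hospital, flags[hospital])
```

Hospitals with few eligible cases have wider intervals and residuals shrunk towards zero, so they are less likely to be flagged in either direction.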

Data sources

The 2005 and 2006 hospital discharges dataset (CMBD) was used to obtain numerators and denominators for each indicator, i.e., to apply the PSI inclusion and exclusion criteria. The CMBD records the activity of all publicly funded hospitals across the country, which are required to provide this information on a yearly basis. The register records, in a systematic and homogeneous way, information on each patient discharge; specifically: age, sex, diagnosis at admission, secondary diagnoses (up to 30), length of stay, nature of the admission, discharge status, and the diagnostic and therapeutic procedures performed. The register has been in operation since the mid-1990s.


Results

A total of 6.2 million discharges (from between 171 and 175 hospitals, depending on the indicator) were retrieved once the new Spanish definitions were implemented. Admissions at risk ranged from 612,590 in postoperative sepsis to 2,954,018 in catheter-related infection. Adjusted incidence ranged from 0.54 deaths per 1,000 patients admitted in low-mortality DRGs to 17.3 cases of postoperative sepsis per 1,000 eligible patients (Table 1).

Are differences among hospitals systematic?

The highest variation in the rate of adverse events among hospitals was observed in MLM [RV95-5 = 12.88 (CI95%: 9.36 to 14.98); RV75-25 = 7.06 (CI95%: 4.77 to 7.88)], and the smallest in PS [RV95-5 = 1.31 (CI95%: 1.24 to 1.35); RV75-25 = 1.12 (CI95%: 1.10 to 1.14)]. Figure 1 allows a visual comparison of the variation across the five PSI.

Figure 1

Variation in adjusted-incidence by PSI. Each dot represents the adjusted-incidence of adverse events in a specific hospital. Incidence is computed as a mean-centred log-incidence to allow the comparison among events with different basal incidence. Legend: y axis: log-adjusted-incidence. x axis: (left to right) Mortality in Low-Mortality DRGs, Decubitus Ulcer, Catheter-related Infection, Post-operative Pulmonary Embolism or Deep-vein Thrombosis and Post-operative Sepsis

According to the Empirical Bayes statistic, variation was systematic in all indicators, with values ranging from 0.19 (CI95%: 0.12 to 0.28) in the case of PE-DVT to 0.34 (CI95%: 0.25 to 0.45) in DU (Table 1).

Do they measure differences among hospital-providers?

Multilevel logistic regressions were modelled to determine the effect of the hospital once patient case-mix was adjusted. Although most of the variance was explained by patient-related factors (from 64% in PS to 79% in DU, according to the area under the curve), a significant proportion of the variance was still explained by the hospital: from a small rho value of 6% in the case of MLM (CI95%: 3% to 11%) to a high rho value of 24% (CI95%: 20% to 30%) in CRI (Table 2).

Table 2 Multivariate analyses.

In the median case, as expressed by the MOR, the variance among hospitals increased the individual risk, expressed as odds: by 53% in the case of MLM (MOR = 1.53; CI95%: 1.35 to 1.81), by 79% in the risk of having a DU attributable to the care received, by more than 2.6 times in the risk of experiencing a CRI, by 53% in the risk of suffering a PE-DVT after surgery, and by 69% in the risk of having a PS.

Are measurements precise enough and able to detect hospitals with a higher than expected number of cases?

As observed in Figure 2, after risk adjustment, a remarkable number of hospitals were found to be statistically above the expected, i.e., the average rate of adverse events predicted for the hospitals under study. Thus, 19 hospitals (11% of the sample) in the case of MLM, 46 hospitals (26%) in DU, 114 hospitals (35%) in CRI, 39 hospitals (22%) in PE-DVT, and 53 hospitals (31%) in PS were flagged as "underperformers".

Figure 2

Shrunken residuals (and standard errors) by PSI. y axis: random effect (standard error). x axis: hospitals sorted by random effect a. Mortality in Low-Mortality DRGs. Note: Random effect (and standard error) after modelling the cluster effect. No patient variables were adjusted as Mortality in Low-Mortality DRGs is considered a sentinel-like event. b. Decubitus ulcer. Note: Random effect (and standard error) after modelling the cluster effect. Patient variables adjusted in the model were: age, sex, paralysis, other neurological disorders, diabetes with chronic complications, weight loss and fluid and electrolytic disorders. c. Catheter-related infections. Note: Random effect (and standard error) after modelling the cluster effect. Patient variables adjusted in the model were: age, sex, peripheral vascular disease, paralysis, weight loss, fluid and electrolytic disorders. d. Postoperative PE or DVT. Note: Random effect (and standard error) after modelling the cluster effect. Patient variables adjusted in the model were: age, sex, pulmonary circulation disease, paralysis, lymphoma, metastatic cancer, solid tumor w/o metastasis, coagulopathy and weight loss. e. Postoperative sepsis. Note: Random effect (and standard error) after modelling the cluster effect. Patient variables adjusted in the model were: age, sex, congestive heart failure, paralysis, and weight loss


Discussion

Five PSI have been considered for empirical validation in public acute-care hospitals across Spain. All of them showed systematic variability (variation beyond chance), exhibited a cluster effect, and were able to detect hospitals above the expected. Nevertheless, several questions need to be addressed to provide a nuanced statement on their usefulness.

Is the estimated variation systematic or due to chance?

Except in the case of MLM, which is considered a quasi-sentinel event, we would need to know more about the basal distribution of adverse events to answer this question properly; however, given the nature of and rationale behind the safety indicators, we might assume that this distribution is expected to be close to zero.

Our approach was based precisely on testing the alternative hypothesis through the estimation of robust Empirical Bayes confidence intervals against zero as the null value. The precision of the estimated intervals, together with the distance between their lower limits and zero (the closest lower limit was 0.12, in PE-DVT), supports the hypothesis that the variation observed is systematic rather than random.

Is the observed variation due to hospital-providers, rather than to patients?

If this were not the case, PSI would not be useful for their intended purpose, which is to elicit differences attributable to health care.

Our approach sought to elicit the hospital effect by estimating the existence of variation beyond the case-mix of patients treated, through the so-called cluster effect. As mentioned in the results, for the studied PSI a noticeable part of the variation was attributed to the hospital where the patients were treated. However, it might be argued that, in a multilevel approach, this finding is quite dependent on the goodness of the risk adjustment: the worse the adjustment at patient level, the higher the proportion of variance that could eventually be explained at hospital level. This is particularly true of studies using administrative data, where the limited information available on specific patient characteristics might reduce the goodness of risk-adjustment methods.

A way to mitigate this limitation is to reduce the extra variance due to differences in case-mix that the model is unable to capture, by modelling only the largest hospitals. These are teaching hospitals with more than 450 beds, able to provide high-tech services and, ultimately, homogeneous with regard to patient case-mix, particularly in studies where the sample size is as large as ours.

The results of this exercise showed a significant reduction in rho-statistic values, backing the hypothesis that the risk-adjustment strategy was missing some relevant patient characteristics. Despite this finding, the cluster effect remained: the rho statistic was 0.06 (CI95%: 0.03 to 0.11) in MLM; 0.05 (CI95%: 0.03 to 0.07) in DU; 0.10 (CI95%: 0.07 to 0.14) in CRI; 0.02 (CI95%: 0.01 to 0.03) in PE-DVT; and 0.03 (CI95%: 0.03 to 0.05) in PS.

Are results dependent on the coding practices affecting Elixhauser comorbidities?

A particular phenomenon that could also affect the cluster estimates, and ultimately the reliance on PSI, is differential coding intensity across hospitals. In fact, the number of secondary diagnoses has already been shown to influence international comparisons [21]. In theory, if this variation were closely related to coding intensity in hospitals, the cluster effect would be substantially reduced when the number of secondary diagnoses was included as a factor in the multilevel models; otherwise, it would be mostly related to the patients, thus affecting the risk-adjustment estimates.

For the purpose of this exploration, the number of secondary diagnoses was categorized using the median value (4 secondary diagnoses) as a threshold. In general terms, when both models were compared, a clear reduction in the Elixhauser comorbidity β coefficients, together with stable rho-value estimates, was observed (Additional file 1). Given that the number of secondary diagnoses absorbed part of the variance in the new model and the β coefficients changed, variation is also expected in the estimated random effect of each hospital. However, an excellent correlation (Pearson coefficient) between the original random effects and the new ones was found: 0.83 in postoperative sepsis, 0.86 in postoperative PE-DVT, 0.94 in decubitus ulcer and 0.96 in catheter-related infection. Moreover, except in the case of decubitus ulcer, changes in the statistical status of the random effect (i.e., hospitals found to be statistically different from the average becoming statistically similar, and the other way round) were null or negligible.
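The comparison between the two sets of hospital random effects can be sketched as a Pearson correlation plus a count of hospitals whose statistical status changes. All values below are hypothetical, and the status labels are assumed to have been assigned beforehand from each model's residuals and standard errors.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def count_status_changes(before, after):
    """Number of hospitals whose label (e.g. 'above', 'average', 'below') differs."""
    return sum(1 for a, b in zip(before, after) if a != b)


# Hypothetical random effects for four hospitals, before and after adding the
# number of secondary diagnoses to the model
original_effects = [0.50, -0.40, 0.05, 0.30]
adjusted_effects = [0.45, -0.35, 0.02, 0.25]
original_status = ["above", "below", "average", "above"]
adjusted_status = ["above", "below", "average", "average"]

r = pearson(original_effects, adjusted_effects)
changes = count_status_changes(original_status, adjusted_status)
print(f"Pearson r = {r:.3f}, hospitals changing status = {changes}")
```

A high correlation with few status changes suggests that the ranking of hospitals is robust to coding intensity, even when the patient-level coefficients shift.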

Are PSI precise enough to detect hospitals with rates above the expected?

Although PSI are quite infrequent events, shrunken residuals from the multilevel analysis proved precise enough to detect hospitals above the expected. Figure 2 illustrates this capacity quite clearly. Nevertheless, it is also necessary to determine how the cluster effect might be influenced by either outlier hospitals or the extra variance attributable to the mix of hospitals within the sample.

With regard to the former, the estimation barely changed once those outlier values, easily identifiable at the two ends of the distribution in Figure 2, were excluded (data not shown). The latter is more important. To understand this effect, new residuals were estimated and plotted for the centres that are a priori most homogeneous: the largest ones, as described in previous paragraphs. Except in the case of MLM, where heterogeneity across hospitals was the underlying reason for the results (just 4 out of 47 hospitals were statistically above the expected in this second analysis), this capacity remained noticeably high in the remaining PSI: 23% of the hospitals were flagged above the expected in decubitus ulcer, up to 36% in catheter-related infection, 25% in postoperative pulmonary embolism or deep vein thrombosis, and up to 28% in postoperative sepsis (Additional file 2).

Should policy-makers and managers trust PSI?

Our work aimed to shed light on some empirical properties that PSI are supposed to fulfil in order to be useful for safety measurement and, ultimately, to allow concerned users an informed quality management: namely, representing systematic variation across providers (ruling out randomness as an alternative explanation of the differences) and flagging hospitals as potential underperformers regardless of the mix of patients they treat. However, proper use requires debating two lessons learnt in this study, and reflecting upon other aspects that were not part of our work.

As for the lessons learnt with the studied PSI: first, owing to the aforementioned flaws in adjusting patient risks, we need to be aware that hospitals with more complexity might be falsely signalled as bad performers, particularly if they do not properly report secondary diagnoses. Secondly, the hospital effect (cluster effect) does exist, quite consistently across different statistical models; however, its magnitude clearly decreases when homogeneous hospital-providers are studied. Although obvious, this message points directly towards comparing like with like, particularly when risk adjustment is expected to be sub-optimal.

As for the reflection on other issues not addressed in this exercise, it is worth pointing out that the study of empirical properties offers just a partial view of PSI validity. Further debate on other validity issues ought to be pursued before PSI usage can be fully trusted. For this purpose, we have to be able to answer whether PSI measure what they are supposed to measure. In this work, we have assumed construct validity, since PSI were carefully developed for safety measurement purposes [10, 11], and face validity had been established in advance for the Spanish case by carrying out an ad hoc face-validity project [23]. However, criterion validity (the ability of an indicator to flag true positive and true negative cases by comparison with a gold standard) has to be specifically addressed in context. Fortunately for the Spanish NHS, a recent piece of research on surgical discharges shed some light on criterion validity [33]. In general terms, the five PSI were shown to perform quite well in terms of positive likelihood ratio (+LR). The most conservative estimation yielded a +LR of 26.8 in decubitus ulcer, 406.3 in catheter-related infection, 149.3 in PE-DVT and 25.32 in postoperative sepsis. These figures seem high enough to adopt these PSI as a screening tool, except in the case of decubitus ulcer, which is clearly affected by underreporting (false negative cases) and the existence of present-on-admission ulcers (false positive cases).
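For reference, the positive likelihood ratio is sensitivity divided by one minus specificity. The sketch below computes it from a two-by-two table against a gold standard; the counts are hypothetical and are not taken from the cited validation study [33].

```python
def positive_likelihood_ratio(true_pos, false_neg, false_pos, true_neg):
    """+LR = sensitivity / (1 - specificity), from counts against a gold standard."""
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    if specificity == 1:
        raise ValueError("+LR is undefined when there are no false positives")
    return sensitivity / (1 - specificity)


# Hypothetical validation sample: the indicator flags 50 of 100 true events
# and 100 of 10,000 discharges without the event
plr = positive_likelihood_ratio(true_pos=50, false_neg=50, false_pos=100, true_neg=9900)
print(f"+LR = {plr:.1f}")
```

The example shows why a +LR can be high even when sensitivity is modest: with rare events and very high specificity, a flagged case is far more likely among true events than among non-events, although many true events (false negatives) still go undetected.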

Some additional effort should be made to evaluate PSI stability over time (outside the scope of this work). In the meantime, taking the studied PSI as screening tools, and wisely weighing in each specific context the limits pointed out throughout this work, might help to identify those centres from which best-practice lessons can be drawn and those where intervention is clearly needed.


Conclusions

Five PSI showed reasonable empirical properties to screen healthcare performance in Spanish hospitals, particularly in the largest ones. However, the ability to flag hospitals beyond the expected was limited in Mortality in Low-Mortality DRGs due to its larger standard errors, and the risk of hospital misclassification in decubitus ulcer remained.



Abbreviations

AHRQ: Agency for Healthcare Research and Quality

CI: Confidence interval

CMBD: Hospital discharges dataset

CRI: Infections due to medical care, including catheter-related infections

DRG: Diagnosis-related groups

DU: Decubitus ulcer

EB: Empirical Bayes statistic

EC: Elixhauser's comorbidities

ICD 9th: International Classification of Diseases, 9th version

MLM: Death in low-mortality DRGs

MOR: Median Odds Ratio

NHS: Spanish National Health Service

OR: Odds ratio

PE-DVT: Postoperative pulmonary embolism or deep vein thrombosis

PS: Postoperative sepsis

PSI: Patient safety indicators

RV: Ratio of variation


References


  1. Brennan TA, Leape LL, Laird NM, Hebert L, Localio AR, Lawthers AG, et al: Incidence of adverse events and negligence in hospitalized patients. Results of the Harvard medical practice study I. N Engl J Med. 1991, 324: 370-376. 10.1056/NEJM199102073240604.

  2. Thomas EJ, Studdert DM, Burstin HR, Orav EJ, Zeena T, Williams EJ, et al: Incidence and types of adverse events and negligent care in Utah and Colorado. Med Care. 2000, 38: 261-271. 10.1097/00005650-200003000-00003.

  3. Wilson RM, Runciman WB, Gibberd RW, Harrison BT, Newby L, Hamilton JD: The Quality in Australian Health Care Study. Med J Aust. 1995, 163: 458-471.

  4. Vincent C, Neale G, Woloshynowych M: Adverse events in British hospitals: preliminary retrospective record review. BMJ. 2001, 322: 517-519. 10.1136/bmj.322.7285.517.

  5. Schioler T, Lipczak H, Pedersen BL, Mogensen TS, Bech KB, Stockmarr A, et al: Incidence of adverse events in hospitals. A retrospective study of medical records. Ugeskr Laeger. 2001, 163: 5370-5378.

  6. Davis P, Lay-Yee R, Schug S, Briant R, Scott A, Johnson S, et al: Adverse events regional feasibility study: indicative findings. N Z Med J. 2001, 114: 203-205.

  7. Baker GR, Norton PG, Flintoft V, Blais R, Brown A, Cox J, et al: The Canadian Adverse Events Study: the incidence of adverse events among hospital patients in Canada. CMAJ. 2004, 170: 1678-1686. 10.1503/cmaj.1040498.

  8. Aranaz JM, Limón R, Requena J, Gea MT, Núñez V, Bermúdez MI, y el Grupo de trabajo del Proyecto IDEA: Incidencia e impacto de los efectos adversos en dos hospitales. Rev Calidad Asistencial. 2005, 20 (2): 53-60. 10.1016/S1134-282X(08)74723-7.

  9. Ministerio de Sanidad y Consumo: Estudio Nacional sobre los Efectos Adversos Ligados a la Hospitalización (Informe Febrero 2006). Madrid. 2006, 169-

  10. Agency for Healthcare Research and Quality: Patient Safety Indicators. (Accessed, July 2011)

  11. OECD: Healthcare Quality Indicators Project Overview. (accessed March 2012)

  12. Bottle A, Aylin P: Application of AHRQ patient safety indicators to English hospital data. Qual Saf Health Care. 2009, 18 (4): 303-308. 10.1136/qshc.2007.026096.

  13. Raleigh VS, Cooper J, Bremner SA, Scobie S: Patient safety indicators for England from hospital administrative data: case-control analysis and comparison with US data. BMJ. 2008, 337: a1702. 10.1136/bmj.a1702.

  14. Romano PS, Mull HJ, Rivard PE, Zhao S, Henderson WG, Loveland S, Tsilimingras D, Christiansen CL, Rosen AK: Validity of selected AHRQ patient safety indicators based on VA National Surgical Quality Improvement Program data. Health Serv Res. 2009, 44 (1): 182-204. 10.1111/j.1475-6773.2008.00905.x.

  15. Rosen AK, Itani KM: Validating the patient safety indicators in the Veterans health administration: are they ready for prime time?. J Am Coll Surg. 2011, 212 (6): 921-923. 10.1016/j.jamcollsurg.2010.12.053.

  16. Cevasco M, Borzecki AM, O'Brien WJ, Chen Q, Shin MH, Itani KM, et al: Validity of the AHRQ patient safety indicator "Central venous catheter-related bloodstream infections". J Am Coll Surg. 2011, 212 (6): 984-990. 10.1016/j.jamcollsurg.2011.02.005.

  17. Cevasco M, Borzecki AM, Chen Q, Zrelak PA, Shin M, Romano PS, et al: Positive predictive value of the AHRQ Patient Safety Indicator "Postoperative sepsis": implications for practice and policy. J Am Coll Surg. 2011, 212 (6): 954-961. 10.1016/j.jamcollsurg.2010.11.013.

  18. Zhan C, Battles J, Chiang Y, Hunt D: The Validity of ICD-9-CM Codes in Identifying Postoperative Deep Vein Thrombosis and Pulmonary Embolism. Jt Comm J Quality Patient Saf. 2007, 33 (6): 326-331.

  19. Houchens R, Elixhauser A, Romano P: How often are potential 'Patient Safety Events' Present on Admission?. Jt Comm J Quality Patient Saf. 2008, 34 (3): 154-163.

  20. Glance LG, Osler TM, Mukamel DB, Dick AW: Impact of the present-on-admission indicator on hospital quality measurement: experience with the Agency for Healthcare Research and Quality (AHRQ) inpatient quality indicators. Med Care. 2008, 46 (2): 112-119. 10.1097/MLR.0b013e318158aed6.

  21. Drösler SE, Romano PS, Tancredi DJ, Klazinga NS: International Comparability of Patient Safety Indicators in 15 OECD Member Countries: A Methodological Approach of Adjustment by Secondary Diagnoses. Health Serv Res. 2011, doi: 10.1111/j.1475-6773.2011.01290.x

  22. Gallagher B, Cen L, Hannan EL: Readmission for selected infections due to medical care: expanding the definition of a patient safety indicator. Advances in Patient Safety: From Research to Implementation (Volume 2: Concepts and Methodology). Edited by: Henriksen K, Battles JB, Marks ES, et al. 2005, Rockville (MD): Agency for Healthcare Research and Quality (US)

  23. Ministerio de Sanidad y Consumo: Validación de indicadores de calidad utilizados en el contexto internacional: indicadores de seguridad de pacientes e indicadores de hospitalización evitable. (Accessed August 2011)

  24. van Dishoeck AM, Looman CW, van der Wilden-van Lier EC, Mackenbach JP, Steyerberg EW: Displaying random variation in comparing hospital performance. BMJ Qual Saf. 2011, 20 (8): 651-657. 10.1136/bmjqs.2009.035881.

  25. Spiegelhalter DJ: Handling over-dispersion of performance indicators. Qual Saf Health Care. 2005, 14 (5): 347-351. 10.1136/qshc.2005.013755.

  26. Ohlsson H, Librero J, Sundquist J, Sundquist K, Merlo J: Performance evaluations and league tables: do they capture variation between organizational units? An analysis of 5 Swedish pharmacological performance indicators. Med Care. 2011, 49 (3): 327-331. 10.1097/MLR.0b013e31820325c5.

  27. Spanish Ministry of Health and Consumer Affairs: Validation of Patient Safety Indicators (PSIs) for the Spanish National Health System. Summary 2008. (Accessed August 2011)

  28. Agency for Healthcare Research and Quality: Comorbidity Software. (Accessed June, 2011)

  29. Ibáñez B, Librero J, Bernal-Delgado E, Peiró S, López-Valcarcel BG, Martínez N, Aizpuru F: Is there much variation in variation? Revisiting statistics of small area variation in health services research. BMC Health Serv Res. 2009, 9: 60. 10.1186/1472-6963-9-60.

  30. Hosmer DW, Lemeshow S: Applied Logistic Regression. 2nd edition. 2000, New York: John Wiley & Sons, Inc.

  31. Larsen K, Merlo J: Appropriate assessment of neighbourhood effects on individual health: integrating random and fixed effects in multilevel logistic regression. Am J Epidemiol. 2005, 161 (1): 81-88. 10.1093/aje/kwi017.

  32. Merlo J, Chaix B, Yang M, Lynch J, Råstam L: A brief conceptual tutorial of multilevel analysis in social epidemiology: linking the statistical concept of clustering to the idea of contextual phenomenon. J Epidemiol Community Health. 2005, 59 (6): 443-449. 10.1136/jech.2004.023473.

  33. Bernal-Delgado E, Abadía-Taira B, García-Armesto S, Martínez-Lizaga N, Ridao-López M, y el grupo Zoni, Atlas VPM: Validación de criterio de los indicadores de seguridad de pacientes (Patient Safety Indicators) DoW1/2011. (Accessed July 2011)


The work benefited from a public grant from the Instituto de Salud Carlos III (FIS PI06/90260). It also received partial funding from the Ministerio de Salud, Política Social e Igualdad (formerly, Salud y Consumo) as part of commissioned research. This article is part of the work of the Atlas VPM group. The authors are indebted to the Spanish Healthcare Authorities participating in the project, which allowed the use of regional hospital discharge databases. We are also particularly obliged to Ander Arrazola†, Yolanda Montes, Isabel Rodrigo, Teresa Salas and Araceli Díaz for their contributions to the face validity project that underpinned this piece of research. We are also grateful to Prof. Juan Merlo for his valuable comments on multilevel modelling. This piece of research is embedded within the work developed under the Health Services and Health Policy Research Program, a research collaboration between the Centro Superior de Investigación en Salud Pública in Valencia and the Instituto Aragonés de Ciencias de la Salud in Aragón.

Author information

Corresponding author

Correspondence to Enrique Bernal-Delgado.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

All the authors are guarantors of the study. All of them had full access to all the data and take responsibility for the integrity and accuracy of the analysis and results. EBD, SGA and SPM take responsibility for the article design, interpretation of results and drafting. NML, BAT and JBP contributed specifically to data management and data analysis. All the authors read and approved the final manuscript.

Electronic supplementary material


Additional file 1: The effect of the number of secondary diagnoses. It shows the recalibration of each model using as a factor the number of secondary diagnoses. Tables show both the estimates before and after the adjustment. (DOC 92 KB)


Additional file 2: Shrunken residuals (and standard errors) by PSI. It shows the residuals and standard errors for the largest hospitals in the sample (n = 47). (DOC 38 KB)

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Bernal-Delgado, E., García-Armesto, S., Martínez-Lizaga, N. et al. Should policy-makers and managers trust PSI? An empirical validation study of five patient safety indicators in a national health service. BMC Med Res Methodol 12, 19 (2012).
