Validation of diabetes mellitus and hypertension diagnosis in computerized medical records in primary health care

Background: Computerized Clinical Records, which are incorporated in primary health care practice, have great potential for research. In order to use this information, data quality and reliability must be assessed to prevent compromising the validity of the results. The aim of this study is to validate the diagnosis of hypertension and diabetes mellitus in the computerized clinical records of primary health care, taking the diagnostic criteria established in the most prominently used clinical guidelines as the gold standard against which to measure the sensitivity, the specificity, and the predictive values. The gold standard for diabetes mellitus was the diagnostic criteria established in the 2003 American Diabetes Association Consensus Statement for diabetic subjects. The gold standard for hypertension was the diagnostic criteria established by the Joint National Committee and published in 2003.
Methods: A cross-sectional multicentre validation study of diabetes mellitus and hypertension diagnoses in computerized clinical records of primary health care was carried out. Diagnostic criteria from the most prominent clinical practice guidelines were taken as the reference standard. Sensitivity, specificity, positive and negative predictive values, and global agreement (with the kappa index) were calculated. Results were shown overall and stratified by sex and age groups.
Results: The agreement for diabetes mellitus with the reference standard as determined by the guideline was almost perfect (κ = 0.990), with a sensitivity of 99.53%, a specificity of 99.49%, a positive predictive value of 91.23% and a negative predictive value of 99.98%. Hypertension diagnosis showed substantial agreement with the reference standard as determined by the guideline (κ = 0.778); the sensitivity was 85.22%, the specificity 96.95%, the positive predictive value 85.24%, and the negative predictive value 96.95%. Sensitivity results were worse in patients who also had diabetes and in those aged 70 years or over.
Conclusions: Our results substantiate the validity of using diagnoses of diabetes and hypertension found within the computerized clinical records for epidemiologic studies.


Background
In recent decades, Computerized Clinical Records (CCR) have come into use in the routine medical practice of primary health care (PHC) in the Spanish National Health System (NHS). Indeed, by 2007, 98.8% of general practices were computerized and 88% of the population had a primary care electronic health record [1]. There are many software systems to manage the CCR. In Spain, the most frequently implemented is the OMI-AP ® program.
Other countries, such as Canada, the United States of America, and some European countries, have significant experience using computerized health databases built from PHC records [2][3][4]. The most widely used is the GPRD (General Practice Research Database), which contains information entered prospectively since 1987 by more than 1,500 general practitioners (GPs) in the PHC of the United Kingdom and includes 7% of the population [5]. The GPRD has been widely used for research studies, with over 700 associated papers published to date in peer-reviewed journals [5].
Electronic health records offer great potential for research because they provide data for large populations. Even though the CCR can be used for research, it is important to note that the data are collected primarily for routine clinical care rather than for research purposes. Researchers who use the CCR must therefore assess data quality and reliability in order to avoid compromising their results.
Several approaches are described in the literature, but there is no agreed standard approach for evaluating the quality and accuracy of these data.
Related validation studies have attempted to show whether the cases with diagnostic codes indeed have that condition. Two recent systematic reviews of validation studies within the GPRD have shown that most (90%) of the coded diagnoses, across many diseases, are 'validated' [6,7]. To perform these validations, 83% of the studies used sources of information external to the GPRD, mostly questionnaires from GPs, hospital reports, copies of the medical history or comparisons with disease rates obtained from other registries.
The diagnostic information included in BIFAP (Base de Datos para la Investigación Farmacoepidemiológica en Atención Primaria), a population-based database in Spain containing information on more than 2.5 million patients, was validated by sending questionnaires to collaborating doctors, requesting hospital reports and other paper-format information from the clinical history and from doctors' activity reports [8]. External validations were also carried out by comparing the results with other sources of information, such as the national health survey and the death registry.
In order to guarantee the quality of studies performed with data from the CCR, and because few studies have evaluated the quality of those registries, it is necessary to check their validity in our setting. The abovementioned methodology, which uses the medical records of the second level of health care as the reference standard, does not seem appropriate for chronic diseases such as hypertension (HTN) and diabetes mellitus (DM), which are mainly diagnosed and followed up in PHC.
The aim of this study is to validate the diagnosis of HTN and DM coded in the CCR of PHC, taking the diagnosis criteria established in the most prominently used clinical guidelines as the gold standard.

Design
Cross-sectional validation study of the diagnoses of DM and HTN in the CCR of PHC.

Setting
The study was carried out in the 21 health centers of health area 4, in the northeast urban zone of the Community of Madrid, which serves a population of 777,426 people. All health centers have had computerized patient records for at least 10 years.

Sources of information
The CCR managed by the OMI-AP ® software are structured around a list of episodes (problems in the bio-psycho-social sphere, reasons for consultation, etc.). The episodes are listed with an alphanumeric code, which corresponds to the International Classification of Primary Care (ICPC), and a description or clinical label. The same code can be described by many different clinical labels.
It is possible to link these episodes with diagnostic tests, prescriptions, protocols, therapeutic interventions, referrals, temporary incapacity to work reports, and free-text annotations. The laboratory results are automatically recorded in the CCR.
In PHC, text and codes are entered by the GPs during clinical care, as part of their routine clinical practice. The CCR incorporates a user-friendly instrument to encode episodes in order to make it acceptable and useful to the GPs, who are not professional coders. This instrument is a search system based on clinical labels that assigns the code automatically. The program allows the description to be modified, but not the code, which can only be substituted or erased if deemed necessary.
Data extraction was conducted using the ICPC code from the CCR of patients.

Inclusion and exclusion criteria
The study population comprised those patients who met the following inclusion criteria: had at least one record within the CCR in the health centers of health area 4 as of January 1st, 2010; were over 18 years of age; and had an ICPC code in their CCR corresponding to DM (32,377 patients with code T90) or to HTN (91,065 patients with codes K86 or K87), respectively.
Patients were excluded if their CCR did not contain at least one plasma glucose measurement (7.3%) or at least two blood pressure (BP) measurements (22.9%), for the validation of DM and HTN, respectively.

Samples
Given the absence of reference information on the proportion of incorrectly classified cases (false negatives or false positives), maximum uncertainty was assumed (p = (1-p) = 0.5).
With this assumption, and in order to obtain a confidence level of 95% and a precision of 5%, the required sample size was 384 patients for each variable. We increased it to 423 to allow for a foreseeable loss of 10% between sampling and validation of the diagnosis (change of address, death or other reasons).
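The figures above are consistent with the usual sample size formula for estimating a proportion, n = z² p (1 − p) / d². The following is a minimal sketch under that assumption; the text does not state the formula, and the function names and the rounding choices (nearest integer for the base size, rounding up after a 10% inflation) are illustrative choices that happen to reproduce the reported 384 and 423.

```python
import math


def base_sample_size(p=0.5, precision=0.05, z=1.96):
    """Sample size for estimating a proportion (hypothetical helper).

    Uses n = z^2 * p * (1 - p) / d^2 and rounds to the nearest integer,
    which is an assumption made here to match the reported 384.
    """
    n = z ** 2 * p * (1 - p) / precision ** 2
    return round(n)


def inflate_for_losses(n, expected_loss=0.10):
    """Increase n by the expected proportion of losses, rounding up."""
    return math.ceil(n * (1 + expected_loss))


n = base_sample_size()          # maximum uncertainty: p = 0.5
n_inflated = inflate_for_losses(n)
print(n, n_inflated)
```

Dividing by (1 − loss) instead of multiplying by (1 + loss) is another common convention and would give a slightly larger number.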
Four different samples of patients were obtained: with DM code (sample 1), without DM code (sample 2), with HTN code (sample 3), and without HTN code (sample 4) in the CCR. The first two were used to validate the episodes of DM and the latter two were used to validate the episodes of HTN. Samples 1 and 3 were obtained by simple random sampling, whilst samples 2 and 4 were obtained by individual matching by age and sex with samples 1 and 3, respectively.

Methods
A diagnostic test aims to correctly classify individuals as having or not having a disease or clinical condition. A diagnostic test is validated by comparing its results, both positive and negative, with those obtained by the best instrument for measuring the phenomenon under study (gold standard).
In this study, the documented diagnoses of DM and HTN were considered as the diagnostic tests. In order to perform the validation, they were compared against the gold standards.
The gold standard for DM was the set of diagnostic criteria established in the 2003 American Diabetes Association Consensus Statement for diabetic subjects, which were still in effect in 2010 [9]. The reference diagnostic criteria for DM are shown in Table 1.
The gold standard for HTN was the set of diagnostic criteria established in the seventh report of the Joint National Committee (JNC 7) of the United States, published in 2003, which are shown in Table 2 [10].
These criteria agree with recommendations given by other major scientific societies, such as the World Health Organization [11], the European Association for the Study of Diabetes (EASD) [12], the European Society of Hypertension (ESH) [13,14], the European Society of Cardiology [14], the Canadian Diabetes Association [15] and with the recommended guidelines in Spain [16][17][18][19].
Subjects from samples 1 and 2 were considered to have DM if they fulfilled at least one of the criteria described in Table 1. Subjects from samples 3 and 4 who fulfilled any of the criteria in Table 2 were considered to have HTN.
We consulted the computerized medical records of patients to verify concordance with the above criteria. The validation algorithm is shown in Figure 1.
The evaluation team consisted of three general practitioners, with experience using the OMI-AP program. We conducted a peer evaluation with two reviewers and a third evaluator who resolved discrepancies.

Statistical Analysis
A descriptive analysis of the study population and samples was carried out. Age was expressed as the mean and the 25th and 75th percentiles, and the qualitative variables were summarized with their relative frequencies.
Table 1 Diabetes Mellitus diagnostic criteria.
In addition, a sensitivity analysis was performed determining the relative impact on predictive values of varying assumptions regarding the prevalence of DM and HTN.
Sensitivity (Sn) is the proportion of cases with a DM or HTN code in the CCR among those that fulfilled the diagnostic criteria of Tables 1 and 2, respectively. Specificity (Sp) is the proportion of cases without a DM or HTN code in the CCR among those that did not fulfill the diagnostic criteria. The positive predictive value (PPV) is the probability that people with a DM or HTN code in the CCR meet the diagnostic criteria, while the negative predictive value (NPV) is the probability that people without a DM or HTN code do not meet the criteria.
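These four definitions can be written down directly from a 2 × 2 table that crosses the code in the CCR with fulfillment of the diagnostic criteria. The sketch below is illustrative only: the class and function names are hypothetical, and the counts are invented for the example and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts crossing the CCR code (test) with the criteria (gold standard)."""
    tp: int  # code present, criteria fulfilled
    fp: int  # code present, criteria not fulfilled
    fn: int  # code absent, criteria fulfilled
    tn: int  # code absent, criteria not fulfilled


def validity_indices(t: TwoByTwo) -> dict:
    """Return Sn, Sp, PPV and NPV computed from the raw counts."""
    return {
        "sensitivity": t.tp / (t.tp + t.fn),
        "specificity": t.tn / (t.tn + t.fp),
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.tn + t.fn),
    }


# Invented counts, for illustration only.
example = TwoByTwo(tp=90, fp=10, fn=5, tn=95)
for name, value in validity_indices(example).items():
    print(f"{name}: {value:.4f}")
```

Predictive values computed from raw counts reflect the proportion of coded patients in the sample, so with samples that are not drawn in population proportions they would need to be adjusted for prevalence.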
We checked whether the Sn and Sp differed by sex, age group and DM status using the homogeneity test based on a χ² statistic. When the application conditions of the test were not met (any expected cell count under 5), Fisher's exact two-sided test was used.
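The decision rule described here (χ² test unless any expected count is under 5, otherwise Fisher's exact test) can be sketched for a 2 × 2 table using only the Python standard library. This is an assumed illustration, not the SPSS procedure used in the study: the function names are hypothetical, the χ² statistic is Pearson's without continuity correction, and the two-sided Fisher p-value sums the probabilities of all tables no more likely than the observed one.

```python
import math


def expected_counts(table):
    """Expected cell counts under homogeneity for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    return [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]


def chi_square_p(table):
    """Pearson chi-square p-value (1 degree of freedom, no continuity correction)."""
    exp = expected_counts(table)
    stat = sum(
        (table[i][j] - exp[i][j]) ** 2 / exp[i][j]
        for i in range(2)
        for j in range(2)
    )
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2))


def fisher_exact_two_sided(table):
    """Two-sided Fisher exact p-value via the hypergeometric distribution."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Small relative tolerance so ties with the observed table are included.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


def homogeneity_p(table):
    """Apply Fisher's test if any expected count is under 5, else chi-square."""
    if any(e < 5 for row in expected_counts(table) for e in row):
        return "fisher", fisher_exact_two_sided(table)
    return "chi2", chi_square_p(table)


# Invented counts: rows are two strata, columns are coded / not coded.
print(homogeneity_p([[60, 40], [40, 60]]))
print(homogeneity_p([[3, 1], [1, 3]]))
```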
Table 2 Hypertension diagnostic criteria
• Average of two or more properly measured, systolic blood pressure readings on each of two or more office visits ≥ 140 mmHg (≥ 130 mmHg for patients with diabetes and chronic kidney disease)
• Average of two or more properly measured, diastolic blood pressure readings on each of two or more office visits ≥ 90 mmHg (≥ 80 mmHg for patients with diabetes and chronic kidney disease)
• On therapy with antihypertensive medications and diagnosis of Hypertension in medical records.
When diagnostic tests are applied to the population, the proportion of those testing positive (apparent prevalence) cannot be used as an estimation of the prevalence of a disease in that population, because the Sn and Sp of these tests are usually less than 100%. Thus, the proportion of individuals with a positive result includes false positive cases and excludes cases that are false negatives, so in order to estimate the true prevalence of a disease from diagnostic tests, it is required to adjust for the misclassification resulting from the Sn and Sp of the test used. In this study we have used the formula proposed by Rogan and Gladen for this adjustment [20].
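The Rogan and Gladen adjustment is commonly written as true prevalence = (apparent prevalence + Sp − 1) / (Sn + Sp − 1). The sketch below assumes that form. The function name is hypothetical, the input values are invented for illustration, and clamping the estimate to the interval [0, 1] is a convenience added here rather than part of the formula.

```python
def rogan_gladen(apparent_prevalence, sensitivity, specificity):
    """Estimate true prevalence from apparent prevalence, Sn and Sp.

    All arguments are proportions between 0 and 1.
    """
    denominator = sensitivity + specificity - 1
    if denominator <= 0:
        # The estimator is undefined when the test is no better than chance.
        raise ValueError("Sn + Sp must be greater than 1")
    estimate = (apparent_prevalence + specificity - 1) / denominator
    # Clamp to a valid proportion (an added convenience, not in the formula).
    return min(1.0, max(0.0, estimate))


# Invented values, for illustration only.
print(round(rogan_gladen(0.20, 0.90, 0.95), 4))
```

When Sn and Sp are both 1 the estimate equals the apparent prevalence, which is a quick sanity check on the implementation.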
The degree of overall agreement between the registered diagnosis and the reference standard, as well as the inter-observer agreement, was determined by the kappa index with its confidence interval (CI). According to this value, the agreement was considered slight (≤ 0.20), fair (0.21-0.40), moderate (0.41-0.60), substantial (0.61-0.80) or almost perfect (≥ 0.81) [21].
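As an illustration of how the kappa index and the agreement categories above relate, the following sketch computes Cohen's kappa for a 2 × 2 table and maps it to the listed labels. It is not the !KAPPA macro used in the study and does not compute a confidence interval; the function names are hypothetical, the counts are invented, and values falling between two listed ranges (for example 0.805) are assigned to the higher category by assumption.

```python
def cohen_kappa(a, b, c, d):
    """Cohen's kappa for a 2x2 agreement table.

    a: both positive, b: first positive / second negative,
    c: first negative / second positive, d: both negative.
    """
    n = a + b + c + d
    observed = (a + d) / n
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (observed - expected) / (1 - expected)


def agreement_label(kappa):
    """Map a kappa value to the agreement categories listed in the text."""
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Invented counts, for illustration only.
k = cohen_kappa(48, 2, 3, 47)
print(round(k, 3), agreement_label(k))
```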
The statistical analysis was performed with SPSS software (version 15.0, SPSS Inc., Chicago, Illinois). The CI of the kappa index and the predictive values were calculated with the SPSS macros !KAPPA and !DT, respectively, from the Applied Statistics Laboratory of the Universidad Autónoma de Barcelona [22,23].

Ethical Aspects
To ensure confidentiality, the study was conducted as stipulated in the Spanish Personal Data Protection Law. The study protocol was approved by the ethics committee of the Hospital Carlos III in Madrid, and all of the evaluators signed a confidentiality clause.

Results
The main demographic characteristics of the population of patients over 18 years old attended in the health area 4 with episodes of DM (ICPC T90) and HTN (ICPC K86 or K87) registered in the CCR, as well as the selected samples are described in Table 3.
A total of 7.3% of patients from sample 2 (without DM code) had to be excluded because they did not have at least one fasting plasma glucose measurement. They were 64.5% males with a mean age of 73.6 (SD 14.4) years. There were significant differences in mean age between patients excluded and not excluded.
A total of 22.9% of patients from sample 4 (without HTN code) had to be excluded because they did not have at least two BP measurements. They were 36.1% males with a mean age of 69.1 (SD 11.6) years. There were no significant differences in mean age or proportion of females between patients excluded and not excluded from the sample.
The prevalence of patients diagnosed with DM was 5.02%, slightly higher in males and in patients aged 65 years or older. The prevalence of patients with a diagnosis of HTN was 14.11%, higher in women, in people aged 70 years or older and in patients with a diagnosis of DM.
In our study, the diagnosis of DM was confirmed in 99.5% of the cases (sensitivity), and 99.49% of those without a diagnosis of DM did not meet the diagnostic criteria (specificity). There were no significant differences when stratifying by age groups or sex.
The sensitivity of the diagnosis of HTN was 85.22%, decreasing significantly in people over 69 years old (81.76%) and in patients with a diagnosis of DM (79.85%). The specificity of the diagnosis of HTN was 96.95%, with no significant differences when stratifying by sex or age groups (Table 4).
The degree of overall agreement between the diagnosis in the CCR and the reference standard, measured as the kappa index, was almost perfect for DM (κ = 0.990) and substantial for HTN (κ = 0.778), as shown in Table 4. The worst result was obtained for HTN in the subgroup of patients with a diagnosis of diabetes.
The degree of global inter-observer agreement, measured with the kappa index, was very high, both for DM (κ = 0.988) and HTN (κ = 0.941), and for the different categories of sex, age groups and diagnosed diabetes. In all cases, the kappa index was higher than 0.880.
Table 5 shows the apparent prevalences (diagnosed in the CCR) and true prevalences (fulfillment of the diagnostic criteria), as well as the positive and negative predictive values. The true prevalences were, for both diseases, higher than the apparent prevalences. The difference between them was 0.89% for the DM diagnosis and 21.45% for HTN, which increased up to 32.36% in patients diagnosed with diabetes. The probability that the diagnostic criteria could be confirmed in patients with diagnosed DM was 91.23% and 99.98% in HTN.
Given that the PPV increases with the prevalence of the disease while the NPV decreases as prevalence rises, we estimated the PPV and NPV for different true prevalences of HTN and DM (Table 6).
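The dependence of the predictive values on prevalence follows from Bayes' theorem, and a sensitivity analysis of this kind can be sketched as follows. The function name and the prevalence grid are hypothetical choices for illustration; the Sn and Sp used as inputs are the values reported above for HTN, and the printed figures are not claimed to reproduce Table 6.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Return (PPV, NPV) for a given true prevalence, Sn and Sp (proportions)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sn and Sp as reported for HTN; the prevalence grid is an arbitrary example.
sn, sp = 0.8522, 0.9695
for prev in (0.05, 0.10, 0.20, 0.30, 0.40):
    ppv, npv = predictive_values(prev, sn, sp)
    print(f"prevalence {prev:.2f}: PPV {ppv:.4f}, NPV {npv:.4f}")
```

With Sn and Sp held fixed, the PPV rises and the NPV falls as the assumed prevalence increases, which is why a PPV reported alone is hard to transfer between populations.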

Discussion
The results of the study show a very high agreement of the diagnosis of DM in the CCR with the gold standard, and also a high sensitivity and specificity of the diagnosis of DM. The information obtained from the CCR provides a good estimation of the true prevalence of the illness, overall and in each category of sex and age groups.
For HTN, there was also good overall agreement of the diagnosis in the CCR with the gold standard, with high Sp and Sn, although lower than for diabetes. In the subgroup of patients with DM, the agreement was strikingly lower, especially at the expense of Sn, because the under-diagnosis of HTN is much higher and the NPV, which is strongly influenced by the prevalence, is noticeably lower. Similar results, of lesser magnitude, were found in the subgroup of patients aged 70 or over.
In our study the DM diagnosis was confirmed in 99.53% of the cases and HTN in 98.11%.
The systematic review of GPRD validation studies by Herrett et al. [6] shows that the different diagnoses studied were confirmed in 89% of the cases, although DM and HTN were not validated. Only a small proportion of the studies provided quantitative estimates of validity such as sensitivity and specificity.
We have not found published studies that used a methodology similar to ours for the validation of HTN and DM diagnoses. For this reason, our results can only be compared with the most similar validation studies, which compared diagnoses self-reported by patients against biometric measures as reference standards.
The Sn obtained in our study for the diagnosis of DM (99.53%) is much higher than the 69.7% found in the DINO study [24], which validated the self-reported diagnosis of diabetes, HTN and hyperlipidemia in a population aged 20 years and older in southern Spain. Published studies in other countries, also performed with self-reported diagnoses, show great heterogeneity, with Sn for DM ranging from 58.9% in a Dutch study [25] to 85.2% in Taiwan [26].
Regarding the diagnosis of HTN, our Sn was somewhat lower than for DM (85.22%). In Spain, the DINO study estimated the Sn at 49.4% [24], in a subsample of the SUN study the Sn was 90.3% [27], and the EPIC Murcia cohort study obtained a sensitivity of 63.5% [28], taking medical records as the reference standard. In other countries, we found values ranging from 34.5% in the aforementioned Dutch study [25] to 82% in a North American study [29].
The high specificity obtained for the diagnosis of DM (99.49%) is consistent with the findings of other published studies, where the Sp lies between 95.2% [30] and 99.6% [24]. The Sp obtained for HTN (96.95%) is slightly above what was found by other authors, which ranged from 80% [31] to 96.8% [24].
The agreement found between the diagnosis of DM registered in the CCR and the fulfillment of the diagnostic criteria was very good (κ = 0.990), above that obtained in the DINO study (κ = 0.78) [24].
For HTN, the agreement (κ = 0.778) was lower than for diabetes, but higher than those found in the DINO study (κ = 0.51) [24], EPIC-Murcia (κ = 0.58) [28] and SUN (κ = 0.66) [27]. The validation indexes obtained in our study are higher than those found by other authors, possibly because we checked diagnoses made by physicians and not those self-reported by patients.
Our study confirms the hypothesis found in other publications that the DM diagnosis has higher validity than the HTN diagnosis [25,30,32]. This could be related to physicians perceiving DM as more serious than HTN, and to the fact that the DM diagnostic criteria have changed less frequently and are more uniform than those for HTN. In general, the validity parameters found enable us to make a precise estimate of the prevalence of diabetes [33] but lead to an underestimation of the prevalence of HTN.
The under-diagnosis of HTN is a well-known phenomenon whose magnitude varies greatly in published studies. In a systematic review of 44 studies from different countries published in 2009 [34], the proportion of undiagnosed HTN worldwide was estimated to be 46.2% for men and 58.5% for women. Different Spanish studies offer results ranging from 14.9% in Navarra [27] to 49.4% in Galicia [35], with 31.74% obtained in the PREDIMERC study in Madrid [36]. In our study, we found 21.45% overall and 32.36% in the subgroup of patients with diabetes.
Very little is known about the prevalence of undiagnosed HTN in diabetic patients. In Spain, the DIAPA study found that 56.8% of patients with type 2 DM had BP > 130/85 mmHg, even though they were not previously diagnosed with HTN [37]. The lack of HTN diagnosis in patients with DM could be related to the cutoff values of diagnosis, which are lower for these patients (BP ≥ 130/80 mmHg) [10], and it is possible that some GPs had been using the diagnostic criteria of the general population (BP ≥ 140/90 mmHg).
When stratifying by sex and age groups, there were significant differences in the Sn of HTN, with worse results in patients with DM and in those over 69 years, despite the fact that these patients undergo a large number of check-ups and so there were more chances to detect HTN. These results differ from those of other studies [24,25,27,28,30,32], where better results were obtained in older and diabetic patients. One possible explanation could be that other studies were done through questionnaires with volunteer participants. The methods used may have resulted in a selection bias, since patients who are more worried about their health and those who have a worse perception of their health may be more predisposed to participate.
The true prevalence of DM in our study was 5.06%, which is very similar to that obtained with the BIFAP database (5.8%) [8] and in the Spanish National Health Survey (4.79%) [38], but lower than the 8.1% found in the PREDIMERC study in Madrid [36].
The true prevalence of HTN in our study (17.14%) is similar to those obtained with the BIFAP database (16.1%) [8] and in the Spanish National Health Survey (18.89%) [38] but also lower than the findings in other studies in Spain. These studies found about 35% in the adult population [39] and 29.3% in the PREDIMERC study [36].
These differences in the magnitude of the prevalences could be due to the age of the patients included in PREDIMERC [36]. The patient age range was between 30 and 74, whereas our study includes all those aged 18 and over. If we had selected people aged 30 or over in our database, the true prevalence would have been 6.87% for DM and 23.72% for HTN.
Moreover, PREDIMERC was undertaken with volunteers, with an overall response rate of 56.4%, which may have produced a selection bias, as mentioned before. Furthermore, we cannot be sure that false positives were due to misdiagnosis, only that fulfillment of the diagnostic criteria could not be verified. This may have led to an underestimation of the prevalence in our study.
The prevalence of HTN in diabetic patients obtained in our study (80.22%) is close to the highest found in published studies, which range between 50% and 84% [37,40].
The study has some limitations. The information included may not have been completely exhaustive, so a potential selection bias may exist, particularly in view of the proportion of adults in nursing homes, in chronic disease hospitals, or treated in private practice. In addition, as health area 4 in Madrid covers only an urban population, patients living in rural areas were not represented in this study.
On the other hand, more than 95% of citizens have public health coverage with the Spanish National Health System [41], so we suppose that the proportion of persons assigned to health area 4 who are not included in our study is low.
The high proportion of patients without the information needed to perform the validation (7.3% did not have at least one plasma glucose level and 22.9% did not have two or more BP measurements) could be due to the fact that these patients do not usually attend health centers. At least one or two years of active data are generally required in order to include clinical patient records in a study.
In both the GPRD and the BIFAP, the information is reported by GPs who participate voluntarily in the projects, possibly introducing a selection bias (e.g. the pattern of patient care can differ between volunteer physicians and those who are not volunteers). Furthermore, the response rate of physicians to questionnaires can be low, as occurred in BIFAP with the validation of Upper Gastrointestinal Bleeding, with a rate of 58.4% [8]. Our database is exhaustive, containing information from the entire population attended in the health area and registered by all the professionals, which avoids the possibility of the mentioned biases.
Published validation studies aim to show whether cases with diagnostic codes indeed have that condition. The use of the PPV as the only measurement has the drawback that it depends on the prevalence of the disease, as shown in Table 6. Another weakness of these validation studies is that, with a few exceptions, they do not address the question of false negatives, which are cases of the disease that have not received a diagnostic code. Missed cases arise when the GP did not make the diagnosis, or made the diagnosis but encoded it incorrectly [42]. We argue that in any validation study, PPV, NPV, Sn, and Sp should be identified, as far as possible.
The validation of the diagnoses in the CCR of PHC has facilitated the detection of areas for improvement in clinical practice, such as the under-diagnosis of HTN, with differential classification bias for patients who also have diabetes.
These findings can be used to alert clinicians of subgroups for which the interventions could be more beneficial.
The use of secondary sources of information stored in computerized databases enables access to data from large populations. This can facilitate the rapid identification of patients for observational studies or inclusion in interventions and may reduce the time and resources needed to obtain results. This greater efficiency constitutes one of the main advantages of using the databases as epidemiological research tools.

Conclusions
The results obtained in this validation support the use of both DM and HTN diagnosis codes in the computerized clinical records of PHC as a valid tool, which can be used with confidence to perform epidemiological studies.
However, the HTN diagnosis in the CCR has lower sensitivity than the DM diagnosis, especially in diabetic patients. Therefore, in this group of patients, the absence of an HTN diagnosis code in the CCR is not sufficient to identify people without HTN, since a high number of false negatives would be selected.