Optimising the use of electronic health records to estimate the incidence of rheumatoid arthritis in primary care: what information is hidden in free text?
© Ford et al.; licensee BioMed Central Ltd. 2013
Received: 29 November 2012
Accepted: 7 August 2013
Published: 21 August 2013
Primary care databases are a major source of data for epidemiological and health services research. However, most studies are based on coded information, ignoring information stored in free text. Using the early presentation of rheumatoid arthritis (RA) as an exemplar, our objective was to estimate the extent of data hidden within free text, using a keyword search.
We examined the electronic health records (EHRs) of 6,387 patients from the UK, aged 30 years and older, with a first coded diagnosis of RA between 2005 and 2008. We listed indicators for RA which were present in coded format and ran keyword searches for similar information held in free text. The frequency of indicator code groups and keywords from one year before to 14 days after RA diagnosis were compared, and temporal relationships examined.
One or more keywords for RA were found in the free text of 29% of patients prior to the RA diagnostic code. Keywords for inflammatory arthritis diagnoses were present for 14% of patients whereas only 11% had a diagnostic code. Codes for synovitis were found in 3% of patients, but keywords were identified in an additional 17%. In 13% of patients there was evidence of a positive rheumatoid factor test in text only, uncoded. No gender differences were found. Keywords generally occurred close in time to the coded diagnosis of rheumatoid arthritis. They were often found under codes indicating letters and communications.
Potential cases may be missed or wrongly dated when coded data alone are used to identify patients with RA, as diagnostic suspicions are frequently confined to text. The use of EHRs to create disease registers or assess quality of care will be misleading if free text information is not taken into account. Methods to facilitate the automated processing of text need to be developed and implemented.
Electronic health records (EHRs) are a major source of data for epidemiological and health services research and service planning. Recent health policy initiatives in both the UK and the US highlight the importance of data available within electronic health record systems [1, 2]. Health policy in the UK focuses on increasing transparency of health outcomes and on quality of care, supporting greater patient choice. Clinical trials may increasingly rely on electronic health records for recruitment and assessment of outcomes [4, 5].
Electronic health records in the UK are most advanced in general practice (primary care) where for most practices the electronic health record is the entire health record. Electronic health records contain both structured data entered as codes (Read codes and in the past, Oxford Medical Information System (OXMIS) codes; similar to International Classification of Diseases (ICD) codes used elsewhere in the world) and unstructured free text. Read codes are a hierarchical coding list used throughout UK general practice. Codes and text may be entered in the course of a consultation, by general practitioners (GPs) or other clinical staff such as practice nurses, or coding may be performed by administrative staff before or after the episode of care. In addition, the content of letters and other correspondence with specialists in secondary care and other health care providers can be added to the record as they are received by the practice. Sometimes an intended use of the electronic record system for research or audit is known in advance so that coding can be deliberately used to meet a set of rules or predefined codes. This will reduce the variability and standardise entry. The Quality and Outcomes Framework (QOF) rule-sets in UK primary care are an example of this. QOF financially incentivises GPs to record care given for certain diseases such as diabetes and heart disease in a standardised way and is similar to the recent meaningful use initiatives in the US. However, in most primary care consultations, information is recorded by GPs for clinical and administrative purposes without consideration of its use for research or audit purposes. Hence, there may be inconsistency between GPs in choosing codes for similar cases, and thus collating information from the records is a laborious and complex process necessitating the creation of long lists of codes for each clinical entity or condition [7, 8].
A comprehensive code list allows the full potential of the coded information in the records to be exploited. However, GPs may also enter information into the record as free text. The text is always associated with a code which may or may not relate to the content of the text. Using only coded information to answer research questions may miss important information which is recorded in text. Some studies have suggested coded data alone do not contain sufficient detail to evaluate clinical care or to reliably identify patient groups [9–12]. Results from our earlier study indicated that using coded data alone for case definition could potentially miss or wrongly date cases of rheumatoid arthritis.
However, using the free text in EHRs poses a number of challenges to researchers. The costs of anonymisation of text, to protect patient confidentiality, and the problems of using textual data in large-scale quantitative analyses mean that most research studies using EHRs ignore the information stored in the free text. Technologies to automate access to medical free text are in an early stage of development [4, 6, 13]. There are several possible methods for accessing the information stored in the free text, from searching manually, to automated keyword searching, to the use of more sophisticated computer algorithms such as natural language processing. Of these, keyword searching does not require researcher access to the full text and therefore avoids the need for anonymisation. It can be simply specified and quickly performed, with a keyword search giving quantitative results of how many keyword hits have been found in each patient’s record, and with which codes they were associated. However it does not allow scrutiny of text for negation, qualifiers and other context, and can only offer a rough estimation of information contained in the record. Nevertheless, as a preliminary step towards estimating the amount of information hidden in free text, it is likely to be a valuable tool. Such an approach could also be used to identify a pool of potential candidate cases that would then be reviewed manually for verification.
To describe the prevalence of RA relevant keywords in free text and check for any variation by gender.
To estimate the quantity of information being missed when coded data alone is used.
To describe which codes the keywords are associated with.
To begin to assess the extent to which keywords can augment information in codes to contribute to probabilistic case definition.
The study was approved by the Medicines and Healthcare products Regulatory Agency (MHRA) Independent Scientific Advisory Committee (protocol number 09_033R).
Indicators such as tests, referrals, prescriptions or symptoms based on codes (“indicator code groups”)
Keywords for searching in the free text records
Table 1. Summary of indicator markers and keyword groups

Indicator codes, based on code (full list available in Additional file 1)
- Inflammatory arthritis diagnosis: Seronegative arthritis, polyarthropathy, arthralgia of multiple joints
- Rheumatoid factor test: RhF, latex test, Rose Waller (included regardless of result)
- DMARD prescription: List of drug names drawn up from British National Formulary
- Referral to rheumatology: Rheum. disorder monitoring, Rheum. treatment change, Rheum. management plan given, Under care of rheumatologist
- Joint signs and symptoms: Joint abnormal, joint swelling, reduced joint movement, joint movement painful, joint stiffness, inflammation joint, etc.

Keyword groups, based on text (example search terms; full list available in Additional file 2)
- Rheumatoid arthritis: RA, RhA, rheum arth
- Positive rheumatoid factor test: Positive RhF, rheum fac +
- Inflammatory arthritis: Polyarthitis, seronegative arthritis, inflammatory arthritis
Development of indicator code groups
We drew up hypothetical lists of indicator code groups based on clinical consultation and code-list dictionaries. These were then modified by reviewing the codes actually used in the patients with RA before the diagnostic code was found in their records. These code-lists focussed on indicator code groups considered to be specific to RA, rather than other musculoskeletal conditions. This process, described in detail elsewhere, generated six indicator code groups of interest for the current study: 1) disease modifying anti-rheumatic drug (DMARD) prescription, 2) referral to rheumatology, 3) initial inflammatory arthritis diagnosis, 4) rheumatoid factor test, 5) synovitis, and 6) joint signs and symptoms. Code-lists for each indicator code group as well as the list of RA diagnostic codes are available in Additional file 1.
Development of keyword searches
Clinicians (rheumatology specialist & GP) drew up lists “a priori”. A rheumatologist (KAD) and two GPs (HS & GR) were asked to write down all the words specialists or GPs might use to describe a firm diagnosis of RA and a less certain diagnosis of an inflammatory type arthritis. These lists were then combined and modified to reflect the combinations of words which would be accessible in the text of the clinical and test records for the keyword search. Therefore although it was likely, as found in our previous study, that a DMARD prescription or a referral to rheumatology might be a good indicator, they were unlikely to be found within the free text in a format easily accessible or interpretable by keyword search.
Access to pre-anonymised text. We had access to 10,000 entries of pre-anonymised text from the GPRD from previous studies including the use of non-steroidal anti-inflammatory drugs (and not relating to the current study population). In total 1307 records which either had any one of the “a priori” terms used in codes or had the term “arthritis” in the text were reviewed. Terms in text that referred to an inflammatory arthritis diagnosis were added to the list created in stage one.
Use of metathesaurus. Lists were supplemented from the Unified Medical Language System Metathesaurus, and frequent spelling errors and abbreviations were added.
Four final categories were identified: 1) rheumatoid arthritis, 2) positive rheumatoid factor test, 3) inflammatory arthritis, and 4) synovitis. These are summarized in Table 1 and the full keyword specification is available in Additional file 2.
Identification of cases
From the target population of permanently registered patients in the study period of 1/1/2005 to 31/12/2008, cases were identified who had a first diagnostic code of RA within the study period, aged 30 years and over at the time of diagnosis, and who had records available from one year before the first coded diagnosis of RA to 14 days afterwards. If an event date had not been entered into the GP system, the date that the record was created was used (0.1% were imputed (10,986 events)). Events were discarded if they occurred before the start date (the latest of patient’s registration date or the date that the practice’s records were considered up-to-standard by the GPRD) or after the end date (the first of patient leaving the practice or the last date that records were received from the practice). Coded records were therefore available from one year before to 14 days after the first coded RA diagnosis.
The extracted text was searched for exact string matches, and for each string of free text within the record we had a flag for whether each of the four keyword groups was present and a word count. The associated Read code was also recorded. Dummy variables were created to indicate the presence/absence of each keyword for each event in the sample. Text extraction and keyword searching were performed on the entire record back to the earliest of one year before the first RA code, the first DMARD prescription or the first specific marker date, even if these last two extended to earlier than one year before the first RA code. Keyword searches were undertaken as simple pattern matches where the keyword sequence of characters was identified anywhere in the total free text record irrespective of word boundaries. The search was case insensitive.
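The search described above can be illustrated with a minimal Python sketch. It is not the study's implementation: the group names, the search terms in `KEYWORD_GROUPS` and the function names are hypothetical stand-ins (the actual terms are in Additional file 2). It only shows what a case-insensitive substring match that ignores word boundaries does, alongside a simple word count.

```python
# Hypothetical keyword groups for illustration only; the study's full
# keyword specification is in its Additional file 2.
KEYWORD_GROUPS = {
    "rheumatoid_arthritis": ["rheumatoid arthritis", "rheum arth"],
    "positive_rf": ["positive rhf", "rheum fac +"],
    "inflammatory_arthritis": ["inflammatory arthritis", "seronegative arthritis"],
    "synovitis": ["synovitis"],
}


def flag_keyword_groups(text, groups=KEYWORD_GROUPS):
    """Return {group: True/False} for one free-text entry.

    Matching is a plain substring test on lower-cased text, so it is
    case insensitive and ignores word boundaries.
    """
    lowered = text.lower()
    return {
        group: any(term.lower() in lowered for term in terms)
        for group, terms in groups.items()
    }


def word_count(text):
    """Count whitespace-separated tokens in a free-text entry."""
    return len(text.split())


entry = "Seen in clinic. ?Inflammatory Arthritis, tenosynovitis of wrist"
flags = flag_keyword_groups(entry)
n_words = word_count(entry)
```

In this example the synovitis flag is set by "tenosynovitis", because the term is matched anywhere in the string regardless of word boundaries.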
The data were prepared using Stata version 11 (Statacorp LP, Texas). For each indicator code group, any relevant code in any record table resulted in a positive hit. This was indicated in the database using a categorical dummy variable for each indicator code group. The earliest code within any indicator code group or the earliest occurrence within a keyword group was used to determine the time interval prior to the RA code. The Read codes associated with text strings containing keywords were examined by tabulating the frequency of codes used for different categories of keywords. The 20 most frequent codes from each category were then combined and ranked.
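As an illustration of the interval calculation, the following Python sketch takes the earliest dated event in a group and counts the days to the first RA code. The study itself used Stata; the function name and the example dates here are hypothetical.

```python
from datetime import date
from statistics import median


def days_before_diagnosis(event_dates, ra_code_date):
    """Days from the earliest event in a group to the first RA code.

    Returns None when the patient has no events in the group.
    """
    if not event_dates:
        return None
    return (ra_code_date - min(event_dates)).days


# Hypothetical patient: two synovitis keyword hits before the RA code.
interval = days_before_diagnosis(
    [date(2006, 3, 1), date(2006, 1, 10)], date(2006, 3, 22)
)

# Intervals are summarised with the median because they are skewed.
example_median = median([0, 32, 122])
```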
The prevalence of indicator code groups and keywords was calculated in men and women and compared using chi-squared tests. The time interval between the first occurrence of any indicator code group or keyword and the first coded diagnosis of RA was calculated. Since the time-intervals were skewed, medians and non-parametric tests (Mann–Whitney U) were used to compare groups. Bonferroni corrections were applied for multiple comparisons.
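For readers unfamiliar with these tests, a small Python sketch of a chi-squared comparison for a 2x2 table (for example, keyword present or absent by gender) and a Bonferroni-adjusted threshold is given below. It is a generic textbook formulation under stated assumptions (Pearson statistic, no continuity correction, one degree of freedom), not a reproduction of the Stata analysis, and the function names are hypothetical.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom, without
    continuity correction. All four margins must be non-zero.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper tail is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Hypothetical counts: rows are two groups, columns are present/absent.
stat, p_value = chi2_2x2(10, 20, 20, 10)
threshold = bonferroni_threshold(0.05, 10)
significant = p_value < threshold
```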
In total 6,387 newly diagnosed cases of RA were identified between 2005 and 2008 and were included in analyses, comprising 2,007 men and 4,380 women. Men were older (median age 62 years [inter-quartile range, IQR 51–72]) than women (60 years [IQR 49–71]; p < 0.001 for age difference).
Prevalence of indicator code groups
Table 2. Prevalence of indicator markers and keywords in the records of rheumatoid arthritis patients in the year preceding diagnosis, and time interval before RA diagnosis
Columns: men (N = 2,007) and women (N = 4,380), each with prevalence of marker or keyword and time interval (days before RA code)
Indicator code rows: inflammatory arthritis diagnosis; DMARD prescription (not including steroids); referral to rheumatology; joint sign or symptom
Keyword rows: positive RhF test
Prevalence of keywords in free text
As shown in Table 2, keywords for rheumatoid arthritis were found in 29% of patients (N = 1,832). Keywords indicating a positive rheumatoid factor test were present in 45% (N = 2,944). In 18.3% of patients (N = 1,168) there were words suggesting an inflammatory arthritis in their records and the same number (N = 1,168; 18.3%) had keywords indicating synovitis. There were no gender differences in the prevalence of indicator code groups or keywords. Some patients had more than one keyword or more than one hit for each keyword. Of the sample of 6,387, 26.1% (N = 1,668) had one keyword, 10.8% (N = 689) had 2 keywords, and 5.8% (N = 372) had 3 or more keywords.
Timing of keywords in relation to codes
The indicator code groups under investigation appeared around 1 to 3 months before the RA diagnostic code was found on the record (median interval before RA code for inflammatory arthritis = 71 days (IQR = 18–164); rheumatoid factor test = 46 days (IQR = 7–147); synovitis = 78 days (IQR = 26–180)). The code category found furthest in time from the RA code was joint signs and symptoms, found a median of 133 days before the diagnostic code (IQR = 52–254). Keywords for rheumatoid arthritis were found a median of 32 days before the RA diagnostic code was added (IQR = 0–122). The intervals between keywords and RA code were similar to intervals between the indicator codes and RA code. For example the median time before RA diagnosis for a keyword suggesting inflammatory arthritis was 78.5 days (IQR = 21–184), for a positive rheumatoid factor test was 48 days (IQR = 7–147), and for synovitis was 57 days (IQR = 7–160). Intervals were similar in men and women with no statistically significant differences once corrections were made for multiple comparisons.
Association of keywords with codes
List of the most frequent Read codes used in conjunction with free text containing keywords
Columns: Read code term; rheumatoid arthritis rank; inflammatory arthritis rank; positive RhF test rank
- Letter from specialist
- Seen in rheumatology clinic
- Incoming mail NOS
- Pain in joint - arthralgia
- Had a chat to patient
- Rheumatoid factor screening test
- History / symptoms
- Seen by rheumatologist
- Advice to patient - subject
- MED3 - doctor’s statement
- Arthralgia of unspecified site
- Nursing care blood sample taken
- Seen in hospital out-pat.
- R.A. latex test
- Incoming mail processing
- Serum rheumatoid antigen level
- Letter from consultant
- Synovitis or tenosynovitis NOS
- Synovitis and tenosynovitis
- Examination of patient
- Wrist joint pain
- Seen in orthopaedic clinic
- MED3 issued to patient
Comparison of information in codes and keywords
Comparison of information available from codes and keywords in year preceding diagnosis in RA patients
Rh Factor test
Combination of codes and keywords as predictors for case definition
Combinations of 2 or more codes and keywords and with RA keyword and DMARD prescription
1 or more codes
2 or more codes
3 or more codes
1 or more codes or keyword
2 or more codes or keyword
3 or more codes or keyword
1 or more codes and 1 or more keyword
1 or more codes and 2 or more keywords
2 or more codes and 1 or more keyword
Combinations with RA keyword
RA keyword and 1 or more codes
RA keyword and 2 or more codes
RA keyword and 3 or more codes
RA keyword and 1 or more other keywords
RA keyword and 2 or more other keywords
RA keyword and 3 other keywords
RA keyword and 1 or more codes or keywords
RA keyword and 2 or more codes or keywords
RA keyword and 3 or more codes or keywords
Combinations with DMARD prescription
DMARD and 1 or more codes
DMARD and 2 or more codes
DMARD and 3 or more codes
DMARD and 1 or more keyword
DMARD and 2 or more keywords
DMARD and 3 keywords
DMARD and 1 or more codes or keywords
DMARD and 2 or more codes or keywords
DMARD and 3 or more codes or keywords
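The combination rules listed above can be expressed as simple predicates over each patient's set of indicator code groups and keyword groups. The Python sketch below is illustrative only: the data structure, the group names and the reading of "codes or keyword" as a combined count are assumptions, not the study's definitions.

```python
# Hypothetical patients: each has the set of indicator code groups and
# keyword groups found in the record before the first RA code.
patients = [
    {"codes": {"rf_test", "referral"}, "keywords": {"rheumatoid_arthritis"}},
    {"codes": {"dmard"}, "keywords": set()},
    {"codes": set(), "keywords": {"synovitis", "inflammatory_arthritis"}},
]


def n_codes_at_least(k):
    """Rule: k or more indicator code groups."""
    return lambda p: len(p["codes"]) >= k


def n_codes_or_keywords_at_least(k):
    """Rule: k or more code or keyword groups (assumed combined count)."""
    return lambda p: len(p["codes"]) + len(p["keywords"]) >= k


def ra_keyword_and_codes(k):
    """Rule: RA keyword present and k or more indicator code groups."""
    return lambda p: "rheumatoid_arthritis" in p["keywords"] and len(p["codes"]) >= k


def dmard_and_keywords(k):
    """Rule: DMARD code group present and k or more keyword groups."""
    return lambda p: "dmard" in p["codes"] and len(p["keywords"]) >= k


def count_matching(records, rule):
    """Number of patients whose record satisfies the rule."""
    return sum(1 for p in records if rule(p))


counts = {
    "2 or more codes": count_matching(patients, n_codes_at_least(2)),
    "1 or more codes or keyword": count_matching(patients, n_codes_or_keywords_at_least(1)),
    "RA keyword and 1 or more codes": count_matching(patients, ra_keyword_and_codes(1)),
    "DMARD and 1 or more keyword": count_matching(patients, dmard_and_keywords(1)),
}
```

Dividing each count by the number of patients gives the proportion of the sample meeting that combination.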
This study population of 6,387 RA patients provides one of the largest studies of the early presentation of RA in general practice using EHRs. Our results suggest that the process of RA diagnosis takes time and information may be available in free text before a diagnosis is recorded as a Read code. The indicator code groups under investigation (DMARD, referral to rheumatology, joint sign or symptom, synovitis, inflammatory arthritis diagnosis and rheumatoid factor test) were found in between 3% (synovitis) and 55% (rheumatoid factor test) of patients. A previous paper discussed the findings regarding indicator code groups, finding they were widespread in RA patient records prior to the diagnostic code but were unlikely to be adequate for describing the full picture of the early presentation of RA or for making up a probabilistic case definition in the absence of an RA diagnostic code.
Findings from the current study suggest that data stored in free text can add to our understanding of the early presentation of RA. By searching for keywords, it was found that additional information was hidden in the text. For example, keywords relating to inflammatory arthritis were present in an additional 14% of patients where coded information relating to inflammatory arthritis was absent; keywords relating to synovitis were found in an additional 17% where synovitis codes were absent, and keywords for rheumatoid factor test were found in an extra 12% of cases where codes for a test were absent. The rheumatoid factor test figures are complicated by the fact that only positive results were searched for in text. The text could have reported additional tests for which no result was recorded, or which were negative, but which were not picked up in the keyword search. This extra information occurred most often close to the time of diagnosis but was present throughout the study period. Time intervals between indicator code groups and the first RA diagnostic code were similar to intervals between the keywords and the RA code, as would be expected in the recording of the same type of information.
The Read codes associated with keywords were not readily predictable. Of the top 35 codes which had keywords in the free text associated with them, only 9 were our pre-identified RA specific indicator codes. Instead, keywords were often associated with administrative codes for referrals and letters or communications from specialists. This makes sense within the context of a disease which presents in primary care but because of diagnostic uncertainty generally results in a referral followed by confirmation of diagnosis and development of a management plan within secondary care. This association of text information with communication type codes has also been found in studies of other diseases, for example ovarian cancer. Much of the free text regarding these conditions is likely to be found in letters between GP and specialists which are appended to the record under more general codes.
Strengths and limitations of our study
This study offers one of the biggest sample sizes of RA patients in the literature and allowed a detailed look at the diagnostic process in primary care which is missing from the literature. There are few publications, for example, on the proportion of musculoskeletal patients referred over time from primary to secondary care [9, 22]. It is also among the first to try to quantify the amount of additional relevant information available in free text. However, a major limitation of this work is that we did not look at the text directly, due to the costs of anonymisation, and therefore were not able to allow for negation or other qualifiers surrounding keywords. It is therefore feasible that some of the occurrences of the keywords are for an absence, such as "no evidence of synovitis", or that the term relates to another person, for instance "mother had a polyarthropathy". We may therefore be over-estimating the extent of relevant information held in text. One study, for example, found that specificity of case finding dropped from 98.2% to 38.3% when negation terms were not included in the text search. It should be noted, however, that the presence of the keyword indicates that an inflammatory arthritis is being considered or discussed with the patient, and the clustering around the time of diagnosis suggests that many of these terms will apply to the patients. Even if only half of the keywords occurring in patients without any indicator markers were related to the actual presence of, for example, synovitis in the patient, this would still increase the prevalence of synovitis by more than 8 percentage points (half of the additional 17%). Despite the lack of qualifiers and negation, automated keyword searching could also be a useful tool for selecting a smaller set of cases whose records could then be manually scrutinised for specific terms.
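To make the negation problem concrete, the following Python sketch flags keyword hits that are preceded by a negation cue within a few words. This is a deliberately crude window-based check, not a method used in this study; the cue list, window size and function name are hypothetical choices for illustration.

```python
import re

# Hypothetical negation cues; a real system would need a curated list.
NEGATION_CUES = ("no", "not", "without", "denies")


def keyword_hits(text, keyword, window=3):
    """Find each occurrence of keyword and mark whether it looks negated.

    Returns a list of (start_position, negated) tuples. A hit counts as
    negated when any of the `window` words before it is a negation cue.
    Matching is case insensitive and ignores word boundaries.
    """
    lowered = text.lower()
    hits = []
    for match in re.finditer(re.escape(keyword.lower()), lowered):
        preceding = re.findall(r"[a-z]+", lowered[: match.start()])[-window:]
        negated = any(word in NEGATION_CUES for word in preceding)
        hits.append((match.start(), negated))
    return hits


negated_example = keyword_hits("No evidence of synovitis", "synovitis")
affirmed_example = keyword_hits("Marked synovitis of both wrists", "synovitis")
```

A plain keyword search would count both example entries as hits; the flag separates the entry that records an absence from the one that records a finding. It does not address terms that refer to another person, such as a family history.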
The selection of codes for the indicators and the keywords for the searches is critical to the validity of this work. The development of the indicator markers was a rigorous process that has been described in full elsewhere. Similarly we tried to triangulate the information we used when preparing the keyword lists in order to allow for as many alternative expressions and misspellings as possible. One possible explanation for the extra information in text is that we selected the wrong codes for the disease indicators, thereby missing important coded information. However, from the association between keywords and communication/letter codes as well as sick note codes (e.g. MED3 – doctor’s statement) it seems that information is often put in text alongside a more generic code. The process of entering communication received from hospitals is not managed in a standard way by GP practices. Sometimes letters are scanned and added to the records as a PDF file and therefore are not searchable in the database. In other cases the entire letter is entered into the free text section and can be searched. Another issue is that the transmission of free text from the practice to the GPRD can be suppressed by the GP using a double backslash at the start of the entry. This is unlikely to affect letters, but results in an unknown amount of free text relating to clinical consultations being withheld, again affecting estimates of the amount of information available. There are therefore likely to be practice-level differences in the availability of the free text which will again lead to an under-estimation of the keywords but also has implications for technologies to increase access to textual data. It would also be worth extending the keyword list to include other indicators such as DMARDs and referrals and further work will include these in searches of free text.
For free text information entered by GPs in the course of their consultation, there is likely to be a wide range of ways to express similar concepts and it is known that many entries have spelling errors or use abbreviations. We only picked up the most frequent misspellings and abbreviations in the keyword specifications. This would lead to an under-estimate of the occurrence of keywords in the record. A full exploration of the free text by hand is planned and will help us to understand more about how information is entered by GPs in the course of their consultations, including understanding more fully the range of abbreviations used and the different ways that signs and symptoms may be described. Qualifiers and negation will be taken into account during this process, resulting in a highly accurate estimation of the information held in free text about RA presentation and symptoms.
A further limitation of this study is that we have not yet investigated how often these keywords occur in control data, that is, in patients with no RA diagnostic code. There is a theoretical possibility that the distribution of these keywords would be the same in control cases as it is for RA cases. Future work will address this possibility by comparing rates of indicators and keywords in control data to ascertain their predictive value for finding cases of RA.
How results fit with other literature
Other authors have also highlighted the potential deficits from coded data in epidemiological studies. Using live clinical data such as the GPRD for epidemiological studies requires mass application of case identification criteria, rather than examining each case individually. This can lead to high, or unknown, rates of misclassification of cases, which bias the outcome of studies, especially those examining rates of certain tests or treatment. Studies which define cases using only diagnostic codes may miss cases where the diagnostic information is held in free text or coded several weeks after the diagnosis has been received. A further issue is the unknown quality of consultation recording and coding which is poorly established in the literature. It appears this may vary both between practitioners and practices but also between diseases. GPs may regularly use the codes most readily available in the system even if they are inappropriate, and express the clinical details in free text descriptions. Free text has been used for case finding and to assess quality of care in complex conditions such as diabetes and cancer [24, 25]. Several authors have shown that including data from free text increases case ascertainment for both acute conditions such as respiratory infections and chronic diseases such as angina [26–28] as well as RA, and can enhance estimates of symptoms in cancer presentation by 40%.
Ethnographic studies have the potential to help understand how social practices shape the records we use for research. We need field studies on the use of electronic record systems, in order to understand why coding and free text are used as they are. Records are not created by a single person but rather by collaborative work practices that are carried out for complex reasons. There is additionally a tension between the use of records by health-care providers who value flexibility and expressivity, and those of researchers who value structure and categorisation. Early findings from the human-computer interaction work-strand of our project show that doctors often choose not to use specific diagnostic codes early in the disease process. Sometimes there is clinical uncertainty, but sometimes coding structures do not facilitate the recording of precise clinical findings and doctors need “exit strategies” to be able to report unexpected clinical exceptions. Doctors’ concerns are more centred on creating records that are useful to them and their team at the point of care, rather than on creating records that will be accessible for secondary uses. There are a number of influences that affect the degree of coding used and choice of codes and these operate at policy, local, system and individual levels.
Implications of our findings and further work
We deliberately chose a complex non-incentivised condition which posed a considerable challenge to recording in code, so our findings may not be generalisable to other more clear-cut or incentivised conditions. A systematic review of quality of coding suggested that completeness of coding may be related to distinctiveness of diagnosis. Our results lead to speculation that cases may be missed if coded data alone were used to identify patients with possible rheumatoid arthritis, before a definitive diagnosis is recorded. For epidemiological studies, an estimate of false negatives (that is, patients with the disease but not identified by the case finding algorithm) is useful to give an indication of bias within the study. Including free text in case finding algorithms may increase the potential for identifying patients without diagnostic codes in these studies, thereby reducing bias. If so, it becomes imperative that systematic ways of automatically extracting and assessing information in free text are developed.
We found no evidence of differences between men and women in the balance of coded and textual data or in the timing of recording. Hence, although data based on codes may be incomplete, in this initial investigation there was no evidence of biased recording by gender or timing. This needs to be explored for other patient characteristics. The possibility remains of systematic differences in the way information is recorded across social groups or different co-morbidities, which would have important implications for secondary use of such clinical databases.
The greatest hurdle to the more widespread use of text is the technological challenge to automate or semi-automate processing. We have laid the basis for methods that will allow us to further investigate extracting information concealed in free text. It is of interest that much of the keyword information was found in letters from specialists and other referral communication type text. Letters are much easier to process using computer algorithms than GPs’ clinical notes due to fewer idiosyncrasies and abbreviations in the language used, although consultation notes will still need to be scrutinised for extracting information such as presenting symptoms. In future work we will add negation detection algorithms and model the context in which the keyword occurs, as well as expanding the indicators which are searched for in free text. We have obtained promising initial results in pilot experiments into deriving abbreviations and synonyms of indicators, using unsupervised machine learning techniques. Other groups have had success with various text-processing algorithms in identifying RA cases and have even found these algorithms are portable between settings [18, 19]. We will also investigate methods to automate the process of augmenting the initial keyword list using sample data and resources like UMLS. Once full information has been extracted from the free text, we will apply statistical methods such as cluster analysis to combinations of coded and textual information to estimate which are the best to use for probabilistic case definition for RA. These search algorithms can then be tested on “control” data where no diagnostic code for RA exists, to verify their ability to find cases using contextual information. These methodologies may extend to other complex, non-incentivised diseases and may be useful for case definition in general for studies using EHRs.
The results of the current study suggest that additional information is available in free text and that this would make a useful supplement to coded information in probabilistic case definition. The use of EHR data in creating disease registers or to assess quality of care may be subject to bias if free text information is not taken into account in case-finding algorithms. Scrutiny of the full free text currently comes at a high cost in terms of anonymisation and researcher time. Automating the extraction of information from free text may help to maximise the utility of EHRs for research purposes.
The study is supported by the Wellcome Trust, grant number 086105/Z/08/Z, the PREP (Patient Record Enhancement Project) study.
- Blumenthal D: Launching HITECH. N Engl J Med. 2010, 362 (5): 382-385. 10.1056/NEJMp0912825.
- Blumenthal D, Tavenner M: The “meaningful use” regulation for electronic health records. N Engl J Med. 2010, 363 (6): 501-504. 10.1056/NEJMp1006114.
- Department of Health: Liberating the NHS: An Information Revolution. A consultation on proposals. 2010, London, UK: Department of Health.
- Atreja A, Achkar J, Jain A, Harris C, Lashner B: Using technology to promote gastrointestinal outcomes research: a case for electronic health records. Am J Gastroenterol. 2008, 103 (9): 2171-2178. 10.1111/j.1572-0241.2008.01890.x.
- van Staa T-P, Goldacre B, Gulliford M, Cassell J, Pirmohamed M, Taweel A, Delaney B, Smeeth L: Pragmatic randomised trials using routine electronic health records: putting them to the test. BMJ. 2012, 344: e55. 10.1136/bmj.e55.
- British Medical Association: Quality and Outcomes Framework guidance for GMS contract 2011/12. 2011, London, UK: British Medical Association.
- Nicholson A, Ford E, Davies K, Smith H, Rait G, Tate R, Peterson I, Cassell J: Optimising use of electronic health records to describe the presentation of rheumatoid arthritis in primary care: a strategy for developing code lists. PLoS ONE. 2013, 8 (2): e54878. 10.1371/journal.pone.0054878.
- Dave S, Petersen I: Creating medical and drug code lists to identify cases in primary care databases. Pharmacoepidemiol Drug Saf. 2009.
- Jordan K, Porcheret M, Kadam UT, Croft P: The use of general practice consultation databases in rheumatology research. Rheumatology. 2006, 45 (2): 126-128.
- Manuel DG, Rosella LC, Stukel TA: Importance of accurately identifying disease in studies using electronic health records. BMJ. 2010, 341: c4226. 10.1136/bmj.c4226.
- Jordan K, Porcheret M, Croft P: Quality of morbidity coding in general practice computerized medical records: a systematic review. Fam Pract. 2004, 21 (4): 396-412. 10.1093/fampra/cmh409.
- Rhodes ET, Gonzalez TV, Laffel LMB, Ludwig DS: Accuracy of administrative coding for type 2 diabetes in children, adolescents, and young adults. Diabetes Care. 2007, 30 (1): 141-143. 10.2337/dc06-1142.
- Meystre SM, Savova GK, Kipper-Schuler KC, Hurdle JF: Extracting information from textual documents in the electronic health record: a review of recent research. Yearb Med Inform. 2008, 128-144.
- NICE: Rheumatoid arthritis. The management of rheumatoid arthritis in adults. Clinical guideline 79. 2009, London: NICE.
- Tate AR, Martin AG, Murray-Thomas T, Anderson SR, Cassell JA: Determining the date of diagnosis–is it a simple matter? The impact of different approaches to dating diagnosis on estimates of delayed care for ovarian cancer in UK primary care. BMC Med Res Methodol. 2009, 9: 42. 10.1186/1471-2288-9-42.
- Pascoe SW, Neal RD, Heywood PL, Allgar VL, Miles JN, Stefoski-Mikeljevic J: Identifying patients with a cancer diagnosis using general practice medical records and cancer registry data. Fam Pract. 2008, 25 (4): 215-220. 10.1093/fampra/cmn023.
- Hillestad R, Bigelow J, Bower A, Girosi F, Meili R, Scoville R, Taylor R: Can electronic medical record systems transform health care? Potential health benefits, savings, and costs. Health Aff. 2005, 24 (5): 1103-1107. 10.1377/hlthaff.24.5.1103.
- Carroll RJ, Eyler AE, Denny JC: Naïve electronic health record phenotype identification for rheumatoid arthritis. AMIA Annu Symp Proc. 2011, 2011: 189-196.
- Carroll RJ, Thompson WK, Eyler AE, Mandelin AM, Cai T, Zink RM, Pacheco JA, Boomershine CS, Lasko TA, Xu H, et al: Portability of an algorithm to identify rheumatoid arthritis in electronic health records. J Am Med Inform Assoc. 2012, 19: e162-e169. 10.1136/amiajnl-2011-000583.
- Fact Sheet UMLS® Metathesaurus®. http://www.nlm.nih.gov/pubs/factsheets/umlsmeta.html
- Tate AR, Martin AGR, Ali A, Cassell JA: Using free text information to explore how and when GPs code a diagnosis of ovarian cancer: an observational study using primary care records of patients with ovarian cancer. BMJ Open. 2011, 1: e000025. 10.1136/bmjopen-2010-000025.
- Linsell L, Dawson J, Zondervan K, Randall T, Rose P, Carr A, Fitzpatrick R: Prospective study of elderly people comparing treatments following first primary care consultation for a symptomatic hip or knee. Fam Pract. 2005, 22 (1): 118-125.
- Hanauer DA, Englesbe MJ, Cowan Jr JA, Campbell DA: Informatics and the American College of Surgeons National Surgical Quality Improvement Program: automated processes could replace manual record review. J Am Coll Surg. 2009, 208: 37-41. 10.1016/j.jamcollsurg.2008.08.030.
- Voorham J, Denig P: Computerized extraction of information on the quality of diabetes care from free text in electronic patient records of general practitioners. J Am Med Inform Assoc. 2007, 14 (3): 349-354. 10.1197/jamia.M2128.
- Hanauer DA, Miela G, Chinnaiyan AM, Chang AE, Blayney DW: The registry case finding engine: an automated tool to identify cancer cases from unstructured, free-text pathology reports and clinical notes. J Am Coll Surg. 2007, 205 (5): 690-697.
- Jordan K, Jinks C, Croft P: Health care utilisation: measurement using primary care records and patient recall both showed bias. J Clin Epidemiol. 2006, 59: 791-797. 10.1016/j.jclinepi.2005.12.008.
- Pakhomov SS, Hemingway H, Weston SA, Jacobsen SJ, Rodeheffer R, Roger VL: Epidemiology of angina pectoris: role of natural language processing of the medical record. Am Heart J. 2007, 153 (4): 666-673. 10.1016/j.ahj.2006.12.022.
- DeLisle S, South B, Anthony JA, Kalp E, Gundlapalli A, Curriero FC, Glass GE, Samore M, Perl TM: Combining free text and structured electronic medical record entries to detect acute respiratory infections. PLoS ONE. 2010, 5 (10): e13377. 10.1371/journal.pone.0013377.
- Liao KP, Cai T, Gainer V, Goryachev S, Zeng-Treitler Q, Raychaudhuri S, Szolovits P, Churchill S, Murphy S, Kohane I, et al: Electronic medical records for discovery research in rheumatoid arthritis. Arthritis Care Res (Hoboken). 2010, 62 (8): 1120-1127. 10.1002/acr.20184.
- Koeling R, Tate AR, Carroll JA: Automatically estimating the incidence of symptoms recording in GP free text notes. Proceedings of MIXHS’11 2011. 2011, Glasgow, Scotland, UK.
- Greenhalgh T, Swinglehurst D: Studying technology use as social practice: the untapped potential of ethnography. BMC Med. 2011, 9 (1): 45. 10.1186/1741-7015-9-45.
- Swinglehurst D, Greenhalgh T, Myall M, Russell J: Ethnographic study of ICT-supported collaborative work routines in general practice. BMC Health Serv Res. 2010, 10 (1): 348. 10.1186/1472-6963-10-348.
- Rosenbloom ST, Denny JC, Xu H, Lorenzi NM, Stead WW, Johnson KB: Data from clinical notes: a perspective on the tension between structure and flexible documentation. J Am Med Inform Assoc. 2011, 18: 181-186. 10.1136/jamia.2010.007237.
- Zheng K, Hanauer DA, Padman R, Johnson MP, Hussain AA, Ye W, Zhou X, Diamond HS: Handling anticipated exceptions in clinical care: investigating clinical use of ‘exit strategies’ in an electronic health records system. J Am Med Inform Assoc. 2011, 18: 883-889. 10.1136/amiajnl-2011-000118.
- Carroll J, Koeling R, Puri S: Lexical acquisition for clinical text mining using distributional similarity. Proceedings of the 13th International Conference on Text Processing and Computational Linguistics (CICLing): 2012; IIT. 2012, Delhi, India: Springer Lecture Notes in Computer Science, 232-246.
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2288/13/105/prepub
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.