Piloting an automated clinical trial eligibility surveillance and provider alert system based on artificial intelligence and standard data models

Background
To advance new therapies into clinical care, clinical trials must recruit enough participants. Yet, many trials fail to do so, leading to delays, early trial termination, and wasted resources. Under-enrolling trials make it impossible to draw conclusions about the efficacy of new therapies. An oft-cited reason for insufficient enrollment is lack of study team and provider awareness about patient eligibility. Automating clinical trial eligibility surveillance and study team and provider notification could offer a solution.

Methods
To address this need for an automated solution, we conducted an observational pilot study of our TAES (TriAl Eligibility Surveillance) system. We tested the hypothesis that an automated system based on natural language processing and machine learning algorithms could detect patients eligible for specific clinical trials by linking the information extracted from trial descriptions to the corresponding clinical information in the electronic health record (EHR). To evaluate the TAES information extraction and matching prototype (i.e., TAES prototype), we selected five open cardiovascular and cancer trials at the Medical University of South Carolina and created a new reference standard of 21,974 clinical text notes from a random selection of 400 patients (including at least 100 enrolled in the selected trials), with a small subset of 20 notes annotated in detail. We also developed a simple web interface for a new database that stores all trial eligibility criteria, corresponding clinical information, and trial-patient match characteristics using the Observational Medical Outcomes Partnership (OMOP) common data model. Finally, we investigated options for integrating an automated clinical trial eligibility system into the EHR and for notifying health care providers promptly of potential patient eligibility without interrupting their clinical workflow.
Results
Although the rapidly implemented TAES prototype achieved only moderate accuracy (recall up to 0.778; precision up to 1.000), it enabled us to assess options for integrating an automated system successfully into the clinical workflow at a healthcare system.

Conclusions
Once optimized, the TAES system could exponentially enhance identification of patients potentially eligible for clinical trials, while simultaneously decreasing the burden on research teams of manual EHR review. Through timely notifications, it could also raise physician awareness of patient eligibility for clinical trials.

Supplementary Information
The online version contains supplementary material available at 10.1186/s12874-023-01916-6.


Introduction
This document explains and provides guidelines for annotating common inclusionary and exclusionary patient clinical trial eligibility criteria found in clinical notes. It also serves to maintain consistency across all annotators. The natural language processing tool will learn from the expert annotations and then automatically extract this information from electronic health records of various types. Annotation localizes spans of text that contain evidence for numerical scores, along with the narrative evidence that supports them. The annotation will be done using an online annotation tool: INCEpTION.
A PDF version of the rules, definitions, and examples for this project is available by clicking the 'Guidelines' button at the top of any INCEpTION page.

Annotation
Text
Class 1: Filler text precedes the real annotation, which can also be followed by fluff.
Class 2: Some annotations are associated with attributes. -Attribute: Value

Project Organization
In the provided examples, highlighted text shows what to annotate. Click on a highlighted span to see how it was annotated (and to see any attributes associated with it). Good examples may appear in formats not documented below but should nonetheless be annotated. Implied evidence should not be included (e.g., '104.6F' implies a fever but does not explicitly state it).
The primary task for this project is to learn the concept definitions relevant for annotation. The layer containing all of these classes is called 'Entities'. There is a second layer called 'Relations' for drawing links between annotations (e.g., between a laboratory test name and its resulting value).
Some classes of information have drop-down menus for selecting a more specific type. Other classes of information require flipping a toggle box from 'No' to 'Yes'. We'll review each class of information in the rest of this project with examples already annotated. Below these examples is a 'Workshop' section to provide a space for you to try to match the annotation. If you find any examples ambiguous, please notify us and we can help clarify matters.
Annotation will be done in two phases. During the first phase, annotation is focused on finding instances of a concept type in the text. During the second phase, annotation is focused on normalizing instances in the text to an external standard concept. For instance, in Phase 1, your job would be to flag the span of text 'Prozac' as a Medication Name in the string 'Pt takes Prozac'. In Phase 2, your job would be to verify that the annotation for 'Prozac' was correctly mapped to the RxCUI 58827 in the RxNorm ontology. This document focuses on the concepts that you will be annotating in Phase 1.
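The two phases can be pictured with a minimal sketch (the data structures and the toy lookup table below are hypothetical, for illustration only; real annotation happens in INCEpTION and real normalization uses the full RxNorm ontology):

```python
# Hypothetical sketch of the two annotation phases.
# Phase 1: flag a span of text as an instance of a concept type.
# Phase 2: verify the span's mapping to an external standard concept.

text = "Pt takes Prozac"

# Phase 1: mark the span 'Prozac' as a Medication Name.
start = text.index("Prozac")
annotation = {
    "span": (start, start + len("Prozac")),
    "text": text[start:start + len("Prozac")],
    "type": "Medication Name",
}

# Phase 2: map the span to a standard concept.
# (Toy lookup table standing in for the RxNorm ontology.)
rxnorm = {"prozac": 58827}
annotation["rxcui"] = rxnorm[annotation["text"].lower()]

print(annotation["rxcui"])  # 58827, the RxCUI cited in the guideline
```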

Condition or Disease
Annotate the full mention of a disease, problem, or comorbidity. There may be negated instances of comorbidities; annotate these as Negated using the flag, as described in the section on Attributes.
These concepts will often be associated with the Conditional flag (when the problem is only present under certain circumstances, such as exercising), the Historical flag (when it is no longer present), the Negated flag (when the patient denies it), and the Not Patient flag (when a neighbor, spouse, friend, etc. is described as having it).

Annotation
Text: Pt has a history of diabetes… and sleep apnea.
Normalized Concepts:
- "diabetes" -Type: Condition or Disease
- "sleep apnea" -Type: Condition or Disease

Investigation Name
Investigations and laboratory tests have two components to annotate: the name and the result or value. Annotate each component individually and then click-and-drag a relation arc between the two. The order (or direction) of the arrow does not matter. It is possible for an investigation to be mentioned without an accompanying result or value (e.g., when a lab is being ordered).
Body temperatures reported in the note should be annotated in this way unless the text explicitly describes a fever.

Annotation
Text: reports a temperature of 103
Normalized Concept: "temperature" -Type: Investigation Name
Text: reports a fever of 103
Normalized Concept: (Nothing)

Investigation Result/Value
Investigation results can be either categorical (e.g., 'positive', 'negative') or numeric (e.g., '120/80', '1.2 mg/dL'). In the latter case, include the units, when present, in the annotation. Remember to click-and-drag a Relation link between any given result annotation and the span of text mentioning the investigation name.

Annotation
Text: reports a temperature of 103
Normalized Concept: "103" -Type: Investigation Result/Value
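The name-value pairing described above can be sketched as a simple relation record (hypothetical structures, for illustration only; in INCEpTION the link is drawn by click-and-drag):

```python
# Hypothetical sketch: linking an Investigation Name annotation to its
# Result/Value annotation on the Relations layer.
name = {"text": "temperature", "type": "Investigation Name"}
value = {"text": "103", "type": "Investigation Result/Value"}

# The direction of the arc does not matter, so model the link as an
# unordered pair of the two annotated spans.
relation = frozenset([name["text"], value["text"]])

# An investigation mentioned without a result (e.g., an ordered lab)
# simply has no relation record attached to it.
ordered_lab = {"text": "CBC", "type": "Investigation Name"}
```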

Medication Name
When a medication has multiple possible names (e.g., generic and brand names), annotate each one as a separate instance. This contrasts with a term and its acronym co-occurring together, which should be annotated as a single span (e.g., 'Gastroesophageal reflux disease (GERD)').
Some medication mentions actually describe the medication as an allergen, as in the 'Levaquin' example below. Annotate these like other medications and then also slide the toggle box beside Medication Allergy to a green 'Yes' when annotating the name.
Medications that are no longer being taken should be flagged Historical, as described in the Historical section. Medications that are being prescribed should be flagged Uncertain, as described in the Uncertain section.

Annotation
Text: Acetaminophen (Tylenol or store brand)
Normalized Concept: -Type: Medication Name

Text: Gastroesophageal reflux disease (GERD) occurs…
Normalized Concept: -Type: Condition or Disease

Procedure
As with Conditions and Medications (above), a Procedure may need to be further annotated with a context flag such as Negated, when a procedure was not performed.

Other Allergy
Any other type of allergy mentioned for a patient (i.e., not medication-induced) should be indicated with this toggle box. The allergen may not even be annotatable as one of the other normalized concept types mentioned above (e.g., seasonal allergies or pet allergies).

Conditional
Problems that only occur under certain conditions (e.g., 'problems breathing while exercising') should be flagged as Conditional.

Generic
Flag concepts with Generic when they are mentioned in a generic way, not about any specific individual (e.g., Diabetes clinic).

Historical
Information from the patient's past and no longer present should be flagged as Historical.

Negated
Mentions with explicit negation should be flagged as Negated (e.g., 'patient denies abdominal pain').

Not Patient
Flag concepts with Not Patient when they apply to a specific individual who is not the patient (e.g., their spouse, roommate, or neighbor).

Uncertain
Annotated concepts should be flagged as Uncertain when they are possible or hypothetical mentions (e.g., 'I'm worried that…'), prescribed medications, or ordered (but not yet performed) investigations and procedures.

Annotation
Text: Azelastine 137 mcg (0.1 %) nasal aerosol… There are no refills with this prescription.
Normalized Concept: -Type: Medication Name
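Pulling the flags together, a single annotated concept might be represented like this (hypothetical structure, for illustration only; in INCEpTION each flag is a toggle box on the annotation):

```python
# Hypothetical sketch: one concept annotation carrying its context flags.
# All flags default to 'No' (False) and are flipped to 'Yes' (True) as needed.
annotation = {
    "text": "Azelastine",
    "type": "Medication Name",
    "flags": {
        "Conditional": False,
        "Generic": False,
        "Historical": False,
        "Negated": False,
        "Not Patient": False,
        "Uncertain": True,  # prescribed medication, per the Uncertain rule
    },
}

# Collect the flags currently set to 'Yes'.
active = [flag for flag, on in annotation["flags"].items() if on]
print(active)  # ['Uncertain']
```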