Table 6 Shared dataset NLP challenges since 2015

From: Spontaneously generated online patient experience data - how and why is it being used in health research: an umbrella scoping review

| Event | Data source | Task | No. of tweets / posts | Best result | Methods used | Data availability |
|---|---|---|---|---|---|---|
| 2015 CLPsych | Twitter | Binary classification of users based on depression / PTSD: 1. Depression vs control; 2. PTSD vs control; 3. Depression vs PTSD | 7.857 million | Average precision 80% | SVM / TF-IDF weighting | With IRB approval & privacy agreement |
| 2016 CLPsych | ReachOut forum | Classify triage level (1–4) for professional support | 65,024 | F1 42% | Variety of classifiers | With IRB approval & privacy agreement |
| 2017 CLPsych | ReachOut forum | Classify triage level (1–4) for professional support | 157,963 | F1 46.7% | Variety of classifiers | With IRB approval & privacy agreement |
| 2016 SMM | Twitter | 1. Classify ADRs; 2. Map to UMLS (NER); 3. Concept normalisation | 10,822 | 1. F1 42%; 2. F1 61%; 3. No result | Random forest (n-gram); CRF | Yes |
| 2017 SMM | Twitter | 1. Classify ADRs; 2. Classify drug intake; 3. Concept normalisation | 15,717 training; 9,961 testing | 1. F1 43.5%; 2. F1 69.3%; 3. Accuracy 88.5% | SVM; CNN; LR / deep learning | Yes |
| 2017 NTCIR-13 | Twitter | Label disease / symptoms | 2,560 (English, Japanese & Chinese) | Exact match accuracy of 88% | Hierarchical attention networks (HAN) plus CNNs | Training data only |

  1. Adapted from [39, 44]