A comparative study on deep learning models for text classification of unstructured medical notes with various levels of class imbalance

Table 3 Model architectures

Model	Number of Filters/Units/Encoders	Embedding Dimension	Max Sequence Length	Dropout	Activation Function	Optimizer	Total Parameters
CNN	8	200	557	0.3	ReLU	Adam	5.51 M
RNN	8	200	557	0.3	ReLU	Adam	5.50 M
GRU	8	200	557	0.3	ReLU	Adam	5.50 M
LSTM	8	200	557	0.3	ReLU	Adam	5.50 M
Bi-LSTM	8	200	557	0.3	ReLU	Adam	5.51 M
Transformer Encoder	1 encoder (2 heads)	200	557	0.3	ReLU	Adam	5.94 M
BERT-Base	12 encoders (12 heads)	768	512	0.3 (fine-tune layer)	ReLU (fine-tune layer)	Adam (fine-tune layer)	110 M
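
The recurrent rows of Table 3 share the same embedding dimension (200) and unit count (8), yet their totals differ by at most 0.01 M. A rough way to see why is to count the recurrent-layer weights with the standard textbook formulas. The sketch below is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the vocabulary size is not given in the table and is a made-up placeholder, and the classifier head and any other layers are ignored. Framework implementations can differ slightly (for example, some GRU variants carry an extra bias vector per gate).

```python
# Back-of-the-envelope parameter counts for the recurrent models in Table 3.
# Assumptions (not from the source): textbook layer formulas, hypothetical
# helper names, and a placeholder vocabulary size used only for illustration.

def embedding_params(vocab_size: int, embed_dim: int) -> int:
    """One embed_dim-sized vector per vocabulary entry."""
    return vocab_size * embed_dim


def simple_rnn_params(input_dim: int, units: int) -> int:
    """Input weights + recurrent weights + bias for a plain RNN layer."""
    return units * (input_dim + units + 1)


def gated_rnn_params(input_dim: int, units: int, gates: int) -> int:
    """Gated cells repeat the plain-RNN weight set once per gate
    (3 for a classic GRU, 4 for an LSTM)."""
    return gates * simple_rnn_params(input_dim, units)


EMBED_DIM = 200  # from Table 3
UNITS = 8        # from Table 3

recurrent_counts = {
    "RNN": simple_rnn_params(EMBED_DIM, UNITS),
    "GRU": gated_rnn_params(EMBED_DIM, UNITS, gates=3),
    "LSTM": gated_rnn_params(EMBED_DIM, UNITS, gates=4),
    # A bidirectional LSTM runs two independent LSTMs (forward and backward).
    "Bi-LSTM": 2 * gated_rnn_params(EMBED_DIM, UNITS, gates=4),
}

# Placeholder only: the table does not report the vocabulary size.
HYPOTHETICAL_VOCAB_SIZE = 25_000
embedding_count = embedding_params(HYPOTHETICAL_VOCAB_SIZE, EMBED_DIM)

for name, count in recurrent_counts.items():
    share = count / (count + embedding_count)
    print(f"{name}: {count} recurrent parameters, "
          f"{share:.2%} of embedding + recurrent total")
```

Under these formulas every recurrent layer has fewer than 15,000 weights, so any embedding table with tens of thousands of rows would dominate the total. That is one plausible reading of why the CNN, RNN, GRU, LSTM, and Bi-LSTM rows all land near 5.5 M parameters; the exact breakdown depends on details the table does not report.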

ISSN: 1471-2288