Accurate Clinical and Biomedical Named Entity Recognition at Scale
Veysel Kocaman, David Talby
Abstract
We introduce an agile, production-grade clinical and biomedical named entity recognition (NER) algorithm based on a modified BiLSTM-CNN-Char deep learning (DL) architecture built on top of Apache Spark. Our NER implementation establishes new state-of-the-art accuracy on 7 of 8 well-known biomedical NER benchmarks and on 3 clinical concept extraction challenges: 2010 i2b2/VA clinical concept extraction, 2014 n2c2 de-identification, and 2018 n2c2 medication extraction. Moreover, clinical NER models trained using this implementation are more accurate than the commercial entity extraction solutions Amazon Comprehend Medical and Google Cloud Healthcare API by a large margin (8.9% and 6.7%, respectively), without using memory-intensive language models.
Benchmarks
| Benchmark dataset | Method | F1 (%) |
|---|---|---|
| BC5CDR | BertForTokenClassification (Spark NLP) | 90.89 |
| AnatEM | BertForTokenClassification (Spark NLP) | 91.65 |
| BC4CHEMD | BertForTokenClassification (Spark NLP) | 94.39 |
| BioNLP13-CG | BertForTokenClassification (Spark NLP) | 87.83 |
| Species800 | BertForTokenClassification (Spark NLP) | 82.59 |
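
The page does not say which F1 variant the table reports. As a rough illustration only, the sketch below computes an entity-level, exact-match micro F1 from BIO tag sequences, which is a common convention for NER benchmarks; this is an assumption, not the authors' evaluation script. The function names `extract_spans` and `entity_f1` and the example tags are hypothetical.

```python
from typing import List, Set, Tuple

# (entity type, start index, end index exclusive)
Span = Tuple[str, int, int]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO-tagged sentence.

    Lenient assumption: an "I-" tag with no open span of the same type
    starts a new span instead of being rejected.
    """
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((label, start, i))
            start, label = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged F1 over exactly matching (type, start, end) spans."""
    tp = fp = fn = 0
    for gold_tags, pred_tags in zip(gold, pred):
        g, p = extract_spans(gold_tags), extract_spans(pred_tags)
        tp += len(g & p)   # spans predicted with correct type and boundaries
        fp += len(p - g)   # predicted spans that are not in the gold set
        fn += len(g - p)   # gold spans that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative sentence: one of two gold entities is recovered.
gold = [["B-Chemical", "I-Chemical", "O", "B-Disease"]]
pred = [["B-Chemical", "I-Chemical", "O", "O"]]
print(round(100 * entity_f1(gold, pred), 2))
```

Under this convention a prediction counts only when both the entity type and the span boundaries match the gold annotation, so partially overlapping spans score as one false positive and one false negative.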