Chemical detection and indexing in PubMed full text articles using deep learning and rule-based methods
Sérgio Matos, João Rafael Almeida, João Figueira Silva, Rui Antunes, Tiago Almeida
Abstract
Identifying chemicals in biomedical scientific literature is a crucial task for drug development research. The BioCreative NLM-Chem challenge promoted the development of automatic systems that identify chemicals in full-text articles and decide which chemical concepts are relevant for indexing. This work describes the participation of the BIT.UA team from the University of Aveiro, in which we propose a three-stage automatic pipeline that separately tackles (i) chemical mention detection, (ii) entity normalization, and (iii) indexing. For chemical identification, we adopted a deep learning solution based on a biomedical BERT variant. For normalization, we used a rule-based approach and a hybrid version that explores a dense retrieval mechanism. For indexing, we likewise followed two distinct approaches: a rule-based method and a TF-IDF-based method. Our best official results are consistently above the official median and benchmark in the three subtasks, with F1-scores of 0.8454, 0.8136, and 0.4664, respectively.
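To illustrate the general idea behind a TF-IDF-based indexing step, the sketch below scores each normalized concept in a document by its frequency in that document, weighted by how rare it is across a reference corpus, and keeps concepts above a cutoff. This is not the authors' implementation: the function names, the smoothed IDF formula, the threshold value, and the placeholder concept identifiers (`C1`, `C2`, `C3`) are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_scores(doc_concepts, corpus):
    """Score each concept ID in one document by TF-IDF against a corpus.

    doc_concepts: list of concept IDs, one entry per normalized mention.
    corpus: list of such lists, one per reference document (hypothetical input).
    """
    n_docs = len(corpus)
    # Document frequency: number of corpus documents mentioning each concept.
    df = Counter()
    for concepts in corpus:
        df.update(set(concepts))

    tf = Counter(doc_concepts)
    total = len(doc_concepts)
    scores = {}
    for concept, count in tf.items():
        # Smoothed IDF (an assumption) so unseen concepts do not divide by zero.
        idf = math.log((1 + n_docs) / (1 + df[concept])) + 1.0
        scores[concept] = (count / total) * idf
    return scores


def select_for_indexing(doc_concepts, corpus, threshold):
    """Return the concept IDs whose TF-IDF score reaches the threshold."""
    scores = tfidf_scores(doc_concepts, corpus)
    return sorted(c for c, s in scores.items() if s >= threshold)


# Placeholder concept IDs, not real MeSH identifiers.
corpus = [["C1", "C2"], ["C2"], ["C2", "C3"]]
document = ["C1", "C1", "C2"]
indexed = select_for_indexing(document, corpus, threshold=0.5)
```

Here `C1` is frequent in the document but rare in the corpus, so it is selected, while `C2` appears in every corpus document and falls below the cutoff. In practice the threshold would need to be tuned on annotated data.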
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| chemical-indexing-on-bc7-nlm-chem | Rule-based | F1-score (strict): 0.4664 |
| entity-linking-on-bc7-nlm-chem | Sieve-based | F1-score (strict): 0.8136 |
| named-entity-recognition-on-bc7-nlm-chem | PubMedBERT+MLP+CRF | F1-score (strict): 0.8454 |
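For readers unfamiliar with the metric in the table, the following is a minimal sketch of how a strict F1-score is typically computed: a prediction counts as correct only when it matches a gold annotation exactly. The tuple layout (document ID, start offset, end offset, type) and the example values are assumptions for illustration and do not reproduce the official evaluation script.

```python
def strict_f1(gold, predicted):
    """Compute F1 where a prediction is correct only on an exact match."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations: (document ID, start offset, end offset, type).
gold = {("doc1", 0, 7, "Chemical"), ("doc1", 10, 15, "Chemical")}
predicted = {("doc1", 0, 7, "Chemical"), ("doc1", 10, 14, "Chemical")}
score = strict_f1(gold, predicted)
```

The second prediction overlaps a gold mention but ends one character early, so under strict matching it counts as both a false positive and a missed gold annotation.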