SciBERT: A Pretrained Language Model for Scientific Text

Iz Beltagy, Kyle Lo, Arman Cohan

Abstract

Obtaining large-scale annotated data for NLP tasks in the scientific domain is challenging and expensive. We release SciBERT, a pretrained language model based on BERT (Devlin et al., 2018) to address the lack of high-quality, large-scale labeled scientific data. SciBERT leverages unsupervised pretraining on a large multi-domain corpus of scientific publications to improve performance on downstream scientific NLP tasks. We evaluate on a suite of tasks including sequence tagging, sentence classification and dependency parsing, with datasets from a variety of scientific domains. We demonstrate statistically significant improvements over BERT and achieve new state-of-the-art results on several of these tasks. The code and pretrained models are available at https://github.com/allenai/scibert/.
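
As a quick-start illustration (not from the paper itself), the sketch below loads the released SciVocab uncased checkpoint through the Hugging Face transformers library and extracts contextual token embeddings for a scientific sentence:

```python
# Minimal sketch: load the released SciBERT (SciVocab, uncased) checkpoint
# from the Hugging Face hub and embed a scientific sentence.
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
model = AutoModel.from_pretrained("allenai/scibert_scivocab_uncased")

sentence = "The BRCA1 gene is associated with breast cancer susceptibility."
inputs = tokenizer(sentence, return_tensors="pt")
outputs = model(**inputs)

# Contextual embeddings for each WordPiece token, shape (1, seq_len, 768);
# task-specific heads (tagging, classification, parsing) build on these.
print(outputs.last_hidden_state.shape)
```

Because SciBERT uses the BERT-Base architecture, the checkpoint drops into any BERT-compatible pipeline unchanged.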

Code Repositories

kuldeep7688/BioMedicalBertNer (PyTorch)
charles9n/bert-sklearn (PyTorch)
allenai/scibert (official; PyTorch)
georgetown-cset/ai-relevant-papers (PyTorch)
tetsu9923/scireviewgen (PyTorch)

Benchmarks

Benchmark | Methodology | Metrics
citation-intent-classification-on-scicite | SciBERT | Macro-F1: 86.32
dependency-parsing-on-genia-las | SciBERT (Base Vocab) | F1: 91.26
dependency-parsing-on-genia-las | SciBERT (SciVocab) | F1: 91.41
dependency-parsing-on-genia-uas | SciBERT (SciVocab) | F1: 92.46
dependency-parsing-on-genia-uas | SciBERT (Base Vocab) | F1: 92.32
named-entity-recognition-ner-on-bc5cdr | SciBERT (SciVocab) | F1: 88.94
named-entity-recognition-ner-on-bc5cdr | SciBERT (Base Vocab) | F1: 88.11
named-entity-recognition-ner-on-jnlpba | SciBERT (Base Vocab) | F1: 75.77
named-entity-recognition-ner-on-ncbi-disease | SciBERT (Base Vocab) | F1: 86.88
named-entity-recognition-ner-on-ncbi-disease | SciBERT (SciVocab) | F1: 86.45
named-entity-recognition-ner-on-scierc | SciBERT (SciVocab) | F1: 67.57
named-entity-recognition-ner-on-scierc | SciBERT (Base Vocab) | F1: 65.24
participant-intervention-comparison-outcome | SciBERT (SciVocab) | F1: 71.18
participant-intervention-comparison-outcome | SciBERT (Base Vocab) | F1: 70.82
relation-extraction-on-chemprot | SciBERT (Finetune) | F1: 83.64
relation-extraction-on-chemprot | SciBERT (Base Vocab) | F1: 73.7
relation-extraction-on-jnlpba | SciBERT (SciVocab) | F1: 76.09
relation-extraction-on-scierc | SciBERT (SciVocab) | F1: 74.64
relation-extraction-on-scierc | SciBERT (Base Vocab) | F1: 74.42
sentence-classification-on-acl-arc | SciBERT | F1: 70.98
sentence-classification-on-paper-field | SciBERT (Base Vocab) | F1: 64.02
sentence-classification-on-paper-field | SciBERT (SciVocab) | F1: 65.71
sentence-classification-on-pubmed-20k-rct | SciBERT (Base Vocab) | F1: 86.81
sentence-classification-on-scicite | SciBERT | F1: 84.9
sentence-classification-on-sciencecite | SciBERT (SciVocab) | F1: 84.99
sentence-classification-on-sciencecite | SciBERT (Base Vocab) | F1: 84.43
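
Several rows compare the SciVocab models against a Base Vocab setting, i.e. SciBERT pretrained with BERT-Base's original WordPiece vocabulary. The sketch below (illustrative, not from the paper) shows the practical difference between the two vocabularies; bert-base-uncased stands in for the Base Vocab tokenizer, since it shares that vocabulary.

```python
# Illustrative sketch: compare how BERT's base WordPiece vocabulary and
# SciBERT's SciVocab segment an in-domain biomedical phrase.
from transformers import AutoTokenizer

base_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
sci_tok = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")

phrase = "phosphorylation of the receptor tyrosine kinase"
print(base_tok.tokenize(phrase))  # base vocab fragments domain terms into subwords
print(sci_tok.tokenize(phrase))   # SciVocab keeps more scientific terms intact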
