SciBERT: A Pretrained Language Model for Scientific Text

Iz Beltagy, Kyle Lo, Arman Cohan

Abstract

Obtaining large-scale annotated data for NLP tasks in the scientific domain is challenging and expensive. We release SciBERT, a pretrained language model based on BERT (Devlin et al., 2018) to address the lack of high-quality, large-scale labeled scientific data. SciBERT leverages unsupervised pretraining on a large multi-domain corpus of scientific publications to improve performance on downstream scientific NLP tasks. We evaluate on a suite of tasks including sequence tagging, sentence classification and dependency parsing, with datasets from a variety of scientific domains. We demonstrate statistically significant improvements over BERT and achieve new state-of-the-art results on several of these tasks. The code and pretrained models are available at https://github.com/allenai/scibert/.
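
As a quick-start illustration (not from the paper itself), the sketch below loads the released SciVocab uncased checkpoint through the Hugging Face transformers library and extracts contextual token embeddings for a scientific sentence:

```python
# Minimal sketch: load the released SciBERT (SciVocab, uncased) checkpoint
# from the Hugging Face hub and embed a scientific sentence.
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
model = AutoModel.from_pretrained("allenai/scibert_scivocab_uncased")

sentence = "The BRCA1 gene is associated with breast cancer susceptibility."
inputs = tokenizer(sentence, return_tensors="pt")
outputs = model(**inputs)

# Contextual embeddings for each WordPiece token, shape (1, seq_len, 768);
# task-specific heads (tagging, classification, parsing) build on these.
print(outputs.last_hidden_state.shape)
```

Because SciBERT uses the BERT-Base architecture, the checkpoint drops into any BERT-compatible pipeline unchanged.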

Code Repositories

kuldeep7688/BioMedicalBertNer (PyTorch)
charles9n/bert-sklearn (PyTorch)
allenai/scibert (official; PyTorch)
georgetown-cset/ai-relevant-papers (PyTorch)
tetsu9923/scireviewgen (PyTorch)

Benchmarks

Benchmark | Methodology | Metrics
citation-intent-classification-on-scicite | SciBERT | Macro-F1: 86.32
dependency-parsing-on-genia-las | SciBERT (Base Vocab) | F1: 91.26
dependency-parsing-on-genia-las | SciBERT (SciVocab) | F1: 91.41
dependency-parsing-on-genia-uas | SciBERT (SciVocab) | F1: 92.46
dependency-parsing-on-genia-uas | SciBERT (Base Vocab) | F1: 92.32
named-entity-recognition-ner-on-bc5cdr | SciBERT (SciVocab) | F1: 88.94
named-entity-recognition-ner-on-bc5cdr | SciBERT (Base Vocab) | F1: 88.11
named-entity-recognition-ner-on-jnlpba | SciBERT (Base Vocab) | F1: 75.77
named-entity-recognition-ner-on-ncbi-disease | SciBERT (Base Vocab) | F1: 86.88
named-entity-recognition-ner-on-ncbi-disease | SciBERT (SciVocab) | F1: 86.45
named-entity-recognition-ner-on-scierc | SciBERT (SciVocab) | F1: 67.57
named-entity-recognition-ner-on-scierc | SciBERT (Base Vocab) | F1: 65.24
participant-intervention-comparison-outcome | SciBERT (SciVocab) | F1: 71.18
participant-intervention-comparison-outcome | SciBERT (Base Vocab) | F1: 70.82
relation-extraction-on-chemprot | SciBERT (Finetune) | F1: 83.64
relation-extraction-on-chemprot | SciBERT (Base Vocab) | F1: 73.7
relation-extraction-on-jnlpba | SciBERT (SciVocab) | F1: 76.09
relation-extraction-on-scierc | SciBERT (SciVocab) | F1: 74.64
relation-extraction-on-scierc | SciBERT (Base Vocab) | F1: 74.42
sentence-classification-on-acl-arc | SciBERT | F1: 70.98
sentence-classification-on-paper-field | SciBERT (Base Vocab) | F1: 64.02
sentence-classification-on-paper-field | SciBERT (SciVocab) | F1: 65.71
sentence-classification-on-pubmed-20k-rct | SciBERT (Base Vocab) | F1: 86.81
sentence-classification-on-scicite | SciBERT | F1: 84.9
sentence-classification-on-sciencecite | SciBERT (SciVocab) | F1: 84.99
sentence-classification-on-sciencecite | SciBERT (Base Vocab) | F1: 84.43
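
Several rows compare the SciVocab models against a Base Vocab setting, i.e. SciBERT pretrained with BERT-Base's original WordPiece vocabulary. The sketch below (illustrative, not from the paper) shows the practical difference between the two vocabularies; bert-base-uncased stands in for the Base Vocab tokenizer, since it shares that vocabulary.

```python
# Illustrative sketch: compare how BERT's base WordPiece vocabulary and
# SciBERT's SciVocab segment an in-domain biomedical phrase.
from transformers import AutoTokenizer

base_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
sci_tok = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")

phrase = "phosphorylation of the receptor tyrosine kinase"
print(base_tok.tokenize(phrase))  # base vocab fragments domain terms into subwords
print(sci_tok.tokenize(phrase))   # SciVocab keeps more scientific terms intact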
