BioSentVec: creating sentence embeddings for biomedical texts

Qingyu Chen; Yifan Peng; Zhiyong Lu

Abstract

Sentence embeddings have become an essential part of today's natural language processing (NLP) systems, especially together with advanced deep learning methods. Although pre-trained sentence encoders are available in the general domain, none exists for biomedical texts to date. In this work, we introduce BioSentVec: the first open set of sentence embeddings trained with over 30 million documents from both scholarly articles in PubMed and clinical notes in the MIMIC-III Clinical Database. We evaluate BioSentVec embeddings in two sentence pair similarity tasks in different text genres. Our benchmarking results demonstrate that the BioSentVec embeddings capture sentence semantics better than other competitive alternatives and achieve state-of-the-art performance in both tasks. We expect BioSentVec to facilitate research and development in biomedical text mining and to complement the existing resources in biomedical word embeddings. BioSentVec is publicly available at https://github.com/ncbi-nlp/BioSentVec
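
As an illustration of how a sentence pair similarity task can use embeddings, the sketch below scores a pair of sentences by the cosine similarity of their vectors. The abstract does not state which similarity function the authors used, so cosine similarity is an assumption here, and the short toy vectors are hypothetical stand-ins for real BioSentVec embeddings.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Convention chosen for this sketch: an all-zero vector is
        # treated as having no similarity to anything.
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical 4-dimensional toy vectors, not real sentence embeddings.
emb_a = [1.0, 0.0, 2.0, 0.0]
emb_b = [2.0, 0.0, 4.0, 0.0]  # same direction as emb_a
emb_c = [0.0, 3.0, 0.0, 1.0]  # orthogonal to emb_a

print(cosine_similarity(emb_a, emb_b))
print(cosine_similarity(emb_a, emb_c))
```

Vectors pointing in the same direction score close to 1 and orthogonal vectors score 0, so a higher score is read as greater semantic similarity between the two sentences.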

Code Repositories

ncbi-nlp/BioSentVec (Official, mentioned in GitHub)
ESBigeard/paper_graph (tf, mentioned in GitHub)
ncbi-nlp/BioWordVec (Official, mentioned in GitHub)
ncbi-nlp/BLUE_Benchmark (mentioned in GitHub)

Benchmarks

Benchmark                                        Methodology                        Metrics
sentence-embeddings-for-biomedical-texts-on      Universal Sentence Encoder         Pearson Correlation: 0.345
sentence-embeddings-for-biomedical-texts-on      BioSentVec (MIMIC-III)             Pearson Correlation: 0.350
sentence-embeddings-for-biomedical-texts-on      BioSentVec (PubMed + MIMIC-III)    Pearson Correlation: 0.795
sentence-embeddings-for-biomedical-texts-on      BioSentVec (PubMed)                Pearson Correlation: 0.817
sentence-embeddings-for-biomedical-texts-on-2    BioSentVec (PubMed + MIMIC-III)    Pearson Correlation: 0.767
sentence-embeddings-for-biomedical-texts-on-2    BioSentVec (MIMIC-III)             Pearson Correlation: 0.759
sentence-embeddings-for-biomedical-texts-on-2    Universal Sentence Encoder         Pearson Correlation: 0.714
sentence-embeddings-for-biomedical-texts-on-2    BioSentVec (PubMed)                Pearson Correlation: 0.750
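
The benchmarks report Pearson correlation, which measures how closely a model's predicted similarity scores track human-annotated scores in a linear sense. The following is a minimal sketch of that computation in plain Python; the function name and the example score lists are hypothetical and are not taken from the benchmark data.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centered products and squares; the 1/n factors cancel
    # in the final ratio, so they are omitted.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical example: model similarity scores vs. annotator scores.
predicted = [0.10, 0.40, 0.35, 0.80, 0.95]
gold = [0.5, 1.5, 2.0, 3.5, 4.0]

print(pearson_correlation(predicted, gold))
```

Because the coefficient is invariant to linear rescaling, predicted scores on a 0 to 1 scale can be compared directly with gold annotations on a different scale.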