BioT5: Enriching Cross-modal Integration in Biology with Chemical Knowledge and Natural Language Associations

Qizhi Pei; Wei Zhang; Jinhua Zhu; Kehan Wu; Kaiyuan Gao; Lijun Wu; Yingce Xia; Rui Yan

Abstract

Recent advances in biological research integrate molecules, proteins, and natural language to enhance drug discovery. However, current models have several limitations: they can generate invalid molecular SMILES, they underuse contextual information, and they treat structured and unstructured knowledge identically. To address these issues, we propose BioT5, a comprehensive pre-training framework that enriches cross-modal integration in biology with chemical knowledge and natural language associations. BioT5 uses SELFIES for 100% robust molecular representations and extracts knowledge from the context surrounding bio-entities in unstructured biological literature. Furthermore, BioT5 distinguishes between structured and unstructured knowledge, which leads to more effective use of the available information. After fine-tuning, BioT5 achieves superior performance across a wide range of tasks, demonstrating a strong capability to capture the underlying relations and properties of bio-entities. Our code is available on GitHub: https://github.com/QizhiPei/BioT5.
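SELFIES is what makes the "100% robust" claim possible: every string built from SELFIES symbols decodes to a valid molecule, whereas an arbitrary SMILES string may not parse. The sketch below, written under stated assumptions, shows how a SELFIES string can be split into its bracketed symbols and how molecule and protein inputs might be kept apart with boundary markers before reaching a T5-style model. The marker names `<bom>`, `<eom>`, `<bop>`, `<eop>` and the `<p>` residue prefix are hypothetical placeholders chosen for illustration; consult the official repository for the tokens BioT5 actually uses. Converting between SMILES and SELFIES is not shown here; the `selfies` package provides `selfies.encoder` and `selfies.decoder` for that.

```python
import re

# One SELFIES symbol is a bracketed token such as "[C]", "[=O]" or "[Branch1]".
SELFIES_TOKEN = re.compile(r"\[[^\[\]]+\]")


def tokenize_selfies(selfies: str) -> list[str]:
    """Split a SELFIES string into its bracketed symbols."""
    tokens = SELFIES_TOKEN.findall(selfies)
    # Reject input with characters outside brackets, e.g. a SMILES string passed by mistake.
    if "".join(tokens) != selfies:
        raise ValueError(f"not a well-formed SELFIES string: {selfies!r}")
    return tokens


def wrap_molecule(selfies: str, bos: str = "<bom>", eos: str = "<eom>") -> list[str]:
    """Surround molecule symbols with (hypothetical) begin/end-of-molecule markers."""
    return [bos, *tokenize_selfies(selfies), eos]


def wrap_protein(sequence: str, bos: str = "<bop>", eos: str = "<eop>",
                 prefix: str = "<p>") -> list[str]:
    """One token per residue; the (hypothetical) prefix keeps amino-acid
    letters from colliding with ordinary text tokens such as "M" or "K"."""
    return [bos, *(prefix + residue for residue in sequence), eos]


if __name__ == "__main__":
    print(wrap_molecule("[C][C][O]"))  # ethanol written in SELFIES
    print(wrap_protein("MKT"))
```

Keeping SELFIES symbols and residues as separate vocabulary items, rather than letting a text tokenizer split them, avoids ambiguity such as "C" meaning carbon in a molecule but cysteine in a protein.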

Code Repositories

QizhiPei/BioT5 (official, PyTorch; mentioned in GitHub)

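Benchmarks

Benchmark: molecule-captioning-on-chebi-20
Method: BioT5
Metrics:
BLEU-2: 63.5
BLEU-4: 55.6
METEOR: 65.6
ROUGE-1: 69.2
ROUGE-2: 55.9
ROUGE-L: 63.3
Text2Mol: 60.3

Benchmark: text-based-de-novo-molecule-generation-on
Method: BioT5
Metrics:
BLEU: 86.7
Exact Match: 41.3
Frechet ChemNet Distance (FCD): 0.43 (lower is better)
Levenshtein: 15.097 (lower is better)
MACCS FTS: 88.6
Morgan FTS: 73.4
Parameter Count: 252,000,000
RDK FTS: 80.1
Text2Mol: 57.6
Validity: 100

FTS denotes fingerprint Tanimoto similarity. All scores except FCD, Levenshtein distance, and parameter count are reported on a 0 to 100 scale.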
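Several of the molecule-generation metrics above compare each generated molecule string with its reference. The following is a minimal standard-library sketch, not the official ChEBI-20 evaluation script: exact match and Levenshtein distance are computed on raw strings (real evaluation scripts may canonicalize molecules first), and fingerprint Tanimoto similarity is demonstrated with a hypothetical `toy_fingerprint` (the set of character bigrams) standing in for MACCS, Morgan, or RDK fingerprints, which in practice come from a cheminformatics toolkit such as RDKit. FCD, Text2Mol, and Validity require trained models or a chemistry parser and are not covered.

```python
from typing import Callable, Hashable


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via a rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = curr
    return prev[-1]


def tanimoto(fp_a: set, fp_b: set) -> float:
    """|A & B| / |A | B| for fingerprints given as sets of 'on' features."""
    union = fp_a | fp_b
    # Convention chosen for this sketch: two empty fingerprints score 0.0.
    return len(fp_a & fp_b) / len(union) if union else 0.0


def toy_fingerprint(molecule: str) -> set:
    """Hypothetical stand-in for a real fingerprint: character bigrams."""
    return {molecule[i:i + 2] for i in range(len(molecule) - 1)}


def generation_metrics(preds: list[str], refs: list[str],
                       fingerprint: Callable[[str], set[Hashable]]) -> dict[str, float]:
    """Average per-pair metrics; exact match and FTS are scaled to 0-100."""
    n = len(refs)
    pairs = list(zip(preds, refs))
    return {
        "exact_match": 100 * sum(p == r for p, r in pairs) / n,
        "levenshtein": sum(levenshtein(p, r) for p, r in pairs) / n,
        "fts": 100 * sum(tanimoto(fingerprint(p), fingerprint(r)) for p, r in pairs) / n,
    }


if __name__ == "__main__":
    print(generation_metrics(["CCO", "CCN"], ["CCO", "CCC"], toy_fingerprint))
```

For the two pairs in the usage line, one prediction matches exactly and the other differs by a single substitution, giving an exact match of 50.0, a mean Levenshtein distance of 0.5, and a toy FTS of 75.0. Averaging per pair, rather than pooling counts, mirrors how a per-molecule score would be reported.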
