Translation between Molecules and Natural Language

Carl Edwards; Tuan Lai; Kevin Ros; Garrett Honke; Kyunghyun Cho; Heng Ji

Abstract

We present MolT5, a self-supervised learning framework for pretraining models on a vast amount of unlabeled natural language text and molecule strings. MolT5 enables new, useful, and challenging analogs of traditional vision-language tasks, such as molecule captioning and text-based de novo molecule generation (together: translation between molecules and language), which we explore for the first time. Because MolT5 pretrains models on single-modal data, it helps overcome the data scarcity that limits the chemistry domain. Furthermore, we consider several metrics, including a new cross-modal embedding-based metric, to evaluate molecule captioning and text-based molecule generation. Our results show that MolT5-based models can generate outputs, both molecules and captions, that are in many cases of high quality.
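
The abstract does not spell out the pretraining objective. As an illustration only, the sketch below assumes a T5-style span-corruption objective (suggested by the name MolT5) applied to a single-modal sequence: chosen spans are replaced by sentinel tokens in the input, and the target lists each sentinel followed by the tokens it hides. The function name `corrupt_spans`, the character-level tokenization, and the hand-picked spans are hypothetical simplifications, not the authors' implementation.

```python
def corrupt_spans(tokens, spans):
    """Mask non-overlapping (start, length) spans with T5-style sentinels.

    Returns (inputs, targets) as token lists. Assumption: spans do not
    overlap and lie inside the sequence; a real pipeline would sample them.
    """
    inputs, targets = [], []
    cursor = 0
    for i, (start, length) in enumerate(sorted(spans)):
        sentinel = f"<extra_id_{i}>"
        inputs.extend(tokens[cursor:start])          # keep text before the span
        inputs.append(sentinel)                      # hide the span in the input
        targets.append(sentinel)                     # announce the span in the target
        targets.extend(tokens[start:start + length]) # followed by the hidden tokens
        cursor = start + length
    inputs.extend(tokens[cursor:])                   # keep the tail of the sequence
    targets.append(f"<extra_id_{len(spans)}>")       # closing sentinel
    return inputs, targets


# Hypothetical example: a SMILES string split into characters.
smiles = "CC(=O)Oc1ccccc1C(=O)O"
inputs, targets = corrupt_spans(list(smiles), [(2, 4), (10, 3)])
corrupted_input = "".join(inputs)
corrupted_target = "".join(targets)
print(corrupted_input)
print(corrupted_target)
```

The same routine works on a list of words from a natural language sentence, which is the sense in which the pretraining data is single-modal: no paired molecule and caption is needed to build a training example.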

Code Repositories

blender-nlp/MolT5 (official implementation, PyTorch)

Benchmarks

Benchmark: molecule-captioning-on-chebi-20

| Methodology | BLEU-2 | BLEU-4 | METEOR | ROUGE-1 | ROUGE-2 | ROUGE-L | Text2Mol |
|---|---|---|---|---|---|---|---|
| MolT5-Small | 51.9 | 43.6 | 55.1 | 62.0 | 46.9 | 56.3 | 54.0 |
| MolT5-Base | 54.0 | 45.7 | 56.9 | 63.4 | 48.5 | 57.8 | 54.7 |
| MolT5-Large | 59.4 | 50.8 | 61.4 | 65.4 | 51.0 | 59.4 | 58.2 |

Benchmark: molecule-captioning-on-l-m-24

| Methodology | BLEU-2 | BLEU-4 | METEOR | ROUGE-1 | ROUGE-2 | ROUGE-L |
|---|---|---|---|---|---|---|
| MolT5-Small | 70.9 | 51.2 | 70.1 | 74.5 | 55.8 | 54.4 |
| MolT5-Base | 73.8 | 53.5 | 71.8 | 75.0 | 55.9 | 53.9 |
| MolT5-Large | 76.9 | 55.6 | 74.3 | 77.7 | 58.0 | 55.7 |

Benchmark: text-based-de-novo-molecule-generation-on

| Metric | MolT5-small | MolT5-base | MolT5-Large | MolT5-Large-HV |
|---|---|---|---|---|
| BLEU | 75.5 | 76.9 | 85.4 | 81.0 |
| Exact Match | 7.9 | 8.1 | 30.2 | 31.4 |
| Frechet ChemNet Distance (FCD) | 2.49 | 2.18 | 1.20 | 0.44 |
| Levenshtein | 25.988 | 24.458 | 16.07 | 16.758 |
| MACCS FTS | 70.3 | 72.1 | 83.4 | 87.2 |
| Morgan FTS | 51.7 | 52.9 | 68.4 | 72.2 |
| RDK FTS | 56.8 | 58.8 | 74.6 | 78.6 |
| Text2Mol | 48.2 | 49.6 | 55.4 | 59.0 |
| Validity | 72.1 | 77.2 | 90.5 | 99.6 |
| Parameter Count | 60000000 | 220000000 | 770000000 | 770000000 |
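
Two of the generation metrics listed above, Exact Match and Levenshtein, can be computed on molecule strings without any chemistry toolkit. The sketch below is a minimal illustration, assuming both metrics compare the raw predicted and reference strings character by character; the official evaluation may canonicalize molecules first, so the numbers it produces need not match the leaderboard. The function names `levenshtein` and `evaluate` and the sample strings are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def evaluate(predictions, references):
    """Fraction of exact string matches and mean edit distance over all pairs."""
    pairs = list(zip(predictions, references))
    exact = sum(p == r for p, r in pairs) / len(pairs)
    mean_lev = sum(levenshtein(p, r) for p, r in pairs) / len(pairs)
    return {"exact_match": exact, "levenshtein": mean_lev}


# Hypothetical predictions against reference SMILES strings.
scores = evaluate(["CCO", "CCN"], ["CCO", "CCO"])
print(scores)
```

Under these assumptions, Exact Match comes out as a fraction (multiply by 100 to compare with a percentage), higher is better, and for Levenshtein lower is better.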
