Ruoxi Sun; Hanjun Dai; Li Li; Steven Kearnes; Bo Dai

Abstract
Retrosynthesis, the process of identifying a set of reactants that can synthesize a target molecule, is of vital importance to material design and drug discovery. Existing machine learning approaches based on language models and graph neural networks have achieved encouraging results. In this paper, we propose a framework that unifies sequence- and graph-based methods as energy-based models (EBMs) with different energy functions. This unified perspective yields insight into the EBM variants through a comprehensive assessment of their performance. Additionally, we present a novel dual variant within the framework that performs consistent training over Bayesian forward and backward prediction by constraining the agreement between the two directions. This model improves state-of-the-art performance by 9.6% for template-free approaches when the reaction type is unknown.
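The energy-based view can be illustrated with a small sketch. It assumes a finite set of candidate reactant sets for one product, scores each pair with one plausible dual energy (the negative sum of a backward log-likelihood, a forward log-likelihood, and a reactant log-prior), and normalizes exp(-E) over the set. The agreement between directions is expressed here as a KL divergence between the backward distribution and the Bayes posterior built from the forward model and prior. The scorer functions, toy probabilities, and the KL form are hypothetical illustrations, not the authors' implementation or exact training objective.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical scorer signatures (stand-ins for trained models, not the authors' code):
#   log_p_backward(reactants, product) -> log p(reactants | product)
#   log_p_forward(product, reactants)  -> log p(product | reactants)
#   log_prior(reactants)               -> log p(reactants)
Backward = Callable[[str, str], float]
Forward = Callable[[str, str], float]
Prior = Callable[[str], float]


def _log_softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a finite candidate set."""
    m = max(xs)
    lse = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - lse for x in xs]


def dual_energy(product: str, reactants: str, log_p_backward: Backward,
                log_p_forward: Forward, log_prior: Prior) -> float:
    """Assumed dual energy: a lower value means a more plausible pair."""
    return -(log_p_backward(reactants, product)
             + log_p_forward(product, reactants)
             + log_prior(reactants))


def rank_candidates(product: str, candidates: Sequence[str],
                    energy: Callable[[str, str], float]) -> List[Tuple[str, float]]:
    """Sort candidates by energy; p(r | product) is proportional to exp(-E) within the set."""
    energies = [energy(product, r) for r in candidates]
    log_probs = _log_softmax([-e for e in energies])
    order = sorted(range(len(candidates)), key=lambda i: energies[i])
    return [(candidates[i], math.exp(log_probs[i])) for i in order]


def agreement_kl(product: str, candidates: Sequence[str], log_p_backward: Backward,
                 log_p_forward: Forward, log_prior: Prior) -> float:
    """KL(backward || Bayes posterior from forward model and prior), over the set."""
    lb = _log_softmax([log_p_backward(r, product) for r in candidates])
    lbayes = _log_softmax([log_p_forward(product, r) + log_prior(r) for r in candidates])
    return sum(math.exp(a) * (a - b) for a, b in zip(lb, lbayes))


# Toy usage with made-up probabilities for two candidate reactant sets.
backward = {"A.B": math.log(0.6), "C.D": math.log(0.4)}
forward = {"A.B": math.log(0.9), "C.D": math.log(0.2)}
prior = {"A.B": math.log(0.5), "C.D": math.log(0.5)}
lb = lambda r, p: backward[r]
lf = lambda p, r: forward[r]
lp = lambda r: prior[r]

ranked = rank_candidates("X", ["C.D", "A.B"], lambda p, r: dual_energy(p, r, lb, lf, lp))
kl = agreement_kl("X", ["C.D", "A.B"], lb, lf, lp)
print(ranked, kl)
```

Because the energy adds the forward and prior terms to the backward term, a candidate that the backward model alone rates only moderately ("A.B" at 0.6) can still dominate once the forward model confirms it; a positive KL signals that the two directions disagree on this candidate set.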
Benchmarks
| Benchmark | Method | Top-1 accuracy (%) | Top-3 accuracy (%) | Top-5 accuracy (%) | Top-10 accuracy (%) |
|---|---|---|---|---|---|
| Single-step retrosynthesis on USPTO-50k | Dual-TF (reaction class as prior) | 65.7 | 81.9 | 84.7 | 85.9 |
| Single-step retrosynthesis on USPTO-50k | Dual-TF (reaction class unknown) | 53.6 | 70.7 | 74.6 | 77.0 |
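Top-k accuracy in the table counts a test reaction as solved when the ground-truth reactant set appears among the model's first k ranked predictions, which is why the numbers never decrease from top-1 to top-10. A minimal sketch follows; `normalize` is a hypothetical stand-in that only reorders dot-separated fragments, whereas real evaluations typically compare canonicalized SMILES (for example with RDKit).

```python
from typing import List, Sequence


def normalize(smiles_set: str) -> str:
    # Hypothetical stand-in for SMILES canonicalization: makes the
    # comparison insensitive to the order of dot-separated fragments only.
    return ".".join(sorted(smiles_set.split(".")))


def top_k_accuracy(predictions: Sequence[List[str]], targets: Sequence[str], k: int) -> float:
    """Percentage of targets found among the first k ranked predictions."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    hits = 0
    for preds, target in zip(predictions, targets):
        t = normalize(target)
        if any(normalize(p) == t for p in preds[:k]):
            hits += 1
    return 100.0 * hits / len(targets)


# Toy usage: three products, each with a ranked list of predicted reactant sets.
preds = [["CCO.O", "CC"], ["N", "O.C"], ["Br"]]
targets = ["O.CCO", "C.O", "Cl"]
top1 = top_k_accuracy(preds, targets, 1)  # only the first product is solved at k=1
top2 = top_k_accuracy(preds, targets, 2)  # the second product is also solved at k=2
print(top1, top2)
```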