Therapeutics Data Commons: Machine Learning Datasets and Tasks for Drug Discovery and Development
Kexin Huang; Tianfan Fu; Wenhao Gao; Yue Zhao; Yusuf Roohani; Jure Leskovec; Connor W. Coley; Cao Xiao; Jimeng Sun; Marinka Zitnik

Abstract
Therapeutics machine learning is an emerging field with incredible opportunities for innovation and impact. However, advancement in this field requires formulation of meaningful learning tasks and careful curation of datasets. Here, we introduce Therapeutics Data Commons (TDC), the first unifying platform to systematically access and evaluate machine learning across the entire range of therapeutics. To date, TDC includes 66 AI-ready datasets spread across 22 learning tasks and spanning the discovery and development of safe and effective medicines. TDC also provides an ecosystem of tools and community resources, including 33 data functions and types of meaningful data splits, 23 strategies for systematic model evaluation, 17 molecule generation oracles, and 29 public leaderboards. All resources are integrated and accessible via an open Python library. We carry out extensive experiments on selected datasets, demonstrating that even the strongest algorithms fall short of solving key therapeutics challenges, including real dataset distributional shifts, multi-scale modeling of heterogeneous data, and robust generalization to novel data points. We envision that TDC can facilitate algorithmic and scientific advances and considerably accelerate machine-learning model development, validation and transition into biomedical and clinical implementation. TDC is an open-science initiative available at https://tdcommons.ai.
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| molecular-property-prediction-on-bbbp-1 | AttentiveFP | ROC-AUC: 85.5 |
| molecular-property-prediction-on-bbbp-1 | AttrMasking | ROC-AUC: 89.2 |

Per-dataset scores for the tdc-admet-benchmarking-group-on-tdcommons benchmark, by methodology:

| Dataset | AttentiveFP | AttrMasking | GCN | MLP-RDKit2D |
|---|---|---|---|---|
| TDC.AMES | 0.814 | 0.842 | 0.818 | 0.823 |
| TDC.BBB_Martins | 0.855 | 0.892 | 0.842 | 0.889 |
| TDC.Bioavailability_Ma | 0.632 | 0.577 | 0.566 | 0.672 |
| TDC.CYP2C9_Inhibition_Veith | 0.749 | 0.829 | 0.735 | 0.742 |
| TDC.CYP2C9_Substrate_CarbonMangels | 0.375 | 0.381 | 0.344 | 0.360 |
| TDC.CYP2D6_Inhibition_Veith | 0.646 | 0.721 | 0.616 | 0.616 |
| TDC.CYP2D6_Substrate_CarbonMangels | 0.574 | 0.704 | 0.617 | 0.677 |
| TDC.CYP3A4_Inhibition_Veith | 0.851 | 0.902 | 0.840 | 0.829 |
| TDC.CYP3A4_Substrate_CarbonMangels | 0.576 | 0.582 | 0.590 | 0.639 |
| TDC.Caco2_Wang | 0.401 | 0.546 | 0.599 | 0.393 |
| TDC.Clearance_Hepatocyte_AZ | 0.289 | 0.413 | 0.366 | 0.382 |
| TDC.Clearance_Microsome_AZ | 0.365 | 0.585 | 0.532 | 0.586 |
| TDC.DILI | 0.886 | 0.919 | 0.859 | 0.875 |
| TDC.HIA_Hou | 0.974 | 0.978 | 0.936 | 0.972 |
| TDC.Half_Life_Obach | 0.085 | 0.151 | 0.239 | 0.184 |
| TDC.LD50_Zhu | 0.678 | 0.685 | 0.649 | 0.678 |
| TDC.Lipophilicity_AstraZeneca | 0.572 | 0.547 | 0.541 | 0.574 |
| TDC.PPBR_AZ | 9.373 | 10.075 | 10.194 | 9.994 |
| TDC.Pgp_Broccatelli | 0.892 | 0.929 | 0.895 | 0.918 |
| TDC.Solubility_AqSolDB | 0.776 | 1.026 | 0.907 | 0.827 |
| TDC.VDss_Lombardo | 0.241 | 0.559 | 0.457 | 0.561 |
| TDC.hERG | 0.825 | 0.778 | 0.738 | 0.841 |
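The BBBP rows above report ROC-AUC. As a rough illustration of what that number measures, the sketch below computes ROC-AUC as the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. This is a minimal pure-Python sketch, not the evaluation code used by TDC or by the leaderboard; the function name `roc_auc` and the toy labels and scores are hypothetical and chosen only for illustration.

```python
def roc_auc(labels, scores):
    """ROC-AUC via pairwise comparison of positive and negative scores.

    Hypothetical helper for illustration only; not part of any library.
    labels: iterable of 0/1 ground-truth classes.
    scores: iterable of predicted scores, higher meaning "more positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


# Toy data (made up): three positives, three negatives.
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]

toy_auc = roc_auc(toy_labels, toy_scores)
print(f"ROC-AUC: {toy_auc:.3f} ({100 * toy_auc:.1f} on a 0-100 scale)")
```

On the toy data, 7 of the 9 positive/negative pairs are ranked correctly, so the result is 7/9. The pairwise loop is quadratic in the number of examples, which keeps the definition visible but is only suitable for small inputs; a rank-based implementation would be the usual choice for real datasets.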