
Abstract

Proteins play a crucial role in biological processes, and studying them is central to understanding the complex mechanisms within living organisms. Accurate protein representation is vital for advancements in drug development, synthetic biology, and the study of disease mechanisms. Current methodologies primarily focus on the one-dimensional (1D) structure of a protein, namely its amino acid sequence. However, this approach neglects the three-dimensional (3D) structural details that determine protein functionality and interactions. Initial attempts to leverage both the 1D and 3D modalities have shown limited benefits compared to single-modality representations. In this work, we introduce FusionProt, a novel protein representation learning framework designed to learn unified representations of the 1D and 3D structures of proteins simultaneously. FusionProt incorporates a learnable fusion token that serves as an adaptive bridge, enabling an iterative exchange of information between a protein language model and the protein’s 3D structure graph. This token is integrated into the training process of both modalities, so that information propagates between them across iterative learning cycles and yields a more comprehensive representation. Empirical evaluations across various biological tasks demonstrate that FusionProt achieves state-of-the-art performance, offering a significant advancement in protein representation learning.
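To make the fusion-token idea concrete, the following is a minimal NumPy sketch of how a single shared token could alternate between a sequence encoder and a structure-graph encoder. It is an illustration under stated assumptions, not the FusionProt implementation: the toy attention layer, the mean-aggregation graph layer, the choice to link the token to every residue node, and all names (`sequence_layer`, `graph_layer`, `fuse`) are hypothetical and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sequence_layer(tokens, w_q, w_k, w_v):
    """Toy single-head self-attention with a residual connection
    (a stand-in for one layer of a protein language model)."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    attn = softmax((q @ k.T) / np.sqrt(tokens.shape[1]))
    return tokens + attn @ v

def graph_layer(nodes, adj, w):
    """Toy mean-aggregation message passing with a residual connection
    (a stand-in for one layer of a 3D structure encoder)."""
    deg = np.maximum(adj.sum(axis=1, keepdims=True), 1.0)
    return nodes + np.tanh(((adj @ nodes) / deg) @ w)

def fuse(seq_emb, struct_emb, adj, fusion_token, layers):
    """Alternate sequence and graph layers, carrying one fusion token between them."""
    n = seq_emb.shape[0]
    # Assumption: in the graph, the fusion token is an extra node linked to every residue.
    adj_ext = np.zeros((n + 1, n + 1))
    adj_ext[:n, :n] = adj
    adj_ext[n, :n] = 1.0
    adj_ext[:n, n] = 1.0
    for w_q, w_k, w_v, w_g in layers:
        # The token attends over the sequence and absorbs 1D information.
        seq_out = sequence_layer(np.vstack([seq_emb, fusion_token]), w_q, w_k, w_v)
        seq_emb, fusion_token = seq_out[:n], seq_out[n:]
        # The updated token joins the graph and exchanges 3D information.
        graph_out = graph_layer(np.vstack([struct_emb, fusion_token]), adj_ext, w_g)
        struct_emb, fusion_token = graph_out[:n], graph_out[n:]
    return seq_emb, struct_emb, fusion_token

# Toy protein: 6 residues, 8-dimensional features, chain-like contact graph.
n, d = 6, 8
seq_emb = rng.normal(size=(n, d))
struct_emb = rng.normal(size=(n, d))
adj = np.zeros((n, n))
for i in range(n - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
fusion_token = np.zeros((1, d))  # would be a learnable parameter in a real model
layers = [tuple(rng.normal(scale=0.1, size=(d, d)) for _ in range(4)) for _ in range(2)]

seq_out, struct_out, token_out = fuse(seq_emb, struct_emb, adj, fusion_token, layers)
```

In this sketch the token is the only path by which the two encoders influence each other, which is one simple way to read the "adaptive bridge" described above. A trained model would learn the token and the layer weights jointly rather than sampling them at random.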
