Scaling Language-Centric Omnimodal Representation Learning

Chenghao Xiao, Hou Pong Chan, Hao Zhang, Weiwen Xu, Mahani Aljunied, Yu Rong


Abstract

Recent multimodal embedding approaches leveraging multimodal large language models (MLLMs) fine-tuned with contrastive learning (CL) have shown promising results, yet the underlying reasons behind their superiority remain underexplored. This work argues that a crucial advantage of MLLM-based approaches stems from implicit cross-modal alignment achieved during generative pretraining, where the language decoder learns to exploit multimodal signals within a shared representation space for generating unimodal outputs. Through analysis of anisotropy and kernel similarity structure, we empirically confirm that latent alignment emerges within MLLM representations, allowing CL to serve as a lightweight refinement stage. Leveraging this insight, we propose a Language-Centric Omnimodal Embedding framework, termed LCO-Emb. Extensive experiments across diverse backbones and benchmarks demonstrate its effectiveness, achieving state-of-the-art performance across modalities. Furthermore, we identify a Generation-Representation Scaling Law (GRSL), showing that the representational capabilities gained through contrastive refinement scale positively with the MLLM's generative capabilities. This suggests that improving generative abilities emerges as an effective paradigm for enhancing representation quality. We provide a theoretical explanation of the GRSL, which formally links the MLLM's generative quality to the upper bound on its representation performance, and validate it on a challenging, low-resource visual-document retrieval task, showing that continual generative pretraining before CL can further enhance the potential of a model's embedding capabilities. Codes, models, and resources are available at https://github.com/LCO-Embedding/LCO-Embedding.
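The abstract's analysis of anisotropy refers to how tightly a model's embeddings cluster into a narrow cone of directions: a highly anisotropic space leaves little room for fine-grained similarity structure, whereas a near-isotropic one spreads representations out. A common way to estimate anisotropy is the mean pairwise cosine similarity over a sample of embeddings. The sketch below illustrates that measure only; it is not the paper's implementation, and the function name and sampling choices are assumptions for illustration.

```python
import numpy as np

def anisotropy(embeddings: np.ndarray) -> float:
    """Estimate anisotropy as the mean pairwise cosine similarity.

    `embeddings` is an (n, d) array of representation vectors, e.g. pooled
    hidden states sampled from a model. Values near 0 indicate an isotropic
    (well-spread) space; values near 1 indicate a narrow cone.
    This is an illustrative measure, not the paper's exact procedure.
    """
    # L2-normalize rows so inner products equal cosine similarities.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    n = normed.shape[0]
    # Average the off-diagonal entries only (exclude self-similarity of 1).
    return float((sims.sum() - n) / (n * (n - 1)))
```

For random high-dimensional Gaussian vectors the estimate is close to 0, while embeddings collapsed onto a single direction give a value of 1; comparing this statistic for text-derived versus image-derived embeddings is one way to probe whether the modalities share a common representation geometry before any contrastive refinement.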
