
Towards Mixed-Modal Retrieval for Universal Retrieval-Augmented Generation

Chenghao Zhang Guanting Dong Xinyu Yang Zhicheng Dou

Abstract

Retrieval-Augmented Generation (RAG) has emerged as a powerful paradigm for enhancing large language models (LLMs) by retrieving relevant documents from an external corpus. However, existing RAG systems primarily focus on unimodal text documents, and often fall short in real-world scenarios where both queries and documents may contain mixed modalities (such as text and images). In this paper, we address the challenge of Universal Retrieval-Augmented Generation (URAG), which involves retrieving and reasoning over mixed-modal information to improve vision-language generation. To this end, we propose Nyx, a unified mixed-modal to mixed-modal retriever tailored for URAG scenarios. To mitigate the scarcity of realistic mixed-modal data, we introduce a four-stage automated pipeline for generation and filtering, leveraging web documents to construct NyxQA, a dataset comprising diverse mixed-modal question-answer pairs that better reflect real-world information needs. Building on this high-quality dataset, we adopt a two-stage training framework for Nyx: we first perform pre-training on NyxQA along with a variety of open-source retrieval datasets, followed by supervised fine-tuning using feedback from downstream vision-language models (VLMs) to align retrieval outputs with generative preferences. Experimental results demonstrate that Nyx not only performs competitively on standard text-only RAG benchmarks, but also excels in the more general and realistic URAG setting, significantly improving generation quality in vision-language tasks.
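The abstract does not include implementation details, but the core idea of a mixed-modal to mixed-modal retriever can be illustrated with a toy sketch: queries and documents, each possibly pairing text with an image, are mapped into one shared embedding space and ranked by cosine similarity. In the sketch below, a deterministic bag-of-words hashing embedding stands in for a trained vision-language encoder, and an image is approximated by its caption; all function names and data here are illustrative assumptions, not the actual Nyx method.

```python
import math
import zlib


def embed_text(text, dim=64):
    # Toy bag-of-words hashing embedding, unit-normalized.
    # A real URAG retriever would use a trained VLM encoder here.
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[zlib.crc32(token.encode()) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def embed_mixed(text, image_caption=None, dim=64):
    # Fuse text and image features into one shared-space vector.
    # The image side is approximated by its caption (a stand-in for
    # genuine visual features) and averaged with the text vector.
    t = embed_text(text, dim)
    if image_caption is None:
        return t
    i = embed_text(image_caption, dim)
    fused = [(a + b) / 2.0 for a, b in zip(t, i)]
    norm = math.sqrt(sum(v * v for v in fused)) or 1.0
    return [v / norm for v in fused]


def retrieve(query, corpus, k=2):
    # Rank mixed-modal documents against a mixed-modal query by
    # cosine similarity (dot product of unit vectors).
    q = embed_mixed(*query)
    scored = []
    for idx, doc in enumerate(corpus):
        d = embed_mixed(*doc)
        scored.append((sum(a * b for a, b in zip(q, d)), idx))
    scored.sort(reverse=True)
    return [idx for _, idx in scored[:k]]


# Illustrative corpus: each entry is (text, optional image caption).
corpus = [
    ("red panda habitat and diet", "photo of a red panda in a tree"),
    ("quarterly stock market trends", None),
]
query = ("where do red pandas live", "red panda photo")
print(retrieve(query, corpus))  # doc 0 should rank first
```

The averaging fusion is the simplest possible choice; the paper's two-stage training (pre-training on NyxQA plus supervised fine-tuning on VLM feedback) is precisely what such a naive shared space lacks.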
