HyperAI



ChatQA: Surpassing GPT-4 on Conversational QA and RAG

Zihan Liu; Wei Ping; Rajarshi Roy; Peng Xu; Chankyu Lee; Mohammad Shoeybi; Bryan Catanzaro

Abstract

In this work, we introduce ChatQA, a suite of models that outperform GPT-4 on retrieval-augmented generation (RAG) and conversational question answering (QA). To enhance generation, we propose a two-stage instruction tuning method that significantly boosts RAG performance. For effective retrieval, we introduce a dense retriever optimized for conversational QA, which yields results comparable to state-of-the-art query rewriting models while substantially reducing deployment costs. We also present ChatRAG Bench, which comprises ten datasets that together evaluate RAG, table-related QA, arithmetic calculations, and scenarios involving unanswerable questions. Our ChatQA-1.0-70B (score: 54.14), built on Llama2, a weaker foundation model than GPT-4, slightly outperforms GPT-4-0613 (score: 53.90) and GPT-4-Turbo-2024-04-09 (score: 54.03) on ChatRAG Bench, without relying on any synthetic data from OpenAI GPT models. Notably, the Llama3-ChatQA-1.5-70B model surpasses the accuracy of GPT-4-Turbo-2024-04-09 by 4.4%. To advance research in this field, we open-source the model weights, instruction tuning data, ChatRAG Bench, and retriever for the community: https://chatqa-project.github.io/.
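The abstract does not spell out how the conversational retriever forms its query. A minimal sketch, assuming the query side is simply the dialogue history concatenated with the current question (as opposed to query rewriting, which first turns the question into a standalone one), is shown below. The bag-of-words cosine similarity is a hypothetical stand-in for a fine-tuned dense encoder, and the names `embed`, `conversational_query`, and `retrieve` are illustrative only.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in for a trained encoder: bag of lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[tok] for tok, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def conversational_query(history: list[tuple[str, str]], question: str) -> str:
    """Assumed input format: prior (role, text) turns followed by the current question."""
    turns = [f"{role}: {text}" for role, text in history]
    turns.append(f"user: {question}")
    return "\n".join(turns)


def retrieve(history: list[tuple[str, str]], question: str, passages: list[str], top_k: int = 1) -> list[str]:
    """Rank passages by similarity to the history-augmented query."""
    query_vec = embed(conversational_query(history, question))
    ranked = sorted(passages, key=lambda p: cosine(query_vec, embed(p)), reverse=True)
    return ranked[:top_k]


history = [
    ("user", "What do you know about the Eiffel Tower?"),
    ("assistant", "It is a wrought-iron tower in Paris."),
]
question = "When was it completed?"
passages = [
    "The Eiffel Tower in Paris was completed in 1889.",
    "The Golden Gate Bridge was completed in 1937.",
]

print(retrieve(history, question, passages)[0])  # The Eiffel Tower in Paris was completed in 1889.
print(retrieve([], question, passages)[0])       # The Golden Gate Bridge was completed in 1937.
```

With the question alone, "it" carries no signal and the shorter Golden Gate passage scores higher; adding the history lets the query match the intended passage. A trained dense encoder would learn this from multi-turn data rather than from word overlap.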

Benchmarks

Benchmark | Methodology | Metric
question-answering-on-natural-questions | ChatQA-1.5-llama3-70b (Zero-Shot, KILT) | EM: 47.0
question-answering-on-natural-questions | ChatQA-1.5-llama3-8b (Zero-Shot, KILT) | EM: 42.7
question-answering-on-triviaqa | ChatQA-1.5-llama3-70b (Zero-Shot, DPR) | EM: 69.0
question-answering-on-triviaqa | ChatQA-1.5-llama3-8B (Zero-Shot, KILT) | EM: 81.0
question-answering-on-triviaqa | ChatQA-1.5-llama3-70b (Zero-Shot, KILT) | EM: 85.6
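EM (exact match) is the percentage of questions whose predicted answer equals one of the gold answers after normalization. The exact normalization behind these leaderboard entries is not stated here; the sketch below assumes the SQuAD-style normalization (lowercasing, stripping punctuation and the articles a/an/the, collapsing whitespace) commonly used for Natural Questions and TriviaQA. The function names are illustrative.

```python
import re
import string

PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Assumed SQuAD-style normalization: lowercase, drop punctuation and articles, squeeze spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: list[str]) -> bool:
    """A prediction counts as correct if it matches any gold alias after normalization."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(gold) for gold in gold_answers)


def em_score(predictions: list[str], references: list[list[str]]) -> float:
    """Percentage of questions answered exactly, on the same 0-100 scale as the EM column."""
    if not predictions:
        return 0.0
    hits = sum(exact_match(p, refs) for p, refs in zip(predictions, references))
    return 100.0 * hits / len(predictions)


predictions = ["The Eiffel Tower", "1887", "Paris, France."]
references = [["Eiffel Tower"], ["1889"], ["Paris, France"]]
print(round(em_score(predictions, references), 1))  # 66.7
```

Taking the maximum over gold aliases matters for TriviaQA in particular, where each question typically lists several acceptable answer strings.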
