Zihan Liu; Wei Ping; Rajarshi Roy; Peng Xu; Chankyu Lee; Mohammad Shoeybi; Bryan Catanzaro

Abstract
In this work, we introduce ChatQA, a suite of models that outperform GPT-4 on retrieval-augmented generation (RAG) and conversational question answering (QA). To enhance generation, we propose a two-stage instruction tuning method that significantly boosts RAG performance. For effective retrieval, we introduce a dense retriever optimized for conversational QA, which yields results comparable to state-of-the-art query rewriting models while substantially reducing deployment costs. We also present ChatRAG Bench, which comprises ten datasets covering comprehensive evaluations on RAG, table-related QA, arithmetic calculations, and scenarios involving unanswerable questions. Our ChatQA-1.0-70B (score: 54.14), built on Llama2, a weaker foundation model than GPT-4, slightly outperforms GPT-4-0613 (score: 53.90) and GPT-4-Turbo-2024-04-09 (score: 54.03) on ChatRAG Bench, without relying on any synthetic data from OpenAI GPT models. Notably, the Llama3-ChatQA-1.5-70B model surpasses the accuracy of GPT-4-Turbo-2024-04-09, achieving a 4.4% improvement. To advance research in this field, we have open-sourced the model weights, instruction tuning data, ChatRAG Bench, and retriever for the community: https://chatqa-project.github.io/.
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| question-answering-on-natural-questions | ChatQA-1.5-llama3-70b (Zero-Shot, KILT) | EM: 47.0 |
| question-answering-on-natural-questions | ChatQA-1.5-llama3-8b (Zero-Shot, KILT) | EM: 42.7 |
| question-answering-on-triviaqa | ChatQA-1.5-llama3-70b (Zero-Shot, DPR) | EM: 69.0 |
| question-answering-on-triviaqa | ChatQA-1.5-llama3-8b (Zero-Shot, KILT) | EM: 81.0 |
| question-answering-on-triviaqa | ChatQA-1.5-llama3-70b (Zero-Shot, KILT) | EM: 85.6 |
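
The table above reports exact match (EM). As a rough illustration of how such a score is commonly computed, here is a minimal sketch that assumes SQuAD-style answer normalization (lowercasing, removing punctuation and English articles, collapsing whitespace). The function names are hypothetical, and the evaluation scripts actually used for these leaderboard entries may normalize answers differently.

```python
import re
import string
from typing import Sequence


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: SQuAD-style normalization; not confirmed by the source.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, references: Sequence[str]) -> bool:
    """True if the normalized prediction equals any normalized reference."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(ref) for ref in references)


def em_score(predictions: Sequence[str],
             references: Sequence[Sequence[str]]) -> float:
    """Percentage of predictions that exactly match one of their references."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(exact_match(p, refs) for p, refs in zip(predictions, references))
    return 100.0 * hits / len(predictions)


if __name__ == "__main__":
    # Toy data for illustration only; not taken from any benchmark.
    preds = ["The Eiffel Tower.", "Paris"]
    refs = [["Eiffel Tower"], ["London"]]
    print(em_score(preds, refs))
```

Because EM gives no credit for partial overlap, scores are sensitive to the normalization rules and to the set of accepted reference answers, which may be one reason results for the same model differ across evaluation setups.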