Question Answering On Natural Questions

Evaluation Metrics

EM
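
EM (exact match) is usually reported as the percentage of questions whose predicted answer matches one of the reference answers after light normalization. The sketch below assumes the common SQuAD-style normalization (lowercasing, stripping punctuation and English articles, collapsing whitespace). The page does not say which normalization each listed paper uses, so the helper names `normalize_answer`, `exact_match`, and `em_score` are illustrative, not taken from any particular evaluation script.

```python
import re
import string


def normalize_answer(text: str) -> str:
    """Lowercase, strip punctuation, drop English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, references: list[str]) -> bool:
    """True if the normalized prediction equals any normalized reference."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(ref) for ref in references)


def em_score(predictions: list[str], references: list[list[str]]) -> float:
    """Percentage of questions whose prediction exactly matches a reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0  # assumption: an empty evaluation set scores 0
    hits = sum(exact_match(p, r) for p, r in zip(predictions, references))
    return 100.0 * hits / len(predictions)
```

Each question may carry several acceptable reference answers, which is why `references` is a list of lists; a prediction counts as correct if it matches any one of them.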

Results

Performance of each model on this benchmark

| Model | EM | Paper Title | Repository |
| --- | --- | --- | --- |
| Atlas (full, Wiki-dec-2018 index) | 64.0 | Atlas: Few-shot Learning with Retrieval Augmented Language Models | |
| Atlas (full, Wiki-dec-2021+CC index) | 60.4 | Atlas: Few-shot Learning with Retrieval Augmented Language Models | |
| DPA-RAG | 59.19 | Understand What LLM Needs: Dual Preference Alignment for Retrieval-Augmented Generation | |
| FiE | 58.4 | FiE: Building a Global Probability Space by Leveraging Early Fusion in Encoder for Open-Domain Question Answering | - |
| R2-D2 (full) | 55.9 | R2-D2: A Modular Baseline for Open-Domain Question Answering | |
| ReAtt | 54.7 | Retrieval as Attention: End-to-end Learning of Retrieval and Reading within a Single Transformer | |
| FiD-KD (full) | 54.7 | Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering | |
| RankRAG-llama3-70b (Zero-Shot, KILT) | 54.2 | RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs | - |
| EMDR^2 | 52.5 | End-to-End Training of Multi-Document Reader and Retriever for Open-Domain Question Answering | |
| FID (full) | 51.4 | Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering | |
| RankRAG-llama3-8b (Zero-Shot, KILT) | 50.6 | RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs | - |
| RankRAG-llama3-70b (Zero-Shot, DPR) | 50.0 | RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs | - |
| ChatQA-1.5-llama3-70b (Zero-Shot, KILT) | 47.0 | ChatQA: Surpassing GPT-4 on Conversational QA and RAG | - |
| RankRAG-llama3-8b (Zero-Shot, DPR) | 46.1 | RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs | - |
| code-davinci-002 175B + REPLUG LSR (few-shot) | 45.5 | REPLUG: Retrieval-Augmented Black-Box Language Models | |
| RETRO + DPR (full) | 45.5 | Improving language models by retrieving from trillions of tokens | |
| Atlas (few-shot, k=64, Wiki-Dec-2018 index) | 45.1 | Atlas: Few-shot Learning with Retrieval Augmented Language Models | |
| code-davinci-002 175B + REPLUG (few-shot) | 44.7 | REPLUG: Retrieval-Augmented Black-Box Language Models | |
| RAG | 44.5 | Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks | |
| ChatQA-1.5-llama3-8b (Zero-Shot, KILT) | 42.7 | ChatQA: Surpassing GPT-4 on Conversational QA and RAG | - |