| Atlas (full, Wiki-dec-2018 index) | 64.0 | Atlas: Few-shot Learning with Retrieval Augmented Language Models |  | 
| Atlas (full, Wiki-dec-2021+CC index) | 60.4 | Atlas: Few-shot Learning with Retrieval Augmented Language Models |  | 
| RankRAG-llama3-70b (Zero-Shot, KILT) | 54.2 | RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs | - | 
| RankRAG-llama3-8b (Zero-Shot, KILT) | 50.6 | RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs | - | 
| RankRAG-llama3-70b (Zero-Shot, DPR) | 50.0 | RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs | - | 
| ChatQA-1.5-llama3-70b (Zero-Shot, KILT) | 47.0 | ChatQA: Surpassing GPT-4 on Conversational QA and RAG | - | 
| RankRAG-llama3-8b (Zero-Shot, DPR) | 46.1 | RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs | - | 
| code-davinci-002 175B + REPLUG LSR (few-shot) | 45.5 | REPLUG: Retrieval-Augmented Black-Box Language Models |  | 
| Atlas (few-shot, k=64, Wiki-dec-2018 index) | 45.1 | Atlas: Few-shot Learning with Retrieval Augmented Language Models |  | 
| code-davinci-002 175B + REPLUG (few-shot) | 44.7 | REPLUG: Retrieval-Augmented Black-Box Language Models |  | 
| ChatQA-1.5-llama3-8b (Zero-Shot, KILT) | 42.7 | ChatQA: Surpassing GPT-4 on Conversational QA and RAG | - |