HyperAI




Question Answering

Question Answering (QA) is a core natural language processing task in which a system automatically answers questions that users pose in natural language. It can be divided into subtasks such as community question answering and knowledge base question answering. Systems are evaluated mainly with Exact Match (EM) and F1 scores: EM counts a prediction as correct only if it matches a reference answer exactly (after normalization), while F1 measures the token overlap between the prediction and the reference. Popular benchmark datasets include SQuAD, HotpotQA, bAbI, TriviaQA, and WikiQA. In recent years, models such as T5 and XLNet have performed strongly on these benchmarks, improving both the accuracy and the practicality of question answering systems.
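EM and F1 are usually computed after light normalization of both the predicted and the reference answer. The sketch below follows the normalization popularized by the SQuAD v1.1 evaluation script (lowercase, strip punctuation, drop the articles "a", "an", "the", collapse whitespace) and takes the best score over multiple reference answers. Individual benchmarks may normalize differently, and the function names here are illustrative, not the API of any particular library.

```python
import re
import string
from collections import Counter

_PUNCT = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, remove punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 between the normalized prediction and gold answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    # Multiset intersection: each shared token counts at most min(count_pred, count_gold) times.
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        # Note: this sketch scores two empty answers as 0.0; SQuAD 2.0 treats them as a match.
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def best_over_golds(metric, prediction: str, golds: list) -> float:
    """When several reference answers exist, keep the best score against any of them."""
    return max(metric(prediction, g) for g in golds)


if __name__ == "__main__":
    pred = "Eiffel Tower in France"
    golds = ["the Eiffel Tower in Paris"]
    print(best_over_golds(exact_match, pred, golds))  # 0.0
    print(best_over_golds(f1_score, pred, golds))     # 0.75
```

Here the prediction shares three of four normalized tokens with the reference ("eiffel", "tower", "in"), so precision and recall are both 0.75 and F1 is 0.75, while EM is 0 because the strings differ. This is why leaderboards often report both: F1 gives partial credit that EM does not.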

GPT-3 175B (0-shot)

PaLM 540B (finetuned)

Natural Questions

Atlas (full, Wiki-dec-2018 index)

Memory Networks (ensemble)

PubMedBERT uncased

DRAGON + BioLinkBERT

TANDA-RoBERTa (ASNQ, WikiQA)

LLaMA 65B (zero-shot)

Quora Question Pairs

DeBERTa (large)

CNN / Daily Mail

QDGAT (ensemble)

OpenAI/o3-mini-2025-01-31-high

Natural Questions (long)

XLNet (single model)

TANDA DeBERTa-V3-Large + ALL

PaLM 2 (few-shot, CoT, SC)

Masque (NarrativeQA + MS MARCO)

BERT Large Augmented (single model)

FLAN 137B (zero-shot)

Children's Book Test

PubMedBERT uncased

ScanQA (w/ auxiliary loss)

Neo-6B (QA + WS)

ELASTIC (RoBERTa-large)

NExT-QA (Open-ended VideoQA)

GPT-4o-2024-08-06-128k

AI2 Kaggle Dataset

BioLinkBERT (large)

catbAbI QA-mode

Fast Weight Memory

catbAbI LM-mode

Fast Weight Memory

Complex-CronQuestions

BART fine-tuned on FairytaleQA

FiQA-2018 (BEIR)

HotpotQA (BEIR)

Custom Legal-BERT

Vector Database (ChromaDB)

Mathematics Dataset

Fusion Retriever+ETC

Aristo Kaggle Allen AI 8th grade questions

Gated-Attention Reader

G-DAUG-Combo + RoBERTa-Large

COMPLEXQUESTIONS

GeoQuestions1089

Claude-3.5-Sonnet (ReAct)

MedTurkQuAD: Medical Turkish Question-Answering Dataset

MuLD (HotpotQA)

MuLD (NarrativeQA)

FlowQA (single model)

VNHSGE-Chemistry

VNHSGE-Geography

VNHSGE-Literature

VNHSGE Mathematics

WikiTableQuestions

TabSQLify (col+row)

ChAII - Hindi and Tamil Question Answering

COCO Visual Question Answering (VQA) real images 1.0 open ended

ComplexWebQuestions

EfficientQA dev

EfficientQA test

JD Product Question Answer

MapEval-Textual

syntax, frame, coreference, and word embedding features

MedMobile (3.8B)

T5-small+prolog

MRQA out-of-domain

RoBERTa-large Tagger + LIQUID (Ensemble)

Longformer Encoder Decoder (base)

multimodal+LXMERT+ConstrainedMaxPooling

SimpleQuestions

squad_adversarial

squadshifts amazon

squadshifts new_wiki

squadshifts nyt

squadshifts reddit

Build the Future of Artificial Intelligence

About

About Us Support Dataset Help

Products

News Papers Notebooks Datasets Wiki

Links

© HyperAI

GitHub Discord X (formerly Twitter)
