Question Answering
Question answering (QA) is a core task in natural language processing in which a system automatically answers questions that users pose in natural language. The task is commonly divided into subtasks such as community question answering and knowledge base question answering. The primary evaluation metrics are Exact Match (EM) and F1. Popular benchmark datasets include SQuAD, HotpotQA, bAbI, TriviaQA, and WikiQA. Models such as T5 and XLNet have performed strongly on these benchmarks, improving the accuracy and practicality of question answering systems.
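As a rough illustration of the two metrics named above, the sketch below computes EM and token-level F1 for a single prediction against a single reference answer. It assumes SQuAD-style answer normalization (lowercasing, stripping punctuation and English articles); the function names are illustrative, and individual benchmarks may normalize answers or aggregate over multiple references differently.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: SQuAD-style normalization; other benchmarks may differ.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both the prediction and the reference.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

EM rewards only answers that match the reference after normalization, while F1 gives partial credit for overlapping tokens, which is why the two are usually reported together.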
SQuAD2.0
SQuAD1.1
RuBERT
HotpotQA
Beam Retrieval
PIQA
GPT-3 175B (0-shot)
BoolQ
Gemma-7B
COPA
PaLM 540B (finetuned) 
TriviaQA
SpanBERT
SQuAD1.1 dev
T5-11B
Natural Questions
Atlas (full, Wiki-dec-2018 index)
OpenBookQA
WebQuestions
Memory Networks (ensemble)
TruthfulQA
CoA
MultiRC
DeBERTa-1.5B
CronQuestions
PubMedQA
PubMedBERT uncased
MedQA
DRAGON + BioLinkBERT
WikiQA
TANDA-RoBERTa (ASNQ, WikiQA)
SIQA
LLaMA 65B (zero-shot)
StoryCloze
BLOOMZ
DaNetQA
TimeQuestions
Quora Question Pairs
DeBERTa (large)
CNN / Daily Mail
DROP Test
QDGAT (ensemble)
NewsQA
OpenAI/o3-mini-2025-01-31-high
bAbI
STM
Natural Questions (long)
DensePhrases
SQuAD2.0 dev
XLNet (single model)
TrecQA
TANDA DeBERTa-V3-Large + ALL
StrategyQA
PaLM 2 (few-shot, CoT, SC)
MultiTQ
NarrativeQA
Masque (NarrativeQA + MS MARCO)
WikiHop
BigBird-etc
OBQA
FLAN 137B (zero-shot)
Bamboogle
TIQ
CoQA
BERT Large Augmented (single model)
Children's Book Test
NSE
TempQuestions
QAap
FEVER
KILT: ELI5
FQuAD
QASent
Attentive LSTM
BioASQ
PubMedBERT uncased
Quasar-T
RACE
YahooCQA
sMIM (1024) +
SQA3D
ScanQA (w/ auxiliary loss)
Story Cloze
Neo-6B (QA + WS)
FinQA
ELASTIC (RoBERTa-large)
NQ (BEIR)
DROP
FriendsQA
NExT-QA (Open-ended VideoQA)
PeerQA
GPT-4o-2024-08-06-128k
SemEvalCQA
HyperQA
HybridQA
MAFiD
AI2 Kaggle Dataset
FiQA-2018 (BEIR)
BLURB
BioLinkBERT (large)
QuALITY
catbAbI LM-mode
Fast Weight Memory
FairytaleQA
BART fine-tuned on FairytaleQA
HotpotQA (BEIR)
BM25+CE
MS MARCO
CheGeKa
catbAbI QA-mode
Fast Weight Memory
RuOpenBookQA
MultiQ
NaturalQA
DPR
EgoTaskQA
Complex-CronQuestions
SubGTR
Molweni
OTT-QA
Fusion Retriever+ETC
CaseHOLD
Custom Legal-BERT
ReClor
XLNet-large
TweetQA
ByT5
VNHSGE-English
DuoRC
Vector Database (ChromaDB)
ConditionalQA
FiD
SCDE
ConvFinQA
SberQuAD
Mathematics Dataset
TP-Transformer
CliCR
Gated-Attention Reader
Torque
ECONET
VNHSGE-History
MedTurkQuAD: Medical Turkish Question-Answering Dataset
VNHSGE-Geography
VNHSGE-Literature
WikiTableQuestions
TabSQLify (col+row)
Reverb
VNHSGE-Civic
Bing Chat
COMPLEXQUESTIONS
WebQA
VNHSGE-Physics
MuLD (NarrativeQA)
QuAC
FlowQA (single model)
WikiSQL
MuLD (HotpotQA)
MCTest-500
PubChemQA
BioMedGPT-10B
UniProtQA
CODAH
G-DAUG-Combo + RoBERTa-Large
GeoQuestions1089
GeoQA2
AGI Eval
MapEval-API
Claude-3.5-Sonnet (ReAct)
VNHSGE-Biology
MRQA
VNHSGE-Mathematics
Aristo Kaggle Allen AI 8th grade questions
Cardal
TempQA-WD
VNHSGE-Chemistry
SQuAD
PopQA
SelfRAG-7b
SimpleQuestions
BBH
MapEval-Textual
JaQuAD
BERT-Japanese
MRQA out-of-domain
RGX
WebQuestionsSP
ChatGPT
StepGame
TP-MANN
WebSRC
SWAG
DeBERTaV3large
TAT-QA
TagOp
ChAII - Hindi and Tamil Question Answering
MuCoT
JD Product Question Answer
PAAG
QASPER
Longformer Encoder Decoder (base)
MCTest-160
syntax, frame, coreference, and word embedding features
AviationQA
KGT5
EfficientQA test
RecipeQA
multimodal+LXMERT+ConstrainedMaxPooling
GraphQuestions
ChatGPT
MedMCQA Dev
MedMobile (3.8B)
MetaQA
T5-small+prolog
HellaSwag
COCO Visual Question Answering (VQA) real images 1.0 open ended
EfficientQA dev
ComplexWebQuestions
TOME-2
KQA Pro
MMLU
MultiSpanQA
RoBERTa-large Tagger + LIQUID (Ensemble)
SchizzoSQUAD
squad_adversarial
squadshifts nyt
squadshifts amazon
squadshifts reddit
squad_v2
adversarial_qa
squadshifts new_wiki