Variational Open-Domain Question Answering

Valentin Liévin, Andreas Geert Motzfeldt, Ida Riis Jensen, Ole Winther

Abstract

Retrieval-augmented models have proven effective in natural language processing tasks, yet there is little research on optimizing them with variational inference. We introduce the Variational Open-Domain (VOD) framework for end-to-end training and evaluation of retrieval-augmented models, focusing on open-domain question answering and language modelling. The VOD objective, a self-normalized estimate of the Rényi variational bound, approximates the task marginal likelihood and is evaluated under samples drawn from an auxiliary sampling distribution (a cached retriever and/or an approximate posterior). It remains tractable even for retriever distributions defined over large corpora. We demonstrate VOD's versatility by training BERT-sized reader-retriever models on multiple-choice medical exam questions. On the MedMCQA dataset, we outperform the domain-tuned Med-PaLM by 5.3 percentage points despite using 2,500× fewer parameters. Our retrieval-augmented BioLinkBERT model scored 62.9% on MedMCQA and 55.0% on MedQA-USMLE. Finally, we show the effectiveness of our learned retriever component for medical semantic search.
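To make "a self-normalized estimate of the Rényi variational bound" concrete, here is a minimal, generic sketch in Python. It assumes K documents d_i drawn from an auxiliary sampler r, per-document log-scores for the model joint p(a, d_i | q) and for an approximate posterior q, and estimates L_α = 1/(1-α) · log E_q[(p/q)^(1-α)] by reweighting the samples with self-normalized weights s_i ∝ q(d_i)/r(d_i). The function and argument names are hypothetical, and the paper's exact estimator may differ in its details; this is not the official VodLM/vod implementation.

```python
import math
from typing import Sequence


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def renyi_bound_snis(
    log_p: Sequence[float],  # log p(a, d_i | q): model joint for each sampled document (assumed given)
    log_q: Sequence[float],  # log q(d_i | a, q): approximate posterior scores (assumed normalized)
    log_r: Sequence[float],  # log r(d_i | ...): auxiliary sampler the d_i were drawn from
    alpha: float,
) -> float:
    """Hypothetical sketch: self-normalized importance-sampling estimate of the Renyi bound.

    L_alpha = 1/(1-alpha) * log E_q[(p/q)^(1-alpha)], where the expectation under q
    is estimated from K samples drawn from r instead of from q itself.
    """
    if not (len(log_p) == len(log_q) == len(log_r) > 0):
        raise ValueError("need K > 0 samples with matching lengths")
    # Self-normalized weights s_i proportional to q(d_i) / r(d_i); any constant in r cancels here.
    log_w = [lq - lr for lq, lr in zip(log_q, log_r)]
    log_z = logsumexp(log_w)
    log_s = [lw - log_z for lw in log_w]
    # Log importance ratios log(p_i / q_i) for each sampled document.
    log_ratio = [lp - lq for lp, lq in zip(log_p, log_q)]
    if alpha == 1.0:
        # alpha -> 1 limit: the ELBO estimate, sum_i s_i * log(p_i / q_i).
        return sum(math.exp(ls) * lr for ls, lr in zip(log_s, log_ratio))
    beta = 1.0 - alpha
    # 1/beta * log sum_i s_i * (p_i / q_i)^beta, computed in log space.
    return logsumexp([ls + beta * lr for ls, lr in zip(log_s, log_ratio)]) / beta
```

Only the K sampled documents are ever scored, which is one way to see why such an objective can stay tractable over a large corpus. With r = q the weights reduce to 1/K; α = 0 then gives the log of the average importance ratio (an IWAE-style estimate of the marginal likelihood), and α = 1 gives the ELBO estimate.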

Code Repositories

VodLM/vod (official, PyTorch)
findzebra/fz-openqa (official, PyTorch)

Benchmarks

Benchmark	Method	Metrics
multiple-choice-question-answering-mcqa-on-21	VOD (BioLinkBERT)	Accuracy: 58.3% (dev), 62.9% (test)
question-answering-on-medqa-usmle	VOD (BioLinkBERT)	Accuracy: 55.0%
