A Statutory Article Retrieval Dataset in French

Antoine Louis, Gerasimos Spanakis

Abstract

Statutory article retrieval is the task of automatically retrieving law articles relevant to a legal question. While recent advances in natural language processing have sparked considerable interest in many legal tasks, statutory article retrieval remains largely unexplored due to the scarcity of large-scale, high-quality annotated datasets. To address this bottleneck, we introduce the Belgian Statutory Article Retrieval Dataset (BSARD), which consists of 1,100+ French native legal questions labeled by experienced jurists with relevant articles from a corpus of 22,600+ Belgian law articles. Using BSARD, we benchmark several state-of-the-art retrieval approaches, including lexical and dense architectures, in both zero-shot and supervised setups. We find that fine-tuned dense retrieval models significantly outperform other systems. Our best-performing baseline achieves 74.8% R@100, which suggests the task is feasible while leaving room for improvement. Owing to the specificity of the domain and of the task it addresses, BSARD presents a unique challenge for future research on legal information retrieval. Our dataset and source code are publicly available.

Code Repositories

maastrichtlawtech/bsard (official, PyTorch)

Benchmarks

Benchmark                        Methodology                       Recall@100 (%)  Recall@200 (%)  Recall@500 (%)
information-retrieval-on-bsard   Siamese Bi-Encoder (RoBERTa)      71.63           78.38           83.77
information-retrieval-on-bsard   BM25                              51.33           56.78           64.71
information-retrieval-on-bsard   Two-tower Bi-Encoder (RoBERTa)    74.78           78.04           83.39
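
For readers unfamiliar with the metric, the Recall@k figures above can be illustrated with a minimal sketch. It assumes the common definition: for each question, the fraction of its relevant articles found among the top k retrieved articles, averaged over all questions. The function names and the toy article IDs are hypothetical and are not taken from the BSARD repository; the paper's exact evaluation code may differ.

```python
from typing import Sequence, Set


def recall_at_k(ranked_ids: Sequence[int], relevant_ids: Set[int], k: int) -> float:
    """Fraction of the relevant articles that appear in the top-k results."""
    if not relevant_ids:
        raise ValueError("each question needs at least one relevant article")
    top_k = set(ranked_ids[:k])
    return len(top_k & relevant_ids) / len(relevant_ids)


def mean_recall_at_k(
    rankings: Sequence[Sequence[int]],
    relevants: Sequence[Set[int]],
    k: int,
) -> float:
    """Average per-question recall@k over all questions (macro average)."""
    scores = [recall_at_k(r, rel, k) for r, rel in zip(rankings, relevants)]
    return sum(scores) / len(scores)


# Toy data (hypothetical): two questions, each with a ranked list of
# retrieved article IDs and a set of articles judged relevant.
rankings = [[3, 7, 1, 9], [4, 2, 8, 5]]
relevants = [{7, 9}, {5, 6}]

print(mean_recall_at_k(rankings, relevants, k=2))  # 0.25
print(mean_recall_at_k(rankings, relevants, k=4))  # 0.75
```

In the toy data, raising k from 2 to 4 lifts the score because more relevant articles fall inside the cutoff, which is the same effect visible when moving from Recall@100 to Recall@500 in the benchmark rows.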