Siamese BERT-based Model for Web Search Relevance Ranking Evaluated on a New Czech Dataset

Matěj Kocián Jakub Náplava Daniel Štancl Vladimír Kadlec


Abstract

Web search engines focus on serving highly relevant results within hundreds of milliseconds. Pre-trained transformer language models such as BERT are therefore hard to use in this scenario due to their high computational demands. We present our real-time approach to the document ranking problem, leveraging a BERT-based siamese architecture. The model is already deployed in a commercial search engine, where it improves production performance by more than 3%. For further research and evaluation, we release DaReCzech, a unique dataset of 1.6 million Czech user query-document pairs with manually assigned relevance levels. We also release Small-E-Czech, an Electra-small language model pre-trained on a large Czech corpus. We believe this data will support the endeavours of both the search-relevance and multilingual-focused research communities.
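The latency advantage of a siamese architecture comes from encoding queries and documents independently: document embeddings can be precomputed offline, so at query time relevance reduces to one query encoding plus a cheap vector similarity. The sketch below illustrates that scoring step only; the embeddings, dimensions, and cosine similarity are illustrative assumptions, not the paper's actual model or training setup.

```python
import numpy as np

def cosine_scores(query_emb, doc_embs):
    """Cosine similarity between one query embedding and a matrix of
    precomputed document embeddings (one row per document)."""
    q = query_emb / np.linalg.norm(query_emb)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    return d @ q  # shape: (num_docs,)

rng = np.random.default_rng(0)

# Stand-in for embeddings produced offline by the document encoder.
doc_embs = rng.normal(size=(5, 8))

# Stand-in for the query encoder's output; made deliberately close to
# document 2 so that document should rank first.
query_emb = doc_embs[2] + 0.01 * rng.normal(size=8)

scores = cosine_scores(query_emb, doc_embs)
ranking = np.argsort(-scores)  # documents ordered by descending relevance
```

In production this pattern means the expensive transformer runs once per document at indexing time and once per query at serving time, instead of once per query-document pair as in a cross-encoder ("query-doc") setup.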

Code Repositories

seznam/dareczech (official, PyTorch)

Benchmarks

Benchmark: document-ranking-on-dareczech

Methodology                              P@10
Siamese Small-E-Czech (Electra-small)    45.26
Query-doc RobeCzech (RoBERTa-base)       46.73
Query-doc Small-E-Czech (Electra-small)  46.30
