Siamese BERT-based Model for Web Search Relevance Ranking Evaluated on a New Czech Dataset
Matěj Kocián, Jakub Náplava, Daniel Štancl, Vladimír Kadlec

Abstract
Web search engines focus on serving highly relevant results within hundreds of milliseconds. Pre-trained language transformer models such as BERT are therefore hard to use in this scenario due to their high computational demands. We present our real-time approach to the document ranking problem leveraging a BERT-based siamese architecture. The model is already deployed in a commercial search engine and improves production performance by more than 3%. For further research and evaluation, we release DaReCzech, a unique dataset of 1.6 million Czech user query-document pairs with manually assigned relevance levels. We also release Small-E-Czech, an Electra-small language model pre-trained on a large Czech corpus. We believe this data will support the endeavours of both the search relevance and multilingual-focused research communities.
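The key property of a siamese architecture that makes it viable under a latency budget of hundreds of milliseconds is that query and document are encoded independently: document embeddings can be precomputed offline, so only one encoder pass (for the query) runs at request time, followed by cheap vector comparisons. The sketch below illustrates this split. It is a minimal illustration, not the paper's implementation: the toy bag-of-words `encode` function, the fixed `VOCAB`, and the sample documents are all stand-ins for the actual Small-E-Czech transformer encoder and web corpus.

```python
import math

# Hypothetical stand-in for the siamese transformer encoder: a tiny
# bag-of-words embedding over a fixed vocabulary, L2-normalized so that
# a dot product equals cosine similarity.
VOCAB = ["praha", "hlavni", "mesto", "ceske", "republiky",
         "recept", "svickova", "omacka"]

def encode(text):
    tokens = text.lower().split()
    vec = [float(tokens.count(word)) for word in VOCAB]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    # Both vectors are unit-normalized, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

# Offline stage: embed every document once and cache the vectors.
documents = {
    "doc1": "praha hlavni mesto ceske republiky",
    "doc2": "recept na svickovou omacku",
}
doc_index = {doc_id: encode(text) for doc_id, text in documents.items()}

# Online stage: a single encoder pass for the query, then one cheap
# dot product per candidate document.
def rank(query):
    q_vec = encode(query)
    scored = [(doc_id, cosine(q_vec, vec)) for doc_id, vec in doc_index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

print(rank("hlavni mesto praha"))
```

In contrast, a query-doc (cross-encoder) model, such as the RobeCzech baseline in the benchmark table, must run the full transformer once per query-document pair, which is far more accurate per pair but much harder to serve in real time.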
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| document-ranking-on-dareczech | Siamese Small-E-Czech (Electra-small) | P@10: 45.26 |
| document-ranking-on-dareczech | Query-doc RobeCzech (Roberta-base) | P@10: 46.73 |
| document-ranking-on-dareczech | Query-doc Small-E-Czech (Electra-small) | P@10: 46.30 |
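The P@10 metric in the table measures the fraction of relevant documents among the top 10 ranked results, averaged over queries (the values shown are on a 0-100 scale). A minimal sketch of the per-query computation, assuming binarized 0/1 relevance judgments (DaReCzech itself uses graded relevance levels, which would first need thresholding):

```python
def precision_at_k(relevance_labels, k=10):
    """Fraction of the top-k ranked results judged relevant.

    relevance_labels: list of 0/1 judgments in rank order (best first).
    If fewer than k results were retrieved, the denominator stays k,
    which penalizes short result lists.
    """
    top = relevance_labels[:k]
    return sum(top) / k

# Example: 5 relevant documents within the top 10 -> P@10 = 0.5
print(precision_at_k([1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1]))  # 0.5
```

The corpus-level score reported in benchmarks like the one above would then be the mean of this per-query value over all evaluation queries, multiplied by 100.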