Model and Data Transfer for Cross-Lingual Sequence Labelling in Zero-Resource Settings
Iker García-Ferrero; Rodrigo Agerri; German Rigau

Abstract
Zero-resource cross-lingual transfer approaches aim to apply supervised models from a source language to unlabelled target languages. In this paper we perform an in-depth study of the two main techniques employed so far for cross-lingual zero-resource sequence labelling, based either on data or model transfer. Although previous research has proposed translation and annotation projection (data-based cross-lingual transfer) as an effective technique for cross-lingual sequence labelling, in this paper we experimentally demonstrate that high-capacity multilingual language models applied in a zero-shot setting (model-based cross-lingual transfer) consistently outperform data-based cross-lingual transfer approaches. A detailed analysis of our results suggests that this might be due to important differences in language use. More specifically, machine translation often generates a textual signal that differs from what the models are exposed to when using gold-standard data, which affects both the fine-tuning and evaluation processes. Our results also indicate that data-based cross-lingual transfer approaches remain a competitive option when high-capacity multilingual language models are not available.
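
The model-based transfer setting described in the abstract can be sketched in a few lines with the Hugging Face transformers library: a multilingual encoder fine-tuned on English NER data is applied directly to target-language text without any target-language labels. The checkpoint name and example sentence below are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of model-based (zero-shot) cross-lingual transfer:
# a multilingual encoder fine-tuned on English NER data is applied
# directly to target-language text, with no target-language labels.
# The checkpoint name and example sentence are illustrative assumptions,
# not the exact configuration used in the paper.
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Assumed public checkpoint: XLM-RoBERTa-large fine-tuned on English CoNLL-2003 NER.
model_name = "FacebookAI/xlm-roberta-large-finetuned-conll03-english"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)
ner = pipeline(
    "token-classification",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple",  # merge word-piece predictions into entity spans
)

# Zero-shot application to Spanish text: no Spanish annotations were used for training.
print(ner("Rodrigo Agerri trabaja en la Universidad del País Vasco en San Sebastián."))
```

The data-based alternative studied in the paper would instead machine-translate the English training data into the target language, project the entity annotations onto the translations, and fine-tune a model on that synthetic target-language data.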
Benchmarks
| Benchmark | Model | Metrics |
|---|---|---|
| cross-lingual-ner-on-conll-2003 | XLM-RoBERTa-large | Dutch: 82.3, German: 74.5, Spanish: 79.5 |
| cross-lingual-ner-on-conll-dutch | XLM-RoBERTa-large | F1: 79.7 |
| cross-lingual-ner-on-conll-german | XLM-RoBERTa-large | F1: 74.5 |
| cross-lingual-ner-on-conll-spanish | XLM-RoBERTa-large | F1: 79.5 |