European Parliament Proceedings Parallel Corpus 1996-2011 Statistical Machine Translation Corpus
The European Parliament Proceedings Parallel Corpus 1996-2011 (Europarl) is a parallel corpus for statistical machine translation. It is derived from the proceedings of the European Parliament and includes versions in 21 European languages:
 - Romance (French, Italian, Spanish, Portuguese, Romanian)
 - Germanic (English, Dutch, German, Danish, Swedish)
 - Slavic (Bulgarian, Czech, Polish, Slovak, Slovenian)
 - Finno-Ugric (Finnish, Hungarian, Estonian)
 - Baltic (Latvian, Lithuanian)
 - Greek
 
The European Parliament Proceedings Parallel Corpus 1996-2011 dataset was originally published in 2005 by the School of Informatics at the University of Edinburgh, Scotland, with Philipp Koehn as the principal author.
The 7th edition of this dataset was released in 2012. The related paper is "Europarl: A Parallel Corpus for Statistical Machine Translation".
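
Because the corpus is parallel, a language pair is typically consumed as aligned sentence pairs. The following is a minimal sketch, assuming a pair is stored as two UTF-8 plain-text files with one sentence per line and with line N in one file corresponding to line N in the other. The function name `read_parallel` and the file names in the usage note are hypothetical and are not taken from the dataset's documentation.

```python
from itertools import zip_longest
from pathlib import Path
from typing import Iterator, Tuple


def read_parallel(src_path: Path, tgt_path: Path) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) sentence pairs from two line-aligned files.

    Assumption: both files are UTF-8 text, one sentence per line, and
    line N of the source corresponds to line N of the target.
    """
    with open(src_path, encoding="utf-8") as src, \
         open(tgt_path, encoding="utf-8") as tgt:
        for s, t in zip_longest(src, tgt):
            # zip_longest pads the shorter file with None, which means
            # the two files do not have the same number of lines.
            if s is None or t is None:
                raise ValueError("source and target files differ in length")
            s, t = s.strip(), t.strip()
            # Skip pairs where either side is empty; treating these as
            # unusable for training is an assumption of this sketch.
            if s and t:
                yield s, t
```

For example, `for en, fr in read_parallel(Path("sample.en"), Path("sample.fr")): ...` would iterate over the pairs of a hypothetical English-French sample. Raising an error on mismatched line counts, rather than silently truncating, is a design choice that surfaces alignment problems early.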