Dat Quoc Nguyen, Thanh Vu, Anh Tuan Nguyen

Abstract
We present BERTweet, the first public large-scale pre-trained language model for English Tweets. Our BERTweet, having the same architecture as BERT-base (Devlin et al., 2019), is trained using the RoBERTa pre-training procedure (Liu et al., 2019). Experiments show that BERTweet outperforms strong baselines RoBERTa-base and XLM-R-base (Conneau et al., 2020), producing better performance results than the previous state-of-the-art models on three Tweet NLP tasks: Part-of-speech tagging, Named-entity recognition and text classification. We release BERTweet under the MIT License to facilitate future research and applications on Tweet data. Our BERTweet is available at https://github.com/VinAIResearch/BERTweet
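As a rough illustration of how the released checkpoint can be used, the sketch below loads BERTweet through the Hugging Face transformers library and extracts contextual features for a single pre-normalized Tweet. The Hub identifier `vinai/bertweet-base`, the example Tweet, and the normalization conventions shown in the comments are assumptions based on the public release rather than details stated in the abstract.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed Hub identifier for the released checkpoint (not stated in the abstract).
MODEL_NAME = "vinai/bertweet-base"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
bertweet = AutoModel.from_pretrained(MODEL_NAME)

# BERTweet expects Tweets normalized in the release's style:
# user mentions -> @USER, URLs -> HTTPURL, emoji converted to text strings.
tweet = "SC has first two presumptive cases of coronavirus , DHEC confirms HTTPURL via @USER :crying_face:"

inputs = tokenizer(tweet, return_tensors="pt")
with torch.no_grad():
    outputs = bertweet(**inputs)

# Last-layer hidden states: one 768-dimensional vector per subword token.
features = outputs.last_hidden_state
print(features.shape)  # (1, sequence_length, 768)
```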
Code Repositories
https://github.com/VinAIResearch/BERTweet
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| Named-entity recognition on WNUT-2016 | BERTweet | F1: 52.1 |
| Named-entity recognition on WNUT-2017 | BERTweet | F1: 56.5 |
| Part-of-speech tagging on Ritter | BERTweet | Acc: 90.1 |
| Part-of-speech tagging on Tweebank | BERTweet | Acc: 95.2 |
| Sentiment analysis on TweetEval | BERTweet | ALL: 67.9, Emoji: 33.4, Emotion: 79.3, Irony: 82.1, Offensive: 79.5, Sentiment: 73.4, Stance: 71.2 |
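The classification results above come from fine-tuning BERTweet on each downstream task. The sketch below outlines one such fine-tuning setup with the transformers sequence-classification head for a TweetEval-style sentiment task; the model identifier, label set, example Tweets, and learning rate are illustrative assumptions, not the paper's exact training configuration.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "vinai/bertweet-base"  # assumed Hub identifier

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# Three labels as in TweetEval sentiment: negative / neutral / positive.
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=3)

# Toy, pre-normalized examples; real training would iterate over the full dataset.
tweets = ["I love this update :red_heart:", "This is the worst service ever"]
labels = torch.tensor([2, 0])

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

batch = tokenizer(tweets, padding=True, truncation=True, return_tensors="pt")
outputs = model(**batch, labels=labels)

# One illustrative optimization step of a standard fine-tuning loop.
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```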