BanglaBook: A Large-scale Bangla Dataset for Sentiment Analysis from Book Reviews
Mohsinul Kabir, Obayed Bin Mahfuz, Syed Rifat Raiyan, Hasan Mahmud, Md Kamrul Hasan

Abstract
The analysis of consumer sentiment, as expressed through reviews, can provide a wealth of insight regarding the quality of a product. While sentiment analysis has been widely explored in many popular languages, relatively little attention has been given to the Bangla language, mostly due to a lack of relevant data and cross-domain adaptability. To address this limitation, we present BanglaBook, a large-scale dataset of Bangla book reviews consisting of 158,065 samples classified into three broad categories: positive, negative, and neutral. We provide a detailed statistical analysis of the dataset and employ a range of machine learning models to establish baselines, including SVM, LSTM, and Bangla-BERT. Our findings demonstrate a substantial performance advantage of pre-trained models over models that rely on manually crafted features, emphasizing the necessity for additional training resources in this domain. Additionally, we conduct an in-depth error analysis by examining sentiment unigrams, which may provide insight into common classification errors in under-resourced languages like Bangla. Our code and data are publicly available at https://github.com/mohsinulkabir14/BanglaBook.
Benchmarks
| Benchmark | Methodology | Weighted Average F1-score |
|---|---|---|
| sentiment-analysis-on-banglabook | Logistic Regression (word 2-gram + word 3-gram) | 0.8964 |
| sentiment-analysis-on-banglabook | Random Forest (word 1-gram) | 0.9043 |
| sentiment-analysis-on-banglabook | Bangla-BERT (base-uncased) | 0.9064 |
| sentiment-analysis-on-banglabook | XGBoost (word 2-gram + word 3-gram) | 0.8651 |
| sentiment-analysis-on-banglabook | Random Forest (word 2-gram + word 3-gram) | 0.9106 |
| sentiment-analysis-on-banglabook | LSTM (GloVe) | 0.0991 |
| sentiment-analysis-on-banglabook | Multinomial NB (word 2-gram + word 3-gram) | 0.8663 |
| sentiment-analysis-on-banglabook | Multinomial NB (BoW) | 0.8564 |
| sentiment-analysis-on-banglabook | Bangla-BERT (large) | 0.9331 |
| sentiment-analysis-on-banglabook | Logistic Regression (char 2-gram + char 3-gram) | 0.8978 |
| sentiment-analysis-on-banglabook | SVM (word 1-gram) | 0.8519 |
| sentiment-analysis-on-banglabook | SVM (word 2-gram + word 3-gram) | 0.9053 |
| sentiment-analysis-on-banglabook | XGBoost (char 2-gram + char 3-gram) | 0.8723 |
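Every row above reports a weighted average F1-score. As a reading aid, the sketch below shows how that metric is conventionally computed: the per-class F1 is averaged with each class weighted by its share of the true labels. The function name `weighted_f1` and the toy labels are illustrative assumptions, not taken from the BanglaBook repository, and the authors' own evaluation code may differ.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 (hypothetical helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    score = 0.0

    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp

        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0

        # Each class contributes in proportion to its share of true labels.
        score += (n / total) * f1

    return score


# Toy example with the dataset's three sentiment categories (made-up labels).
y_true = ["positive", "positive", "negative", "neutral"]
y_pred = ["positive", "negative", "negative", "neutral"]
print(round(weighted_f1(y_true, y_pred), 4))
```

Because the weights follow class frequency, a model that does well on the majority class can score highly even when a minority class is poorly classified, which is worth keeping in mind when comparing the rows.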