Beno James P.

Abstract
Bidirectional transformers excel at sentiment analysis, and Large Language Models (LLMs) are effective zero-shot learners. Might they perform better as a team? This paper explores collaborative approaches between ELECTRA and GPT-4o for three-way sentiment classification. We fine-tuned (FT) four models (ELECTRA Base/Large, GPT-4o/4o-mini) using a mix of reviews from Stanford Sentiment Treebank (SST) and DynaSent. We provided input from ELECTRA to GPT as: predicted label, probabilities, and retrieved examples. Sharing ELECTRA Base FT predictions with GPT-4o-mini significantly improved performance over either model alone (82.50 macro F1 vs. 79.14 ELECTRA Base FT, 79.41 GPT-4o-mini) and yielded the lowest cost/performance ratio (\$0.12/F1 point). However, when GPT models were fine-tuned, including predictions decreased performance. GPT-4o FT-M was the top performer (86.99), with GPT-4o-mini FT close behind (86.70) at much less cost (\$0.38 vs. \$1.59/F1 point). Our results show that augmenting prompts with predictions from fine-tuned encoders is an efficient way to boost performance, and a fine-tuned GPT-4o-mini is nearly as good as GPT-4o FT at 76% less cost. Both are affordable options for projects with limited resources.
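As a rough illustration of the prompt augmentation described in the abstract, the sketch below appends an encoder's predicted label (and, optionally, its class probabilities) to a classification prompt. The prompt wording, the label names, and the helper names `build_augmented_prompt` and `relative_saving` are assumptions made for this example; they are not taken from the paper. The second helper only rechecks the "76% less cost" figure from the two cost-per-F1-point values quoted in the abstract.

```python
# Hypothetical sketch: label names and prompt wording are assumptions,
# not the prompts used in the paper.
LABELS = ("negative", "neutral", "positive")


def build_augmented_prompt(review, encoder_label, encoder_probs=None):
    """Build a prompt that shares an encoder's prediction with an LLM.

    encoder_label: label predicted by a fine-tuned encoder (e.g. ELECTRA).
    encoder_probs: optional dict mapping each label to a probability.
    """
    if encoder_label not in LABELS:
        raise ValueError(f"unknown label: {encoder_label!r}")
    lines = [
        "Classify the sentiment of the review as negative, neutral, or positive.",
        f"Review: {review}",
        f"A fine-tuned encoder predicted: {encoder_label}",
    ]
    if encoder_probs is not None:
        formatted = ", ".join(f"{label}={encoder_probs[label]:.2f}" for label in LABELS)
        lines.append(f"Encoder probabilities: {formatted}")
    lines.append("Answer with a single word.")
    return "\n".join(lines)


def relative_saving(cheaper, pricier):
    """Fraction by which `cheaper` undercuts `pricier` (0.25 means 25% less)."""
    return 1.0 - cheaper / pricier


prompt = build_augmented_prompt(
    "The plot dragged, but the acting was superb.",
    "positive",
    {"negative": 0.10, "neutral": 0.25, "positive": 0.65},
)
saving_pct = round(relative_saving(0.38, 1.59) * 100)  # cost per F1 point, from the abstract
```

The resulting string would be sent to the LLM as the user message; how the paper formats retrieved examples is not shown here.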
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| sentiment-analysis-on-dynasent | GPT-4o + ELECTRA Large FT (Prompt, Label, Examples) | Macro F1: 81.53 |
| sentiment-analysis-on-dynasent | ELECTRA Large Fine-Tuned | Macro F1: 76.29 |
| sentiment-analysis-on-dynasent | GPT-4o-mini + ELECTRA Large FT (Prompt, Label) | Macro F1: 77.94 |
| sentiment-analysis-on-dynasent | GPT-4o + ELECTRA Large FT | Macro F1: 77.69 |
| sentiment-analysis-on-dynasent | ELECTRA Base Fine-Tuned | Macro F1: 71.83 |
| sentiment-analysis-on-dynasent | GPT-4o Fine-Tuned (Minimal) | Macro F1: 89 |
| sentiment-analysis-on-dynasent | GPT-4o-mini + ELECTRA Large FT (Prompt, Label, Probabilities) | Macro F1: 79.72 |
| sentiment-analysis-on-dynasent | GPT-4o-mini + ELECTRA Base FT | Macro F1: 76.19 |
| sentiment-analysis-on-dynasent | GPT-4o-mini Fine-Tuned | Macro F1: 86.9 |
| sentiment-analysis-on-dynasent | GPT-4o (Prompt) | Macro F1: 80.22 |
| sentiment-analysis-on-dynasent | GPT-4o-mini (Prompt) | Macro F1: 77.35 |
| sentiment-analysis-on-sentiment-merged | GPT-4o-mini + ELECTRA Base FT (Prompt, Label) | Macro F1: 82.74 |
| sentiment-analysis-on-sentiment-merged | GPT-4o-mini + ELECTRA Large FT (Prompt, Label) | Macro F1: 83.49 |
| sentiment-analysis-on-sentiment-merged | GPT-4o Fine-Tuned (Minimal) | Macro F1: 86.99 |
| sentiment-analysis-on-sentiment-merged | GPT-4o (Prompt) | Macro F1: 80.14 |
| sentiment-analysis-on-sentiment-merged | GPT-4o + ELECTRA Large FT (Prompt, Label) | Macro F1: 81.57 |
| sentiment-analysis-on-sentiment-merged | GPT-4o + ELECTRA Large FT (Prompt, Label, Examples) | Macro F1: 83.09 |
| sentiment-analysis-on-sentiment-merged | ELECTRA Large Fine-Tuned | Macro F1: 82.36 |
| sentiment-analysis-on-sentiment-merged | ELECTRA Base Fine-Tuned | Macro F1: 79.29 |
| sentiment-analysis-on-sentiment-merged | GPT-4o-mini Fine-Tuned | Macro F1: 86.77 |
| sentiment-analysis-on-sentiment-merged | GPT-4o-mini (Prompt) | Macro F1: 79.52 |
| sentiment-analysis-on-sst-3 | GPT-4o-mini + ELECTRA Base FT | Macro F1: 71.72 |
| sentiment-analysis-on-sst-3 | ELECTRA Base Fine-Tuned | Macro F1: 69.95 |
| sentiment-analysis-on-sst-3 | GPT-4o-mini + ELECTRA Large FT (Prompt, Label) | Macro F1: 70.99 |
| sentiment-analysis-on-sst-3 | GPT-4o + ELECTRA Large FT | Macro F1: 72.94 |
| sentiment-analysis-on-sst-3 | GPT-4o-mini (Prompt) | Macro F1: 70.67 |
| sentiment-analysis-on-sst-3 | GPT-4o Fine-Tuned (Minimal) | Macro F1: 73.99 |
| sentiment-analysis-on-sst-3 | GPT-4o (Prompt) | Macro F1: 72.2 |
| sentiment-analysis-on-sst-3 | ELECTRA Large Fine-Tuned | Macro F1: 70.90 |
| sentiment-analysis-on-sst-3 | GPT-4o-mini + ELECTRA Large FT (Prompt, Label, Examples) | Macro F1: 71.98 |
| sentiment-analysis-on-sst-3 | GPT-4o-mini Fine-Tuned | Macro F1: 75.68 |
| sentiment-analysis-on-sst-3 | GPT-4o + ELECTRA Large FT (Prompt, Label, Examples) | Macro F1: 72.06 |
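The table reports macro F1, which is the unweighted mean of the per-class F1 scores. The sketch below computes it from scratch for a three-way task, scaled to 0-100 to match the table. The function name `macro_f1`, the label strings, and the toy data are assumptions for illustration only; the paper's own evaluation code may differ, for example in how it handles a class with no predictions.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores, on a 0-100 scale."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # Assumption: a class with no true or predicted instances scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return 100.0 * sum(scores) / len(scores)


# Toy data, not drawn from SST or DynaSent.
y_true = ["positive", "negative", "neutral", "positive"]
y_pred = ["positive", "negative", "positive", "positive"]
score = macro_f1(y_true, y_pred, ["negative", "neutral", "positive"])
```

Because every class counts equally, a model that never predicts the neutral class is penalized heavily, even if that class is rare.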