
Abstract

Bidirectional transformers excel at sentiment analysis, and Large Language Models (LLM) are effective zero-shot learners. Might they perform better as a team? This paper explores collaborative approaches between ELECTRA and GPT-4o for three-way sentiment classification. We fine-tuned (FT) four models (ELECTRA Base/Large, GPT-4o/4o-mini) using a mix of reviews from Stanford Sentiment Treebank (SST) and DynaSent. We provided input from ELECTRA to GPT as: predicted label, probabilities, and retrieved examples. Sharing ELECTRA Base FT predictions with GPT-4o-mini significantly improved performance over either model alone (82.50 macro F1 vs. 79.14 ELECTRA Base FT, 79.41 GPT-4o-mini) and yielded the lowest cost/performance ratio ($0.12/F1 point). However, when GPT models were fine-tuned, including predictions decreased performance. GPT-4o FT-M was the top performer (86.99), with GPT-4o-mini FT close behind (86.70) at much less cost ($0.38 vs. $1.59/F1 point). Our results show that augmenting prompts with predictions from fine-tuned encoders is an efficient way to boost performance, and a fine-tuned GPT-4o-mini is nearly as good as GPT-4o FT at 76% less cost. Both are affordable options for projects with limited resources.
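As a rough illustration of the collaboration described above, the sketch below shows one way an encoder's predicted label and class probabilities could be placed into an LLM prompt. The function names, label strings, and prompt wording are assumptions made for illustration; the abstract does not specify the prompt format used in the paper. A second helper reproduces the "76% less cost" figure from the two reported cost/performance ratios.

```python
from typing import Mapping

# Hypothetical label names for the three-way task; the paper's exact
# label strings are not given in the abstract.
LABELS = ("negative", "neutral", "positive")


def build_augmented_prompt(review: str, probs: Mapping[str, float]) -> str:
    """Compose a sentiment prompt that includes an encoder's output.

    `probs` maps each label to the probability assigned by a fine-tuned
    encoder (for example, ELECTRA). The wording below is an assumed
    template, not the one used in the paper.
    """
    missing = [label for label in LABELS if label not in probs]
    if missing:
        raise ValueError(f"missing probabilities for: {missing}")

    # The predicted label is the class with the highest probability.
    predicted = max(LABELS, key=lambda label: probs[label])
    prob_text = ", ".join(f"{label}: {probs[label]:.2f}" for label in LABELS)

    return (
        "Classify the sentiment of the review as negative, neutral, or positive.\n"
        f"Review: {review}\n"
        f"Encoder prediction: {predicted}\n"
        f"Encoder probabilities: {prob_text}\n"
        "Answer with one label."
    )


def relative_cost_reduction(cheaper: float, pricier: float) -> float:
    """Return the fractional saving of `cheaper` relative to `pricier`."""
    return (pricier - cheaper) / pricier


if __name__ == "__main__":
    prompt = build_augmented_prompt(
        "Great food, slow service.",
        {"negative": 0.10, "neutral": 0.60, "positive": 0.30},
    )
    print(prompt)
    # Cost per F1 point reported in the abstract: $0.38 vs. $1.59.
    print(f"{relative_cost_reduction(0.38, 1.59):.0%}")
```

The resulting string would then be sent to the LLM as the user message. Passing probabilities alongside the label lets the LLM see how confident the encoder was, which is one plausible reading of the "predicted label, probabilities" inputs listed above.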
