HyperAIHyperAI

Command Palette

Search for a command to run...

5 months ago

ExtractGPT: Exploring the Potential of Large Language Models for Product Attribute Value Extraction

Alexander Brinkmann; Roee Shraga; Christian Bizer

ExtractGPT: Exploring the Potential of Large Language Models for Product Attribute Value Extraction

Abstract

E-commerce platforms require structured product data in the form of attribute-value pairs to offer features such as faceted product search or attribute-based product comparison. However, vendors often provide unstructured product descriptions, necessitating the extraction of attribute-value pairs from these texts. BERT-based extraction methods require large amounts of task-specific training data and struggle with unseen attribute values. This paper explores using large language models (LLMs) as a more training-data efficient and robust alternative. We propose prompt templates for zero-shot and few-shot scenarios, comparing textual and JSON-based target schema representations. Our experiments show that GPT-4 achieves the highest average F1-score of 85% using detailed attribute descriptions and demonstrations. Llama-3-70B performs nearly as well, offering a competitive open-source alternative. GPT-4 surpasses the best PLM baseline by 5% in F1-score. Fine-tuning GPT-3.5 increases the performance to the level of GPT-4 but reduces the model's ability to generalize to unseen attribute values.

Code Repositories

wbsg-uni-mannheim/extractgpt
Official
Mentioned in GitHub

Benchmarks

BenchmarkMethodologyMetrics
attribute-value-extraction-on-ae-110kft-GPT-3.5-json-val
F1-score: 86
attribute-value-extraction-on-ae-110kGPT-4-json-val-10-dem
F1-score: 87.5
attribute-value-extraction-on-oa-mineft-GPT-3.5-json-val
F1-score: 84.5
attribute-value-extraction-on-oa-mineGPT-4-json-val-10-dem
F1-score: 82.2

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding
Ready-to-use GPUs
Best Pricing
Get Started

Hyper Newsletters

Subscribe to our latest updates
We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning
Powered by MailChimp
ExtractGPT: Exploring the Potential of Large Language Models for Product Attribute Value Extraction | Papers | HyperAI