5 months ago

LLaVA-Chef: A Multi-modal Generative Model for Food Recipes

Fnu Mohbat; Mohammed J. Zaki

Abstract

In the rapidly evolving landscape of online recipe sharing within a globalized context, there has been a notable surge in research towards comprehending and generating food recipes. Recent advancements in large language models (LLMs) like GPT-2 and LLaVA have paved the way for Natural Language Processing (NLP) approaches to delve deeper into various facets of food-related tasks, encompassing ingredient recognition and comprehensive recipe generation. Despite impressive performance and multi-modal adaptability of LLMs, domain-specific training remains paramount for their effective application. This work evaluates existing LLMs for recipe generation and proposes LLaVA-Chef, a novel model trained on a curated dataset of diverse recipe prompts in a multi-stage approach. First, we refine the mapping of visual food image embeddings to the language space. Second, we adapt LLaVA to the food domain by fine-tuning it on relevant recipe data. Third, we utilize diverse prompts to enhance the model's recipe comprehension. Finally, we improve the linguistic quality of generated recipes by penalizing the model with a custom loss function. LLaVA-Chef demonstrates impressive improvements over pretrained LLMs and prior works. A detailed qualitative analysis reveals that LLaVA-Chef generates more detailed recipes with precise ingredient mentions, compared to existing approaches.

Benchmarks

Benchmark | Methodology | Metrics
recipe-generation-on-allrecipescom | LLaVA-Chef | BLEU: 6.0, Perplexity: 2.6
recipe-generation-on-foodcom | LLaVA-Chef | BLEU-1: 29, BLEU-4: 6, BPE Perplexity: 2.6, D-1: 0, D-2: 0, Rouge-L: 18.4
recipe-generation-on-now-youre-cooking | LLaVA-Chef | Perplexity: 2.6
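
For readers unfamiliar with the metrics listed in the benchmarks above, the following is a minimal sketch of how two of them are commonly computed. It assumes that D-1 and D-2 refer to distinct-n diversity scores (unique n-grams divided by total n-grams), which this page does not state, and that perplexity is the exponential of the mean negative log-probability per token. The function names `distinct_n` and `perplexity` are illustrative, and the tokenization and scaling used to produce the reported numbers are not given here, so this is not the evaluation code behind the table.

```python
import math


def distinct_n(tokens, n):
    """Ratio of unique n-grams to total n-grams (0.0 if there are none)."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    return len(set(ngrams)) / len(ngrams)


def perplexity(token_log_probs):
    """exp of the mean negative natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Hypothetical generated text, split on whitespace for illustration only.
tokens = "mix the flour and the sugar".split()
d1 = distinct_n(tokens, 1)  # "the" repeats, so fewer unique unigrams
d2 = distinct_n(tokens, 2)  # every bigram here is unique

# Hypothetical per-token probabilities of 0.5 from some language model.
ppl = perplexity([math.log(0.5)] * 4)
```

Under these definitions, lower perplexity means the model assigns higher probability to the reference text, while higher distinct-n values mean less repetition in the generated text.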
