
Token Preference Optimization


Token Preference Optimization (TPO) is a method proposed in January 2025 by Alibaba Group and Mohamed bin Zayed University of Artificial Intelligence to mitigate the hallucination problem of large vision-language models (LVLMs). It was introduced in the paper "Token Preference Optimization with Self-Calibrated Visual-Anchored Rewards for Hallucination Mitigation".

TPO performs token-level distribution correction without fine-grained manual annotation by introducing a self-calibrated visual-anchored reward mechanism, which encourages the model to rely more on visual information and thereby reduces hallucinations. It automatically identifies "visual anchor tokens" that are strongly correlated with the input visual embedding and adaptively assigns rewards according to each token's dependence on visual information. Compared with traditional sentence-level rewards, TPO adjusts the generated content at a finer granularity, which helps suppress hallucinated details. A minimal sketch of this idea is given below.
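The following is a minimal, hedged sketch of the general idea, not the authors' implementation: it assumes a hypothetical model interface that exposes per-token logits computed with and without the image, scores each token's visual dependence by how much its predictive distribution shifts, and uses those scores to weight a token-level, DPO-style preference loss. All function names and shapes here are illustrative assumptions.

```python
# Illustrative sketch only (not the official TPO code).
# Idea: tokens whose next-token distribution changes a lot when the image is
# removed are treated as "visual anchor tokens" and receive larger weights
# in a token-level, DPO-style preference objective.
import torch
import torch.nn.functional as F


def visual_anchor_scores(logits_with_image: torch.Tensor,
                         logits_without_image: torch.Tensor) -> torch.Tensor:
    """Per-token KL divergence between predictions with and without the image.

    Shapes: (seq_len, vocab_size) -> (seq_len,). Scores are self-normalized
    to [0, 1] so they can act as adaptive reward weights.
    """
    log_p = F.log_softmax(logits_with_image, dim=-1)
    log_q = F.log_softmax(logits_without_image, dim=-1)
    kl = (log_p.exp() * (log_p - log_q)).sum(dim=-1)   # KL(p || q) per token
    return kl / (kl.max() + 1e-8)


def token_weighted_preference_loss(logp_chosen: torch.Tensor,
                                   logp_rejected: torch.Tensor,
                                   ref_logp_chosen: torch.Tensor,
                                   ref_logp_rejected: torch.Tensor,
                                   w_chosen: torch.Tensor,
                                   w_rejected: torch.Tensor,
                                   beta: float = 0.1) -> torch.Tensor:
    """DPO-style loss where each token's log-ratio is scaled by its visual
    anchor weight, so visually grounded tokens dominate the preference signal.
    All inputs are per-token tensors of shape (seq_len,)."""
    chosen = (w_chosen * (logp_chosen - ref_logp_chosen)).sum()
    rejected = (w_rejected * (logp_rejected - ref_logp_rejected)).sum()
    return -F.logsigmoid(beta * (chosen - rejected))
```

In this sketch, the weights produced by `visual_anchor_scores` replace the uniform token weighting of a standard sentence-level preference loss, so the optimization pressure concentrates on tokens that actually depend on the image rather than on generic language tokens.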
