3 months ago

Thyme: Think Beyond Images

Abstract

Following OpenAI's introduction of the "thinking with images" concept, recent efforts have explored stimulating the use of visual information in the reasoning process to enhance model performance in perception and reasoning tasks. However, to the best of our knowledge, no open-source work currently offers a feature set as rich as proprietary models (O3), which can perform diverse image manipulations and simultaneously enhance logical reasoning capabilities through code. In this paper, we make a preliminary attempt in this direction by introducing Thyme (Think Beyond Images), a novel paradigm for enabling MLLMs to transcend existing "think with images" approaches by autonomously generating and executing diverse image processing and computational operations via executable code. This approach not only facilitates a rich, on-the-fly set of image manipulations (e.g., cropping, rotation, contrast enhancement) but also allows for mathematical computations, all while maintaining high autonomy in deciding when and how to apply these operations. We activate this capability through a two-stage training strategy: an initial SFT on a curated dataset of 500K samples to teach code generation, followed by an RL phase to refine decision-making. For the RL stage, we manually collect and design high-resolution question-answer pairs to increase the learning difficulty, and we propose GRPO-ATS (Group Relative Policy Optimization with Adaptive Temperature Sampling), an algorithm that applies distinct temperatures to text and code generation to balance reasoning exploration with code execution precision. We conduct extensive experimental analysis and ablation studies. Comprehensive evaluations on nearly 20 benchmarks show that Thyme yields significant and consistent performance gains, particularly in challenging high-resolution perception and complex reasoning tasks.
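The abstract says only that GRPO-ATS applies distinct temperatures to text and code generation. As a rough illustration of that idea, the sketch below switches the sampling temperature depending on whether the decoder is currently inside a code span. It is not the authors' implementation: the temperature values, the `<code>` and `</code>` delimiter tokens, and the names `sample_token`, `temperature_for`, and `generate` are all assumptions made for this example, and the model is a stand-in callable that returns logits.

```python
import math
import random

# Assumed values for illustration only; the abstract does not state them.
TEXT_TEMPERATURE = 1.0  # higher temperature: more exploration in reasoning text
CODE_TEMPERATURE = 0.0  # zero temperature: greedy decoding inside code


def sample_token(logits, temperature, rng):
    """Return a token index; a non-positive temperature means greedy argmax."""
    if temperature <= 0.0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def temperature_for(in_code):
    """Pick the temperature for the current decoding mode."""
    return CODE_TEMPERATURE if in_code else TEXT_TEMPERATURE


def generate(next_logits, vocab, max_tokens, rng,
             code_open="<code>", code_close="</code>"):
    """Decode token by token, switching temperature at hypothetical code delimiters.

    next_logits is a stand-in for a model: it maps the tokens generated so far
    to a list of logits over vocab.
    """
    tokens = []
    in_code = False
    for _ in range(max_tokens):
        logits = next_logits(tokens)
        index = sample_token(logits, temperature_for(in_code), rng)
        token = vocab[index]
        tokens.append(token)
        # The mode changes only after a delimiter has been emitted.
        if token == code_open:
            in_code = True
        elif token == code_close:
            in_code = False
    return tokens
```

With these assumed settings, tokens in the reasoning text are sampled stochastically, while tokens between the delimiters are chosen greedily, which reflects the stated goal of balancing reasoning exploration with code execution precision.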
