HyperAIHyperAI

Command Palette

Search for a command to run...

19 hours ago

ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning

Jiawei Gu Yunzhuo Hao Huichen Will Wang Linjie Li Michael Qizhe Shieh Yejin Choi Ranjay Krishna Yu Cheng

ThinkMorph: Emergent Properties in Multimodal Interleaved
  Chain-of-Thought Reasoning

Abstract

Multimodal reasoning requires iterative coordination between language and vision, yet it remains unclear what constitutes a meaningful interleaved chain of thought. We posit that text and image thoughts should function as complementary, rather than isomorphic, modalities that mutually advance reasoning. Guided by this principle, we build ThinkMorph, a unified model fine-tuned on 24K high-quality interleaved reasoning traces spanning tasks with varying visual engagement. ThinkMorph learns to generate progressive text-image reasoning steps that concretely manipulate visual content while maintaining coherent verbal logic. It delivers large gains on vision-centric benchmarks (averaging 34.7% over the base model) and generalizes to out-of-domain tasks, matching or surpassing larger and proprietary VLMs. Beyond performance, ThinkMorph exhibits emergent multimodal intelligence, including unseen visual manipulation skills, adaptive switching between reasoning modes, and better test-time scaling through diversified multimodal thoughts.These findings suggest promising directions for characterizing the emergent capabilities of unified models for multimodal reasoning.

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding
Ready-to-use GPUs
Best Pricing
Get Started

Hyper Newsletters

Subscribe to our latest updates
We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning
Powered by MailChimp
ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning | Papers | HyperAI