AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning

Abstract

Controllable captioning is essential for precise multimodal alignment and instruction following, yet existing models often lack fine-grained control and reliable evaluation protocols. To address this gap, we present the AnyCap Project, an integrated solution spanning model, dataset, and evaluation. We introduce AnyCapModel (ACM), a lightweight plug-and-play framework that enhances the controllability of existing foundation models for omni-modal captioning without retraining the base model. ACM reuses the original captions from base models while incorporating user instructions and modality features to generate improved captions. To remedy the data scarcity in controllable multimodal captioning, we build AnyCapDataset (ACD), covering three modalities, 28 user-instruction types, and 300k high-quality data entries. We further propose AnyCapEval, a new benchmark that provides more reliable evaluation metrics for controllable captioning by decoupling content accuracy and stylistic fidelity. ACM markedly improves caption quality across a diverse set of base models on AnyCapEval. Notably, ACM-8B raises GPT-4o's content scores by 45% and style scores by 12%, and it also achieves substantial gains on widely used benchmarks such as MIA-Bench and VidCapBench.
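
The abstract describes ACM as a refiner that consumes a frozen base model's caption together with a user instruction and modality features. The following is a minimal Python sketch of that data flow under stated assumptions; all class and method names here are hypothetical placeholders, not the released AnyCap API.

```python
# Hypothetical sketch of ACM's plug-and-play refinement flow (names and
# interfaces are illustrative assumptions, not the actual AnyCap code).
from dataclasses import dataclass
from typing import List


@dataclass
class CaptionRequest:
    modality: str            # "image", "video", or "audio"
    features: List[float]    # modality features extracted for the input
    instruction: str         # user instruction, e.g. "one brief sentence"


class FrozenBaseCaptioner:
    """Stands in for an existing foundation model; it is never retrained."""

    def caption(self, req: CaptionRequest) -> str:
        # A real system would call the base model here; stubbed for the sketch.
        return "a person walking a dog in a park"


class AnyCapModelSketch:
    """Lightweight refiner conditioned on the base caption, the user
    instruction, and the modality features, as the abstract describes."""

    def __init__(self, base: FrozenBaseCaptioner) -> None:
        self.base = base

    def refine(self, req: CaptionRequest) -> str:
        draft = self.base.caption(req)  # reuse the base model's caption
        # The real ACM is a trained model; this placeholder only shows
        # which inputs the refiner consumes.
        return f"({req.instruction}) {draft}"


if __name__ == "__main__":
    acm = AnyCapModelSketch(FrozenBaseCaptioner())
    req = CaptionRequest("image", [0.12, 0.87], "focus on the foreground")
    print(acm.refine(req))
```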
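Likewise, the decoupling idea behind AnyCapEval can be illustrated as two independent scoring axes. The toy metrics below are invented for illustration only and stand in for the benchmark's actual protocol.

```python
# Toy illustration of decoupled evaluation in the spirit of AnyCapEval:
# content accuracy and stylistic fidelity are scored on separate axes
# rather than blended into one number. Both metrics are placeholders.
def content_score(caption: str, reference_facts: set) -> float:
    """Fraction of reference facts mentioned in the caption (placeholder)."""
    words = set(caption.lower().replace(".", "").split())
    return len(words & reference_facts) / max(len(reference_facts), 1)


def style_score(caption: str, instruction: str) -> float:
    """Toy instruction-compliance check, e.g. a one-sentence constraint."""
    if "one sentence" in instruction:
        return 1.0 if caption.count(".") <= 1 else 0.0
    return 1.0


caption = "A dog chases a red ball in the park."
print(content_score(caption, {"dog", "ball", "park"}))   # content axis -> 1.0
print(style_score(caption, "describe in one sentence"))  # style axis -> 1.0
```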
