
Abstract
Image editing has made remarkable progress recently. Modern editing models can already follow complex instructions to manipulate the original content. However, beyond completing the editing instructions, the accompanying physical effects are key to the realism of the generated result. For example, removing an object should also remove its shadow, reflections, and interactions with nearby objects. Unfortunately, existing models and benchmarks mainly focus on instruction completion but overlook these physical effects. So, at this moment, how far are we from physically realistic image editing? To answer this, we introduce PICABench, which systematically evaluates physical realism across eight sub-dimensions (spanning optics, mechanics, and state transitions) for most of the common editing operations (add, remove, attribute change, etc.). We further propose PICAEval, a reliable evaluation protocol that uses VLM-as-a-judge with per-case, region-level human annotations and questions. Beyond benchmarking, we also explore effective solutions by learning physics from videos and construct a training dataset, PICA-100K. After evaluating most of the mainstream models, we observe that physical realism remains a challenging problem with considerable room for improvement. We hope that our benchmark and proposed solutions can serve as a foundation for future work moving from naive content editing toward physically consistent realism.