Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think
Gonzalo Martin Garcia, Karim Abou Zeid, Christian Schmidt, Daan de Geus, Alexander Hermans, Bastian Leibe

Abstract
Recent work showed that large diffusion models can be reused as highly precise monocular depth estimators by casting depth estimation as an image-conditional image generation task. While the proposed model achieved state-of-the-art results, high computational demands due to multi-step inference limited its use in many scenarios. In this paper, we show that the perceived inefficiency was caused by a flaw in the inference pipeline that has so far gone unnoticed. The fixed model performs comparably to the best previously reported configuration while being more than 200× faster. To optimize for downstream task performance, we perform end-to-end fine-tuning on top of the single-step model with task-specific losses and get a deterministic model that outperforms all other diffusion-based depth and normal estimation models on common zero-shot benchmarks. Surprisingly, we find that this fine-tuning protocol also works directly on Stable Diffusion and achieves performance comparable to current state-of-the-art diffusion-based depth and normal estimation models, calling into question some of the conclusions drawn from prior works.
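To make the fine-tuning idea concrete, here is a minimal sketch of one end-to-end update with a task-specific loss. A scale-and-shift-invariant L1 loss is a common choice for affine-invariant depth; the paper's exact loss and pipeline may differ, and `one_step_model` is a hypothetical stand-in for the fixed single-step diffusion pipeline run with gradients enabled.

```python
import torch

def affine_invariant_l1(pred, gt, mask):
    """Scale-and-shift-invariant L1 depth loss (one common task-specific
    choice; the paper's exact formulation may differ).

    pred, gt: (B, H, W) depth maps; mask: (B, H, W) bool validity mask.
    """
    losses = []
    for p, g, m in zip(pred, gt, mask):
        x, y = p[m], g[m]
        # Closed-form least-squares scale s and shift t aligning x to y.
        xc = x - x.mean()
        s = (xc * (y - y.mean())).sum() / xc.pow(2).sum().clamp_min(1e-6)
        t = y.mean() - s * x.mean()
        losses.append((s * x + t - y).abs().mean())
    return torch.stack(losses).mean()

def train_step(one_step_model, batch, optimizer):
    # `one_step_model` stands in for the single-step diffusion pipeline
    # (VAE encode -> one denoising step -> VAE decode), run with
    # gradients enabled so the loss reaches the network weights.
    pred = one_step_model(batch["image"])           # (B, H, W) depth
    loss = affine_invariant_l1(pred, batch["depth"], batch["mask"])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.detach()
```

Because the single-step prediction is differentiable end to end, the task loss backpropagates straight through the decoder and denoiser, which is what turns the generative model into a deterministic estimator.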
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| Monocular Depth Estimation on NYU Depth V2 | Marigold + E2E FT (zero-shot) | δ < 1.25: 0.966; AbsRel: 0.052 |
| Surface Normals Estimation on iBims-1 | Marigold + E2E FT (zero-shot) | % < 11.25°: 69.9; Mean angle error: 15.8 |
| Surface Normals Estimation on NYU Depth V2 | Marigold + E2E FT (zero-shot) | % < 11.25°: 61.4; Mean angle error: 16.2 |
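For reference, these are the standard zero-shot evaluation quantities: δ < 1.25 is the fraction of pixels whose depth ratio max(pred/gt, gt/pred) falls below 1.25, AbsRel is the mean absolute relative error, and the normal metrics are the mean angular error and the fraction of pixels with angular error below 11.25°. A minimal sketch, assuming predictions are already least-squares aligned to the ground truth (as is usual for affine-invariant methods):

```python
import torch

def depth_metrics(pred, gt):
    # pred, gt: 1-D tensors of positive depths over valid pixels,
    # already affinely aligned to the ground truth.
    ratio = torch.maximum(pred / gt, gt / pred)
    delta1 = (ratio < 1.25).float().mean().item()      # "δ < 1.25"
    abs_rel = ((pred - gt).abs() / gt).mean().item()   # AbsRel
    return delta1, abs_rel

def normal_metrics(pred_n, gt_n):
    # pred_n, gt_n: (N, 3) unit surface normals over valid pixels.
    cos = (pred_n * gt_n).sum(dim=-1).clamp(-1.0, 1.0)
    ang = torch.rad2deg(torch.acos(cos))
    pct = (ang < 11.25).float().mean().item()          # "% < 11.25°"
    mean_err = ang.mean().item()                       # mean angle error
    return pct, mean_err
```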