
Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning

Moritz Reuss, Jyothish Pari, Pulkit Agrawal, Rudolf Lioutikov


Abstract

Diffusion Policies have become widely used in Imitation Learning, offering several appealing properties, such as generating multimodal and discontinuous behavior. As models become larger to capture more complex capabilities, their computational demands increase, as shown by recent scaling laws. Therefore, continuing with the current architectures will present a computational roadblock. To address this gap, we propose Mixture-of-Denoising Experts (MoDE) as a novel policy for Imitation Learning. MoDE surpasses current state-of-the-art Transformer-based Diffusion Policies while enabling parameter-efficient scaling through sparse experts and noise-conditioned routing, reducing active parameters by 40% and inference costs by 90% via expert caching. Our architecture combines this efficient scaling with a noise-conditioned self-attention mechanism, enabling more effective denoising across different noise levels. MoDE achieves state-of-the-art performance on 134 tasks in four established imitation learning benchmarks (CALVIN and LIBERO). Notably, by pretraining MoDE on diverse robotics data, we achieve 4.01 on CALVIN ABC and 0.95 on LIBERO-90. It surpasses both CNN-based and Transformer Diffusion Policies by an average of 57% across 4 benchmarks, while using 90% fewer FLOPs and fewer active parameters compared to default Diffusion Transformer architectures. Furthermore, we conduct comprehensive ablations on MoDE's components, providing insights for designing efficient and scalable Transformer architectures for Diffusion Policies. Code and demonstrations are available at https://mbreuss.github.io/MoDE_Diffusion_Policy/.
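The efficiency mechanism the abstract describes is routing that depends on the denoising noise level rather than on token content. Below is a minimal sketch of such a noise-conditioned Mixture-of-Experts layer, assuming a PyTorch setting; the class name NoiseConditionedMoE and arguments such as noise_emb and top_k are illustrative and not taken from the paper's released code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class NoiseConditionedMoE(nn.Module):
    """Sparse MoE feed-forward layer whose router is conditioned
    only on the noise-level embedding (illustrative sketch)."""

    def __init__(self, d_model: int, n_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # One feed-forward expert per slot.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )
        # Router sees the noise embedding, not the tokens.
        self.router = nn.Linear(d_model, n_experts)

    def forward(self, x: torch.Tensor, noise_emb: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); noise_emb: (batch, d_model)
        logits = self.router(noise_emb)                 # (batch, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)  # sparse expert choice
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for b in range(x.size(0)):
            for k in range(self.top_k):
                expert = self.experts[int(idx[b, k])]
                out[b] += weights[b, k] * expert(x[b])
        return out

Because the router here depends only on the noise level, the expert selection for a given denoising step is independent of the observation tokens, so it can be precomputed once per noise level and reused across a rollout; this caching property is what the abstract credits for the reported inference-cost reduction.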


Benchmarks

Benchmark: zero-shot-generalization-on-calvin
Methodology: MoDE
Metric: Avg. sequence length: 4.01
