
MoMa Architecture


The MoMa framework (full name: Mixture of Modality-Aware Experts) was proposed by Meta in the paper "MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts". It introduces a modality-aware mixture-of-experts (MoE) architecture designed for pre-training mixed-modality, early-fusion language models.

MoMa processes arbitrary sequences of images and text by partitioning expert modules into modality-specific groups. Each group specializes in tokens of its own modality, while learned routing within each group preserves semantically informed adaptivity. The results show that this modality-specific parameter allocation significantly improves pre-training efficiency.

Under a 1 trillion token training budget, the MoMa 1.4B model with 4 text experts and 4 image experts achieves a 3.7x overall FLOP saving compared to the compute-equivalent dense baseline (2.6x for text and 5.2x for image processing), as measured by pre-training loss. This outperforms a standard expert-choice MoE with 8 mixed-modality experts, which achieves a 3x overall FLOP saving (3x for text, 2.8x for image). Combining MoMa with mixture-of-depths (MoD) further increases the overall FLOP saving to 4.2x (text: 3.4x, image: 5.3x), although this combination degrades causal inference performance due to increased sensitivity to router accuracy. These results suggest that MoMa can substantially improve the efficiency of mixed-modality, early-fusion language model pre-training, paving the way for more resource-efficient and capable multimodal AI systems.
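The core idea above, partitioning experts into modality-specific groups and routing each token only among the experts of its own modality, can be sketched in a few lines of PyTorch. This is a minimal illustration under simplifying assumptions (top-1 expert choice, two modalities, a hypothetical `ModalityAwareMoE` class), not the paper's actual implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityAwareMoE(nn.Module):
    """Minimal sketch of MoMa-style modality-aware expert routing.

    Simplifying assumptions (not from the paper's code): two modalities
    (0 = text, 1 = image), each with its own group of feed-forward
    experts, and a learned top-1 router *within* each group.
    """

    def __init__(self, d_model=16, n_experts_per_group=4):
        super().__init__()
        self.groups = nn.ModuleList()   # groups[0]: text experts, groups[1]: image experts
        self.routers = nn.ModuleList()  # one router per modality group
        for _ in range(2):
            experts = nn.ModuleList(
                nn.Sequential(
                    nn.Linear(d_model, 4 * d_model),
                    nn.GELU(),
                    nn.Linear(4 * d_model, d_model),
                )
                for _ in range(n_experts_per_group)
            )
            self.groups.append(experts)
            self.routers.append(nn.Linear(d_model, n_experts_per_group))

    def forward(self, x, modality):
        # x: (n_tokens, d_model); modality: (n_tokens,) with 0=text, 1=image
        out = torch.zeros_like(x)
        for m in (0, 1):
            idx = (modality == m).nonzero(as_tuple=True)[0]
            if idx.numel() == 0:
                continue
            tokens = x[idx]
            # Learned routing restricted to this modality's expert group.
            gate = F.softmax(self.routers[m](tokens), dim=-1)
            choice = gate.argmax(dim=-1)  # top-1 expert per token
            for e, expert in enumerate(self.groups[m]):
                sel = (choice == e).nonzero(as_tuple=True)[0]
                if sel.numel() == 0:
                    continue
                # Scale by the gate weight so routing stays differentiable.
                w = gate[sel, e].unsqueeze(-1)
                out[idx[sel]] = w * expert(tokens[sel])
        return out

moe = ModalityAwareMoE()
x = torch.randn(10, 16)
modality = torch.tensor([0, 0, 1, 1, 0, 1, 0, 1, 1, 0])
y = moe(x, modality)
print(y.shape)  # torch.Size([10, 16])
```

Because a token is only ever dispatched within its own modality group, the text and image experts can specialize independently, which is the source of the modality-specific FLOP savings reported above.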
