OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

Abstract
In this work, we pursue a unified paradigm for multimodal pretraining to break the scaffolds of complex task/modality-specific customization. We propose OFA, a Task-Agnostic and Modality-Agnostic framework that supports Task Comprehensiveness. OFA unifies a diverse set of cross-modal and unimodal tasks, including image generation, visual grounding, image captioning, image classification, language modeling, etc., in a simple sequence-to-sequence learning framework. OFA follows the instruction-based learning in both pretraining and finetuning stages, requiring no extra task-specific layers for downstream tasks. In comparison with the recent state-of-the-art vision & language models that rely on extremely large cross-modal datasets, OFA is pretrained on only 20M publicly available image-text pairs. Despite its simplicity and relatively small-scale training data, OFA achieves new SOTAs in a series of cross-modal tasks while attaining highly competitive performances on uni-modal tasks. Our further analysis indicates that OFA can also effectively transfer to unseen tasks and unseen domains. Our code and models are publicly available at https://github.com/OFA-Sys/OFA.
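The idea of instruction-based learning without task-specific layers can be sketched as follows: every task is reduced to a text instruction fed to one shared sequence-to-sequence model, so only the prompt differs between tasks. This is a minimal illustration, not code from the OFA repository. The template strings, the `Seq2SeqExample` type, and the `build_example` helper are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical instruction templates (illustrative wording, not taken from
# the OFA codebase). Each task maps to a prompt for the same shared model.
TEMPLATES = {
    "image_captioning": "What does the image describe?",
    "visual_grounding": 'Which region does the text "{text}" describe?',
    "visual_question_answering": "{text}",
}


@dataclass
class Seq2SeqExample:
    """One model input: an instruction string plus an optional image flag."""
    task: str
    instruction: str
    has_image: bool


def build_example(task: str, text: Optional[str] = None,
                  has_image: bool = True) -> Seq2SeqExample:
    """Turn a task name (and optional text) into a unified seq2seq input."""
    template = TEMPLATES[task]
    if "{text}" in template:
        if text is None:
            raise ValueError(f"task {task!r} requires a text argument")
        instruction = template.format(text=text)
    else:
        instruction = template
    return Seq2SeqExample(task=task, instruction=instruction, has_image=has_image)


if __name__ == "__main__":
    # The same builder serves every task; no per-task head is selected.
    print(build_example("image_captioning").instruction)
    print(build_example("visual_grounding", text="a red car").instruction)
```

Under this assumed design, adding a task means adding a template entry rather than a new output layer, which is the property the abstract attributes to the framework.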
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| image-captioning-on-coco-captions | OFA | BLEU-4: 44.9 CIDEr: 154.9 METEOR: 32.5 SPICE: 26.6 |
| object-categorization-on-grit | OFA_Large | Categorization (ablation): 22.6 |
| self-supervised-image-classification-on-1 | OFA (Large) | Number of Params: 473M Top 1 Accuracy: 85.6% |
| text-summarization-on-gigaword | OFA | ROUGE-1: 39.81 ROUGE-2: 20.66 ROUGE-L: 37.11 |
| visual-entailment-on-snli-ve-test | OFA | Accuracy: 91.2 |
| visual-entailment-on-snli-ve-val | OFA | Accuracy: 91.0 |
| visual-question-answering-on-grit-1 | OFA | VQA (ablation): 72.4 |
| visual-question-answering-on-vqa-v2-test-dev-1 | OFA | Accuracy: 82.0 |
| visual-question-answering-on-vqa-v2-test-std-1 | OFA | number: 71.44 other: 73.35 overall: 81.98 yes/no: 94.66 |