

Moondream3-preview: Modular Visual Language Understanding Model


1. Tutorial Introduction

Moondream3 is a vision language model built on a mixture-of-experts (MoE) architecture, released as a preview by the Moondream team in September 2025. It has 9 billion total parameters, of which about 2 billion are active for any given token. The model offers state-of-the-art visual reasoning, supports a context length of up to 32K tokens, and can efficiently process high-resolution images. It combines an MoE feed-forward network (MoE FFN) with a SigLIP vision encoder and is suited to tasks such as visual question answering, image captioning, and object detection. The accompanying technical write-up is "Moondream 3 Preview: Frontier-level reasoning at a blazing speed".
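The gap between 9 billion total and about 2 billion active parameters comes from the MoE design: for each token, a router picks a small number of experts and the remaining experts are skipped. The plain-Python sketch below illustrates generic top-k expert routing. The number of experts, the value of k, the router, and the toy experts are all hypothetical and do not reflect Moondream3's actual configuration or code.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their weights."""
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_ffn(
    x: Vector,
    experts: Sequence[Callable[[Vector], Vector]],
    router: Callable[[Vector], List[float]],
    k: int,
) -> Vector:
    """Sparse MoE feed-forward layer for a single token vector x."""
    out = [0.0] * len(x)
    for idx, weight in top_k_route(router(x), k):
        y = experts[idx](x)  # only the k selected experts are evaluated
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out


# Hypothetical toy setup: 8 experts that each scale the input by a constant,
# and a router that always scores higher-numbered experts higher.
calls: List[int] = []


def make_expert(i: int) -> Callable[[Vector], Vector]:
    def expert(x: Vector) -> Vector:
        calls.append(i)  # record which experts actually ran
        return [(i + 1) * v for v in x]
    return expert


experts = [make_expert(i) for i in range(8)]
router = lambda x: [float(i) for i in range(8)]
y = moe_ffn([1.0, 2.0], experts, router, k=2)
print(sorted(calls))  # [6, 7]: only 2 of the 8 experts ran
```

With 8 experts and k = 2, each token touches only a quarter of the expert weights. Layers outside the experts, such as attention, run for every token, so a real model's active fraction is not simply k divided by the number of experts.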

This tutorial runs on a single RTX 5090 GPU, and the model produces output in English only.
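A rough sanity check on why one card is enough: even though only about 2 billion parameters are active per token, all 9 billion must be held in GPU memory. The sketch below estimates the memory for the weights alone at a few common precisions. The precisions are illustrative assumptions; the tutorial does not state which one the container uses, and the estimate ignores activations, the KV cache, and framework overhead.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory needed to store the weights alone, in GiB (no activations or KV cache)."""
    return num_params * bytes_per_param / 2**30


# Every expert must be resident in GPU memory, not just the ~2B active parameters.
TOTAL_PARAMS = 9e9

for dtype, nbytes in [("fp32", 4), ("bf16", 2), ("int8", 1)]:
    print(f"{dtype}: {weight_memory_gib(TOTAL_PARAMS, nbytes):.1f} GiB")
```

In bf16 the weights come to roughly 16.8 GiB, which fits within the RTX 5090's 32 GB of memory with room left for activations, whereas fp32 weights (about 33.5 GiB) would not fit.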

2. Project Examples

3. Operation Steps

1. After starting the container, click the API address to open the web interface.

2. Once the web page loads, you can start using the model.

If the page shows "Bad Gateway", the service is still starting up in the background. Wait about 2 to 3 minutes and then refresh the page.
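If you would rather script the wait than refresh by hand, here is a minimal polling sketch using only the Python standard library. It treats 502 Bad Gateway (and the related 503/504 codes) or a refused connection as "not ready yet" and keeps retrying. The function names, the 15-second interval, and the 5-minute cap are hypothetical choices, not part of the tutorial.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional

# Gateway/unavailable codes a reverse proxy typically returns while the backend starts.
NOT_READY = {502, 503, 504}


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code of a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # urllib raises 4xx/5xx responses as HTTPError


def wait_until_ready(
    url: str,
    get_status: Callable[[str], int] = http_status,
    interval: float = 15.0,
    max_wait: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll url until it stops returning a gateway error; True if it comes up in time."""
    waited = 0.0
    while waited <= max_wait:
        status: Optional[int]
        try:
            status = get_status(url)
        except OSError:
            status = None  # connection refused, DNS failure, or timeout: not ready yet
        if status is not None and status not in NOT_READY:
            return status < 400  # a non-gateway error such as 404 counts as failure
        sleep(interval)
        waited += interval
    return False
```

Usage, with a placeholder for your container's API address: `wait_until_ready("http://YOUR_API_ADDRESS/")`. The `get_status` and `sleep` parameters are injectable so the loop can be exercised without a live server.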

How to Use

1. Caption

2. Visual Question Answering

3. Object Detection

4. Point Detection
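Object detection and point detection return locations rather than text, and you usually need them in pixel coordinates to draw or crop. The helpers below assume, as an unverified convention to check against your deployment's actual output, that boxes arrive as dictionaries with keys `x_min`, `y_min`, `x_max`, `y_max` and points as dictionaries with keys `x`, `y`, all normalized to the 0 to 1 range. The helper names are hypothetical.

```python
from typing import Dict, Tuple


def _clamp(v: int, lo: int, hi: int) -> int:
    return max(lo, min(hi, v))


def box_to_pixels(box: Dict[str, float], width: int, height: int) -> Tuple[int, int, int, int]:
    """Map an assumed normalized box {x_min, y_min, x_max, y_max} to pixel edges."""
    return (
        _clamp(round(box["x_min"] * width), 0, width),
        _clamp(round(box["y_min"] * height), 0, height),
        _clamp(round(box["x_max"] * width), 0, width),
        _clamp(round(box["y_max"] * height), 0, height),
    )


def point_to_pixels(point: Dict[str, float], width: int, height: int) -> Tuple[int, int]:
    """Map an assumed normalized point {x, y} to a pixel index inside the image."""
    return (
        _clamp(int(point["x"] * width), 0, width - 1),
        _clamp(int(point["y"] * height), 0, height - 1),
    )


box = {"x_min": 0.25, "y_min": 0.1, "x_max": 0.75, "y_max": 0.9}
print(box_to_pixels(box, 640, 480))                     # (160, 48, 480, 432)
print(point_to_pixels({"x": 0.5, "y": 1.0}, 640, 480))  # (320, 479)
```

Box edges are clamped to `[0, width]` and `[0, height]` because an edge may lie on the image border, while points are clamped to the last valid pixel index so they can be used directly for indexing.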
