Moondream3-preview: Modular Visual Language Understanding Model
1. Tutorial Introduction
Moondream3 is a vision-language model built on a mixture-of-experts (MoE) architecture, released by the Moondream team in September 2025. It has 9 billion total parameters, of which 2 billion are active per token. The model offers frontier-level visual reasoning, supports a context length of up to 32K tokens, and processes high-resolution images efficiently. Its architecture combines MoE feed-forward (FFN) layers with a SigLIP-based vision encoder, and it is suited to tasks such as visual question answering, image captioning, and object detection. The related technical write-up is "Moondream 3 Preview: Frontier-level reasoning at a blazing speed".
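The MoE design is why only 2 of the 9 billion parameters are active per token: a router sends each token to a small subset of experts, and the remaining experts are skipped for that token. The sketch below is a generic top-k routing step in plain Python, not Moondream3's actual implementation; the number of experts, the `top_k` value, the linear router, and the toy experts are all illustrative assumptions.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_ffn(token, router_weights, experts, top_k):
    """Generic top-k mixture-of-experts feed-forward step for a single token.

    token:          list[float] of length d
    router_weights: one list[float] of length d per expert (a linear router)
    experts:        one callable per expert, mapping list[float] -> list[float]
    """
    # Router score for each expert: dot product of the token with that expert's row.
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    # Keep only the top_k highest-scoring experts; the others are never evaluated.
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Gate weights are a softmax over the chosen experts' scores only.
    gates = softmax([logits[i] for i in chosen])
    out = [0.0] * len(token)
    for gate, i in zip(gates, chosen):
        y = experts[i](token)
        out = [o + gate * v for o, v in zip(out, y)]
    return out, chosen


# Toy example: 4 experts, where expert i simply scales its input by (i + 1).
experts = [lambda v, s=i + 1: [s * x for x in v] for i in range(4)]
router = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [2.0, 0.0]]
out, chosen = moe_ffn([1.0, 0.0], router, experts, top_k=2)
print(chosen, out)  # experts 2 and 3 are chosen; experts 0 and 1 never run
```

Because each token runs only `top_k` experts, compute per token scales with the active parameters, while memory must still hold every expert.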
This tutorial runs on a single RTX 5090 GPU. The model's output is in English only.
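As a rough check on the single-GPU setup, the arithmetic below estimates how much memory the weights alone occupy. It assumes the weights are loaded in bf16 (2 bytes per parameter) and ignores activations, the KV cache, and framework overhead; 32 GB is the RTX 5090's advertised memory. All 9 billion parameters must be resident, not just the 2 billion active ones, because different tokens can be routed to different experts.

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Memory needed to hold the weights alone, in decimal GB (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


TOTAL_PARAMS = 9e9   # 9B total parameters, from the model description
GPU_MEMORY_GB = 32   # advertised memory of an RTX 5090

for dtype_name, nbytes in [("fp32", 4), ("bf16", 2)]:
    need = weight_memory_gb(TOTAL_PARAMS, nbytes)
    # Headroom is what remains for activations, KV cache and the CUDA context;
    # a negative value means the weights alone do not fit.
    print(f"{dtype_name}: {need:.0f} GB of weights, headroom {GPU_MEMORY_GB - need:.0f} GB")
```

In bf16 the weights take about 18 GB, leaving roughly 14 GB for everything else, whereas fp32 weights (36 GB) would not fit on the card.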
2. Project Examples

3. Operation Steps
1. After starting the container, click the API address to open the web interface.

2. Once the web page opens, you can start using the model.
If the page shows "Bad Gateway", the service is still starting up in the background. Wait about 2-3 minutes, then refresh the page.
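If you prefer not to refresh by hand, a small script can poll the web interface until it stops returning errors such as 502 Bad Gateway. The helpers below are hypothetical and use only the Python standard library; the URL you pass in is a placeholder for the API address shown for your container, and the 300-second default timeout simply leaves margin over the 2-3 minute startup time.

```python
import time
import urllib.request


def http_ok(url):
    """Return True if url answers with HTTP 200, False on any error (502, refused, timeout)."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status == 200
    except OSError:
        # urllib.error.HTTPError (e.g. 502) and URLError are both OSError subclasses.
        return False


def wait_until_ready(check, timeout_s=300.0, interval_s=10.0, sleep=time.sleep):
    """Call check() every interval_s seconds until it returns True or timeout_s elapses.

    The timeout counts only the sleep intervals, so wall-clock time can be
    somewhat longer if check() itself is slow.
    """
    waited = 0.0
    while waited <= timeout_s:
        if check():
            return True
        sleep(interval_s)
        waited += interval_s
    return False
```

Usage: `wait_until_ready(lambda: http_ok("https://<your-api-address>/"))` returns `True` once the page loads, after which the interface can be opened normally.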
How to Use
1. Caption

2. Visual Question Answering

3. Object Detection

4. Point Detection
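Outside the web interface, the same four tasks can in principle be called from Python. The sketch below assumes the Hugging Face repo id `moondream/moondream3-preview` and Moondream-style methods `caption`, `query`, `detect` and `point` that return dicts with `caption`, `answer`, `objects` and `points` keys; these names follow the public Moondream model cards but should be checked against the current model card before use. `run_skill` and `FakeModel` are hypothetical helpers, and `FakeModel` lets you exercise the dispatch logic without a GPU.

```python
def load_model():
    """Load the model (assumed repo id); needs torch, transformers and a CUDA GPU."""
    import torch
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        "moondream/moondream3-preview",  # assumed Hugging Face repo id
        trust_remote_code=True,          # the model ships its own modeling code
        torch_dtype=torch.bfloat16,
        device_map={"": "cuda"},
    )


def run_skill(model, image, mode, prompt=""):
    """Map one of the four web-UI modes to the (assumed) model method and unwrap its result."""
    if mode == "caption":
        return model.caption(image, length="normal")["caption"]
    if mode == "vqa":
        return model.query(image=image, question=prompt)["answer"]
    if mode == "detect":
        return model.detect(image, prompt)["objects"]
    if mode == "point":
        return model.point(image, prompt)["points"]
    raise ValueError(f"unknown mode: {mode!r}")


class FakeModel:
    """Stand-in with the same assumed method names, for testing without a GPU."""

    def caption(self, image, length="normal"):
        return {"caption": "A cat sitting on a sofa."}

    def query(self, image, question):
        return {"answer": "One cat."}

    def detect(self, image, obj):
        return {"objects": [{"x_min": 0.1, "y_min": 0.2, "x_max": 0.5, "y_max": 0.6}]}

    def point(self, image, obj):
        return {"points": [{"x": 0.3, "y": 0.4}]}
```

With a real model, `image` would typically be a PIL image, for example `run_skill(load_model(), Image.open("photo.jpg"), "vqa", "How many cats are there?")`.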

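Object Detection and Point Detection return locations rather than text. Assuming, as in the public Moondream model cards, that boxes come as `x_min`/`y_min`/`x_max`/`y_max` and points as `x`/`y`, all normalized to the range [0, 1], the hypothetical helpers below convert them to pixel coordinates so they can be drawn on the original image.

```python
def box_to_pixels(box, width, height):
    """Convert a box with normalized x_min/y_min/x_max/y_max (assumed 0-1) to pixel coordinates."""
    return (
        round(box["x_min"] * width),
        round(box["y_min"] * height),
        round(box["x_max"] * width),
        round(box["y_max"] * height),
    )


def point_to_pixels(pt, width, height):
    """Convert a point with normalized x/y (assumed 0-1) to pixel coordinates."""
    return (round(pt["x"] * width), round(pt["y"] * height))


# Example for a 1000 x 500 image.
box = {"x_min": 0.1, "y_min": 0.2, "x_max": 0.5, "y_max": 0.6}
print(box_to_pixels(box, 1000, 500))  # (100, 100, 500, 300)
```

The resulting tuples can be passed directly to drawing routines that expect pixel coordinates, such as a rectangle or ellipse call on the source image.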