Moondream3-preview: Modular Visual Language Understanding Model
1. Tutorial Introduction
Moondream3, released by the Moondream team in September 2025, is a vision language model built on a mixture-of-experts (MoE) architecture. It has 9 billion parameters in total, of which 2 billion are active. The model offers state-of-the-art visual reasoning, supports a maximum context length of 32K, and processes high-resolution images efficiently. Moondream3 uses MoE feed-forward (FFN) layers and a SigLIP vision encoder, making it suitable for tasks such as visual question answering, image captioning, and object detection. The related technical write-up is "Moondream 3 Preview: Frontier-level reasoning at a blazing speed".
This tutorial runs on a single RTX 5090 graphics card. The model's output is in English only.
2. Project Examples

3. Operation Steps
1. After the container starts, click the API address to open the web interface

2. Once the page loads, you can use the model
If "Bad Gateway" is displayed, the service is still starting up in the background. Wait about 2-3 minutes, then refresh the page.
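Instead of refreshing by hand, the wait can be scripted. The following is a minimal sketch using only the Python standard library. The function names, the 15-second polling interval, and the 5-minute limit are illustrative assumptions, not part of the tutorial; the URL to pass in is the API address shown for your own container.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=10.0):
    """Return the HTTP status code of a GET request, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Error responses such as 502 Bad Gateway arrive here.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return None


def wait_until_ready(url, probe=http_status, interval=15.0,
                     max_wait=300.0, sleep=time.sleep):
    """Poll `url` until `probe` reports 200.

    Returns True once the page answers with 200, or False if it is
    still not ready after roughly `max_wait` seconds of waiting.
    `probe` and `sleep` are injectable so the loop can be tested offline.
    """
    waited = 0.0
    while True:
        if probe(url) == 200:
            return True
        if waited >= max_wait:
            return False
        sleep(interval)
        waited += interval


# Example (replace the placeholder with your container's API address):
# ready = wait_until_ready("https://<your-api-address>/")
```

If the function returns False after the full wait, check the container logs rather than continuing to refresh.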
How to use
1. Caption

2. Visual Question Answering

3. Object Detection

4. Point Detection
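The same four tasks can also be driven from Python rather than the web page. The sketch below is hedged: it assumes a model object that exposes `caption`, `query`, `detect`, and `point` methods returning dictionaries keyed by `caption`, `answer`, `objects`, and `points`, as in Moondream's published examples, and it assumes detection boxes are given as normalized `x_min`/`y_min`/`x_max`/`y_max` values between 0 and 1. The helper names `run_all_tasks` and `box_to_pixels` are hypothetical. Check the model card for the exact signatures before relying on them.

```python
def run_all_tasks(model, image, question, target):
    """Run caption, VQA, detection, and pointing on one image.

    `model` is any object with the assumed Moondream-style methods;
    `target` is the object name to detect and point at (e.g. "car").
    """
    return {
        "caption": model.caption(image, length="short")["caption"],
        "answer": model.query(image=image, question=question)["answer"],
        "objects": model.detect(image, target)["objects"],
        "points": model.point(image, target)["points"],
    }


def box_to_pixels(box, width, height):
    """Convert an assumed normalized box dict to integer pixel coordinates."""
    return (
        round(box["x_min"] * width),
        round(box["y_min"] * height),
        round(box["x_max"] * width),
        round(box["y_max"] * height),
    )
```

Passing the model in as an argument keeps the helper independent of how the model is loaded, so it can be exercised with a stub object before a GPU is available.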
