Moondream3-preview: Modular Visual Language Understanding Model

1. Tutorial Introduction

Moondream3, released by the Moondream team in September 2025, is a vision language model built on a mixture-of-experts (MoE) architecture. It has 9 billion total parameters, of which 2 billion are active for any given token. The model offers state-of-the-art visual reasoning, supports a context length of up to 32K tokens, and can efficiently process high-resolution images. It combines an MoE feed-forward network (FFN) with a SigLIP vision encoder, which makes it suitable for tasks such as visual question answering, image captioning, and object detection. For technical details, see the release post "Moondream 3 Preview: Frontier-level reasoning at a blazing speed".
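The "9 billion total, 2 billion active" figure comes from the mixture-of-experts design. A router scores every expert for each token. Only the top-k experts run, and their outputs are mixed using renormalized weights. The sketch below is a generic top-k MoE layer in pure Python, not Moondream3's actual implementation. The expert count, the value of k, the softmax-after-top-k weighting, and the toy experts are all illustrative assumptions.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]

def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts, then renormalize with a softmax over
    just those k scores (one common MoE variant; not confirmed for Moondream3)."""
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))

def moe_ffn(x: Vector,
            experts: Sequence[Callable[[Vector], Vector]],
            router: Callable[[Vector], Sequence[float]],
            k: int) -> Vector:
    """Run only the k routed experts on token vector x and mix their outputs.
    The remaining experts' parameters are stored but unused for this token."""
    out = [0.0] * len(x)
    for idx, weight in top_k_route(router(x), k):
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out

# Fraction of parameters active per token, using the figures stated in the text.
ACTIVE_FRACTION = 2e9 / 9e9  # about 22%

if __name__ == "__main__":
    # Toy setup (assumption): 8 "experts" that just scale their input, top-2 routing,
    # and a fixed router that always prefers higher-numbered experts.
    experts = [lambda x, s=s: [s * v for v in x] for s in range(8)]

    def router(x: Vector) -> List[float]:
        return [float(i) for i in range(8)]

    print(top_k_route(router([1.0]), k=2))  # experts 7 and 6 are selected
    print(moe_ffn([1.0], experts, router, k=2))
```

Per-token compute therefore scales with the active experts. All experts still have to be stored.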

This tutorial runs on a single RTX 5090 GPU. The project supports English output only.
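A rough weight-memory estimate shows why one RTX 5090 (32 GB of memory) can host the model. In a standard MoE deployment, all experts stay resident on the GPU, so memory follows the 9 billion total parameters, not the 2 billion active ones. The precisions below are assumptions, because the tutorial does not say which one the notebook uses. The figures cover weights only and exclude the KV cache, activations, and framework overhead.

```python
# Rough weight-only memory estimate; ignores KV cache, activations and framework overhead.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1}

def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

TOTAL_PARAMS = 9e9        # all experts must be resident, so use the total count
RTX_5090_MEMORY_GB = 32   # the RTX 5090 has 32 GB of GDDR7

if __name__ == "__main__":
    for dtype in ("fp32", "bf16", "int8"):
        need = weight_memory_gb(TOTAL_PARAMS, dtype)
        verdict = "fits" if need < RTX_5090_MEMORY_GB else "does not fit"
        print(f"{dtype}: ~{need:.0f} GB of weights -> {verdict} in {RTX_5090_MEMORY_GB} GB")
    # fp32: ~36 GB of weights -> does not fit in 32 GB
    # bf16: ~18 GB of weights -> fits in 32 GB
    # int8: ~9 GB of weights -> fits in 32 GB
```

At 16-bit precision, the weights use a little over half of the card's memory. The remainder is left for the KV cache and activations.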

2. Operation steps

1. After starting the container, click the API address to open the web interface.

2. Once the page loads, you can start using the model.

If the page shows "Bad Gateway", the service is still starting up in the background. Wait about 2-3 minutes, then refresh the page.
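Instead of refreshing by hand, you can poll the address until it stops returning "Bad Gateway". A 502 status means the proxy is reachable but the app behind it is not yet serving. The sketch below uses only the standard library. The function name, the timeout and interval defaults, and the injectable `fetch` hook are hypothetical choices for illustration.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional

def _default_fetch(url: str) -> Optional[int]:
    """Return the HTTP status for url, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code  # e.g. 502 Bad Gateway while the model is still loading
    except OSError:
        return None    # connection refused, DNS failure, socket timeout, ...

def wait_until_ready(url: str,
                     timeout_s: float = 300.0,
                     interval_s: float = 10.0,
                     fetch: Optional[Callable[[str], Optional[int]]] = None) -> bool:
    """Poll url until it answers with a 2xx/3xx status.
    Returns True when ready, or False once timeout_s has elapsed."""
    fetch = fetch or _default_fetch
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch(url)
        if status is not None and 200 <= status < 400:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For example, `wait_until_ready("YOUR_API_ADDRESS")` blocks for up to five minutes. `YOUR_API_ADDRESS` is a placeholder for the address shown for your container.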

How to use

1. Caption

2. Visual Question Answering

3. Object Detection

4. Point Detection
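The four tasks above correspond to four model calls. The sketch below is hypothetical glue code, and it rests on several assumptions. The method names (`caption`, `query`, `detect`, `point`) and result keys (`"caption"`, `"answer"`, `"objects"`, `"points"`) are based on the Moondream model card. Boxes and points are assumed to come back normalized to [0, 1]. Check all of these against the version you run. The task keys `"caption"`, `"vqa"`, `"detect"`, and `"point"` are invented for this example. The helpers convert normalized detection and pointing results into pixel coordinates for drawing.

```python
from typing import Any, Dict, List, Tuple

def boxes_to_pixels(objects: List[Dict[str, float]], width: int, height: int
                    ) -> List[Tuple[int, int, int, int]]:
    """Convert normalized boxes (x_min, y_min, x_max, y_max in [0, 1]) to pixels."""
    return [
        (round(o["x_min"] * width), round(o["y_min"] * height),
         round(o["x_max"] * width), round(o["y_max"] * height))
        for o in objects
    ]

def points_to_pixels(points: List[Dict[str, float]], width: int, height: int
                     ) -> List[Tuple[int, int]]:
    """Convert normalized points (x, y in [0, 1]) to pixels."""
    return [(round(p["x"] * width), round(p["y"] * height)) for p in points]

def run_task(model: Any, image: Any, task: str, prompt: str = "") -> Any:
    """Dispatch one of the four tasks to the model.
    ASSUMPTION: method names and result keys follow the Moondream model card."""
    if task == "caption":
        return model.caption(image)["caption"]
    if task == "vqa":
        return model.query(image=image, question=prompt)["answer"]
    if task == "detect":
        return model.detect(image, prompt)["objects"]   # object label, e.g. "cat"
    if task == "point":
        return model.point(image, prompt)["points"]
    raise ValueError(f"unknown task: {task!r}")
```

To try the glue code without a GPU, pass any object that has these four methods. With the model loaded in the notebook, pass that model and an image in the form it expects.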

Date: 4 months ago

Size: 13.36 MB

This notebook is contributed by community users and is intended for educational and informational purposes only. If any content involves copyright infringement, please contact us at support@hyper.ai for prompt review and removal.
