HyperAI

MOSS: Text-to-Spoken Dialogue Generation

Date: 2 months ago

Size: 8.4 MB

1. Tutorial Introduction

This tutorial runs on a single RTX 5090 GPU.

2. Project Examples

3. Operation steps

1. After the container starts, click the API address to open the web interface.

2. Usage steps

If the page shows "Bad Gateway", the model is still initializing. Because the model is large, wait about 2-3 minutes and then refresh the page. In Safari, the audio may not play directly in the browser; download the file and play it locally.

*In "Audio Input Mode", this tutorial lets you choose between single-speaker audio generation (Single) and two-speaker dialogue audio generation (Role).
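Instead of refreshing by hand while "Bad Gateway" is shown, the wait can be scripted. The following is a minimal sketch, not part of the original tutorial: it polls an address until it answers with HTTP 200 or a time limit passes. The function names `fetch_status` and `wait_until_ready`, the 300-second limit, the 15-second interval, and the example address are all assumptions made for illustration.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a GET request to url (0 if unreachable)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 502 (Bad Gateway) while the model is still loading
    except (urllib.error.URLError, OSError):
        return 0  # connection refused, DNS failure, timeout, ...


def wait_until_ready(get_status, max_wait=300.0, interval=15.0, sleep=time.sleep):
    """Poll get_status() until it returns 200 or max_wait seconds have been slept.

    Returns True if the page became ready, False if the time limit was reached.
    """
    waited = 0.0
    while True:
        if get_status() == 200:
            return True
        if waited >= max_wait:
            return False
        sleep(interval)
        waited += interval


# Example (hypothetical address; use the API address shown for your container):
# ready = wait_until_ready(lambda: fetch_status("http://localhost:8080"))
# print("ready" if ready else "still initializing, try again later")
```

Passing the status check in as a callable keeps the polling loop independent of the network, so it can be tried out with a stub before pointing it at a real container.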

Citation Information

The citation information for this project is as follows:

@article{moss2025ttsd,
  title={Text to Spoken Dialogue Generation}, 
  author={OpenMOSS Team},
  year={2025}
}

This notebook is contributed by community users and is intended for educational and informational purposes only. If any content involves copyright infringement, please contact us at support@hyper.ai for prompt review and removal.

Related Notebooks

Krea-realtime-video: Real-time Video Generation Model (3 months ago)

F5-E2 TTS Clones Any Sound in Just 3 Seconds (2 months ago)

ROCKET-2: 3D Game Zero-Shot Transfer (2 months ago)

MAGE: Monoclonal Antibody Gene Generator (2 months ago)

One-click Deployment of Ministry-3-14B-Instruct (2 months ago)

LongCat-Image: A Bilingual Text-Driven Image Generation System (2 months ago)

OCRFlux-3B: Intelligent Text Recognition Toolkit (3 months ago)

JarvisArt-Preview Smart Photo Retouching Proxy (a month ago)

kyutai-tts-1.6b-en_fr Audio Generation (a month ago)
