HyperAI

Traditional high-quality TTS (text-to-speech) models have long faced several core challenges: they often have high requirements for computing resources and cloud services, resulting in high costs that are difficult for small businesses and individual developers to afford; furthermore, most of these models require tens of minutes or even hours of audio data for training. These deployment and operational requirements not only raise the barrier to entry for using these models but also limit the application of TTS in privacy-sensitive scenarios.

NeuTTS-Air, a newly released open-source end-to-end speech synthesis model, offers a new approach to these challenges. Billed as the world's first locally running TTS language model that supports ultra-realistic speech synthesis and real-time voice cloning, NeuTTS-Air is built on a 0.5B Qwen LLM and the NeuCodec audio codec. It demonstrates strong few-shot learning capabilities in edge deployment and real-time voice cloning, generalizes to new scenarios such as embedded agents and style transfer, supports cloning a voice from 3 seconds of audio, and generates natural dialogue content.

Experimental evaluation shows that NeuTTS-Air achieves state-of-the-art (SOTA) performance among open-source models, especially in hyper-realistic synthesis and real-time inference benchmarks. Post-training adds GGML/ONNX support and a watermarking mechanism. The model leads the open-source field in edge-side TTS and power consumption optimization evaluations and is comparable to closed-source models in some scenarios. Even more noteworthy, this lightweight model can run inference on the CPU, making it suitable for devices such as mobile phones, laptops, and the Raspberry Pi.

Tutorial link for "Deploying NeuTTS-Air Voice Cloning Model on CPU":

https://go.hyper.ai/IP2a2

The release of NeuTTS-Air comes at a time when industry demand for efficient, low-latency, and highly realistic TTS is surging, especially in on-device deployment and real-time voice cloning. It lowers the barrier for developers to deploy high-quality TTS on mobile and edge devices, so that hyper-realistic voices are no longer the exclusive domain of large cloud models.

"NeuTTS-Air: A Lightweight and Efficient Speech Cloning Model" is now available on the HyperAI website (hyper.ai) in the "Tutorials" section. Come and experience one-click deployment!

Tutorial Link:

https://go.hyper.ai/EJvsH

Demo Run

1. On the hyper.ai homepage, open the "Tutorials" page (or click "View More Tutorials"), select "NeuTTS-Air: Lightweight and Efficient Speech Cloning Model", and click "Run this tutorial online".

2. After the page redirects, click "Clone" in the upper right corner to clone the tutorial into your own container.

Note: You can switch languages in the upper right corner of the page. Currently, Chinese and English are available. This tutorial will show the steps in English.

3. Select "NVIDIA GeForce RTX 5090" as the compute resource and "PyTorch" as the image, choose "Pay As You Go" or "Daily Plan/Weekly Plan/Monthly Plan" as needed, then click "Continue job execution".

4. Wait for resources to be allocated. The first cloning process will take approximately 3 minutes. When the status changes to "Running," click the arrow next to "API Address" to jump to the Demo page. Please note that users must complete real-name authentication before using the API address.

Effect Demonstration

On the Demo page, upload the reference audio under "Reference Audio", enter the reference text in the "Reference Text" box, and type the text you want spoken in the cloned voice under "Text to Generate". Then click "Submit" and wait a moment for the cloned audio.
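Before uploading, it can help to sanity-check the three inputs locally. The following is a minimal sketch and not part of the tutorial or of NeuTTS-Air: the helper names are hypothetical, the reference clip is assumed to be a PCM WAV file, and the 3-second minimum is an assumption based on the "3-second audio cloning" claim above rather than a documented limit of the demo.

```python
import wave

# Assumption: taken from the article's "3-second audio cloning" claim,
# not from any documented requirement of the demo page.
MIN_REFERENCE_SECONDS = 3.0


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def check_demo_inputs(reference_audio_path, reference_text, text_to_generate):
    """Return a list of problems with the three demo inputs (empty if none).

    Hypothetical helper for illustration; the demo itself may apply
    different or additional checks.
    """
    problems = []
    if wav_duration_seconds(reference_audio_path) < MIN_REFERENCE_SECONDS:
        problems.append("reference audio is shorter than 3 seconds")
    if not reference_text.strip():
        problems.append("reference text is empty")
    if not text_to_generate.strip():
        problems.append("text to generate is empty")
    return problems
```

Calling `check_demo_inputs("reference.wav", "reference text here", "text to synthesize")` with your own file path returns an empty list when all three inputs pass these basic checks.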

The above is the tutorial recommended by HyperAI this time. Everyone is welcome to come and experience it!

Tutorial Link:

https://go.hyper.ai/EJvsH
