7 months ago

Boyong Wu Chao Yan Chen Hu Cheng Yi Chengli Feng Fei Tian Feiyu Shen Gang Yu Haoyang Zhang Jingbei Li

Abstract

This paper presents Step-Audio 2, an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation. By integrating a latent audio encoder and reasoning-centric reinforcement learning (RL), Step-Audio 2 achieves promising performance in automatic speech recognition (ASR) and audio understanding. To facilitate genuine end-to-end speech conversation, Step-Audio 2 incorporates the generation of discrete audio tokens into language modeling, significantly enhancing its responsiveness to paralinguistic information such as speaking styles and emotions. To effectively leverage the rich textual and acoustic knowledge in real-world data, Step-Audio 2 integrates retrieval-augmented generation (RAG) and is able to call external tools such as web search to mitigate hallucination and audio search to switch timbres. Trained on millions of hours of speech and audio data, Step-Audio 2 delivers intelligence and expressiveness across diverse conversational scenarios. Evaluation results demonstrate that Step-Audio 2 achieves state-of-the-art performance on various audio understanding and conversational benchmarks compared to other open-source and commercial solutions. Please visit https://github.com/stepfun-ai/Step-Audio2 for more information.


Step-Audio 2 Technical Report

Boyong Wu Chao Yan Chen Hu Cheng Yi Chengli Feng Fei Tian Feiyu Shen Gang Yu Haoyang Zhang Jingbei Li, and 99 more
