Papers
Daily updated cutting-edge AI research papers to help you keep up with the latest AI trends

TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models

Qwen2.5-Omni Technical Report































Dual-Scale Single Image Dehazing Via Neural Augmentation
SpeechPrompt: Prompting Speech Language Models for Speech Processing Tasks
Music SketchNet: Controllable Music Generation via Factorized Representations of Pitch and Rhythm
Adaptive Data Flywheel: Applying MAPE Control Loops to AI Agent Improvement
Expand VSR Benchmark for VLLM to Expertize in Spatial Rules
DensityTool: A post-processing tool for space and spin-resolved density of states from VASP
A One-Dimensional Energy Balance Model Parameterization for the Formation of CO2 Ice on the Surfaces of Eccentric Extrasolar Planets
Towards The Ultimate Brain: Exploring Scientific Discovery with ChatGPT AI
Medical Reasoning in LLMs: An In-Depth Analysis of DeepSeek R1
Speech-FT: Merging Pre-trained And Fine-Tuned Speech Representation Models For Cross-Task Generalization
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
MatterGen: a generative model for inorganic materials design
MultiActor-Audiobook: Zero-Shot Audiobook Generation with Faces and Voices of Multiple Speakers
Phi-4 Technical Report
A Set of Tutorials for the LAMMPS Simulation Package
GLM-4-Voice: Towards Intelligent and Human-Like End-to-End Spoken Chatbot
Length Aware Speech Translation for Video Dubbing
DrawingSpinUp: 3D Animation from Single Character Drawings
Phonetically-oriented word error alignment for speech recognition error analysis in speech translation
ReaderLM-v2: Small Language Model for HTML to Markdown and JSON
Deployment Calculation and Analysis for a Fail-Operational Automotive Platform
MegActor: Harness the Power of Raw Video for Vivid Portrait Animation
Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation
Cherokee-English Machine Translation Demo with Quality Estimation and Corrective Feedback
Online Algorithm for Demand Response with Inelastic Demands and Apparent Power Constraint
ESPnet-SDS: Unified Toolkit and Demo for Spoken Dialogue Systems
Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models
Quick Back-Translation for Unsupervised Machine Translation