Papers
Daily updated cutting-edge AI research papers to help you keep up with the latest AI trends

Native and Compact Structured Latents for 3D Generation

Continuous Audio Language Models

Evolving Interactive Diagnostic Agents in a Virtual Clinical Environment
WeDLM: Reconciling Diffusion Language Models with Standard Causal Attention for Fast Inference
TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times
HunyuanVideo-Foley: Multimodal Diffusion with Representation Alignment for High-Fidelity Foley Audio Generation
Fara-7B: An Efficient Agentic Model for Computer Use
Fun-ASR Technical Report
Accelerating Scientific Research with Gemini: Case Studies and Common Techniques
Scaling Small Agents Through Strategy Auctions
Vibe AIGC: A New Paradigm for Content Generation via Agentic Orchestration
PaperSearchQA: Learning to Search and Reason over Scientific Papers with RLVR
EgoActor: Grounding Task Planning into Spatial-aware Egocentric Actions for Humanoid Robots via Visual-Language Models
A-RAG: Scaling Agentic Retrieval-Augmented Generation via Hierarchical Retrieval Interfaces
Quant VideoGen: Auto-Regressive Long Video Generation via 2-Bit KV-Cache Quantization
SoMA: A Real-to-Sim Neural Simulator for Robotic Soft-body Manipulation
3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation
daVinci-Agency: Unlocking Long-Horizon Agency Data-Efficiently
Research on World Models Is Not Merely Injecting World Knowledge into Specific Tasks
AOrchestra: Automating Sub-Agent Creation for Agentic Orchestration
No Global Plan in Chain-of-Thought: Uncover the Latent Planning Horizon of LLMs
CodeOCR: On the Effectiveness of Vision Language Models in Code Understanding
DeepPlanning: Benchmarking Long-Horizon Agentic Planning with Verifiable Constraints
CL-bench: A Benchmark for Context Learning
Reinforcement Learning via Self-Distillation
Chatbots as social companions: How people perceive consciousness, human likeness, and social health benefits in machines
POPE: Learning to Reason on Hard Problems via Privileged On-Policy Exploration
UniReason 1.0: A Unified Reasoning Framework for World Knowledge Aligned Image Generation and Editing
Closing the Loop: Universal Repository Representation with RPG-Encoder
Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models
Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models
Kimi K2.5: Visual Agentic Intelligence