Papers
Daily updated cutting-edge AI research papers to help you keep up with the latest AI trends

Algorithmic Thinking Theory

Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics

Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation
Semantics Lead the Way: Harmonizing Semantic and Texture Modeling with Asynchronous Latent Diffusion
ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning
Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction
DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle
Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
VOccl3D: A Video Benchmark Dataset for 3D Human Pose and Shape Estimation under real Occlusions
Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail
It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization
Rethinking Prompt Design for Inference-time Scaling in Text-to-Visual Generation
Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach
OneThinker: All-in-one Reasoning Model for Image and Video
ViDiC: Video Difference Captioning
PretrainZero: Reinforcement Active Pretraining
Every Token Counts: Generalizing 16M Ultra-Long Context in Large Language Models
SimScale: Learning to Drive via Real-World Simulation at Scale
Skywork-R1V4: Toward Agentic Multimodal Intelligence through Interleaved Thinking with Images and DeepResearch
Guided Self-Evolving LLMs with Minimal Human Supervision
MultiShotMaster: A Controllable Multi-Shot Video Generation Framework
MG-Nav: Dual-Scale Visual Navigation via Sparse Spatial Memory
The Consistency Critic: Correcting Inconsistencies in Generated Images via Reference-Guided Attentive Alignment
How Far Are We from Genuinely Useful Deep Research Agents?
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices
Envision: Benchmarking Unified Understanding & Generation for Causal World Process Insights
LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling
From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence
Physics-Driven Spatiotemporal Modeling for AI-Generated Video Detection
Mem-α: Learning Memory Construction via Reinforcement Learning
Search Self-play: Pushing the Frontier of Agent Capability without Supervision