Papers
Cutting-edge AI research papers, updated daily, to help you keep up with the latest AI trends

PhysDrive: A Multimodal Remote Physiological Measurement Dataset for In-vehicle Driver Monitoring
Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)
OmniSVG: A Unified Scalable Vector Graphics Generation Model
Algorithmic Thinking Theory
Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics
Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation
Semantics Lead the Way: Harmonizing Semantic and Texture Modeling with Asynchronous Latent Diffusion
ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning
Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction
DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle
Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
VOccl3D: A Video Benchmark Dataset for 3D Human Pose and Shape Estimation under real Occlusions
Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail
It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization
Rethinking Prompt Design for Inference-time Scaling in Text-to-Visual Generation
Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach
OneThinker: All-in-one Reasoning Model for Image and Video
ViDiC: Video Difference Captioning
PretrainZero: Reinforcement Active Pretraining
Every Token Counts: Generalizing 16M Ultra-Long Context in Large Language Models
SimScale: Learning to Drive via Real-World Simulation at Scale
Skywork-R1V4: Toward Agentic Multimodal Intelligence through Interleaved Thinking with Images and DeepResearch
Guided Self-Evolving LLMs with Minimal Human Supervision
MultiShotMaster: A Controllable Multi-Shot Video Generation Framework
MG-Nav: Dual-Scale Visual Navigation via Sparse Spatial Memory
The Consistency Critic: Correcting Inconsistencies in Generated Images via Reference-Guided Attentive Alignment
How Far Are We from Genuinely Useful Deep Research Agents?
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices
Envision: Benchmarking Unified Understanding & Generation for Causal World Process Insights
LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling
From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence