Papers
Daily updated cutting-edge AI research papers to help you keep up with the latest AI trends

GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training

Agent Skills for Large Language Models: Architecture, Acquisition, Security, and the Path Forward

Theory of Space: Can Foundation Models Construct Spatial Beliefs through Active Exploration?
Memory Transfer Learning: How Memories are Transferred Across Domains in Coding Agents
OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language World Models
SpatialEvo: Self-Evolving Spatial Intelligence via Deterministic Geometric Environments
RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time
Seedance 2.0: Advancing Video Generation for World Complexity
GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents
Cross-Scale Pansharpening via ScaleFormer and the PanScale Benchmark
ParseBench: A Document Parsing Benchmark for AI Agents
Memory Intelligence Agent
PROPELLA-1: Multi-Property Document Annotation for LLM Data Curation at Scale
Internalized Reasoning for Long-Context Visual Document Understanding
TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate
BERT-as-a-Judge: A Robust Alternative to Lexical Methods for Efficient Reference-Based LLM Evaluation
SPPO: Sequence-Level PPO for Long-Horizon Reasoning Tasks
Turing Test on Screen: A Benchmark for Mobile GUI Agent Humanization
Audio-Omni: Extending Multi-modal Understanding to Versatile Audio Generation and Editing
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe
KnowRL: Boosting LLM Reasoning via Reinforcement Learning with Minimal-Sufficient Knowledge Guidance
Uni-ViGU: Towards Unified Video Generation and Understanding via A Diffusion-Based Video Generator
ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents
Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation
OmniShow: Unifying Multimodal Conditions for Human-Object Interaction Video Generation
The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping
QuanBench+: A Unified Multi-Framework Benchmark for LLM-Based Quantum Code Generation
ELT: Elastic Looped Transformers for Visual Generation
ECHO: Efficient Chest X-ray Report Generation with One-step Block Diffusion
Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory
EXAONE 4.5 Technical Report
RefineAnything: Multimodal Region-Specific Refinement for Perfect Local Details