Papers
Daily updated cutting-edge AI research papers to help you keep up with the latest AI trends

RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time

Seedance 2.0: Advancing Video Generation for World Complexity

GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents
Cross-Scale Pansharpening via ScaleFormer and the PanScale Benchmark
ParseBench: A Document Parsing Benchmark for AI Agents
Memory Intelligence Agent
PROPELLA-1: Multi-Property Document Annotation for LLM Data Curation at Scale
Internalized Reasoning for Long-Context Visual Document Understanding
TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate
BERT-as-a-Judge: A Robust Alternative to Lexical Methods for Efficient Reference-Based LLM Evaluation
SPPO: Sequence-Level PPO for Long-Horizon Reasoning Tasks
Turing Test on Screen: A Benchmark for Mobile GUI Agent Humanization
Audio-Omni: Extending Multi-modal Understanding to Versatile Audio Generation and Editing
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe
KnowRL: Boosting LLM Reasoning via Reinforcement Learning with Minimal-Sufficient Knowledge Guidance
Uni-ViGU: Towards Unified Video Generation and Understanding via A Diffusion-Based Video Generator
ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents
Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation
OmniShow: Unifying Multimodal Conditions for Human-Object Interaction Video Generation
The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping
QuanBench+: A Unified Multi-Framework Benchmark for LLM-Based Quantum Code Generation
Elastic Looped Transformers for Visual Generation
ECHO: Efficient Chest X-ray Report Generation with One-step Block Diffusion
Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory
EXAONE 4.5 Technical Report
RefineAnything: Multimodal Region-Specific Refinement for Perfect Local Details
FORGE: Fine-grained Multimodal Evaluation for Manufacturing Scenarios
WildDet3D: Scaling Promptable 3D Detection in the Wild
Autoreason: Self-Refinement That Knows When to Stop
ActiveGlasses: Learning Manipulation with Active Vision from Ego-centric Human Demonstration
MegaStyle: Constructing Diverse and Scalable Style Dataset via Consistent Text-to-Image Style Mapping
When Numbers Speak: Aligning Textual Numerals and Visual Instances in Text-to-Video Diffusion Models