Papers
Cutting-edge AI research papers, updated daily, to help you keep up with the latest AI trends

SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning

Listener-Rewarded Thinking in VLMs for Image Preferences

Calligrapher: Freestyle Text Image Customization
VMoBA: Mixture-of-Block Attention for Video Diffusion Models
SMMILE: An Expert-Driven Benchmark for Multimodal Medical In-Context Learning
The Automated LLM Speedrunning Benchmark: Reproducing NanoGPT Improvements
Shape-for-Motion: Precise and Consistent Video Editing with 3D Proxy
From Ideal to Real: Unified and Data-Efficient Dense Prediction for Real-World Scenarios
ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models
XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation
Zero-shot antibody design in a 24-well plate
KinFormer: Generalizable Dynamical Symbolic Regression for Catalytic Organic Reaction Kinetics
MiCo: Multi-image Contrast for Reinforcement Visual Reasoning
Fine-Grained Preference Optimization Improves Spatial Reasoning in VLMs
Ark: An Open-source Python-based Framework for Robot Learning
Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity
LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs
BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing
UniMate: A Unified Model for Mechanical Metamaterial Generation, Property Prediction, and Condition Confirmation
Learning to Skip the Middle Layers of Transformers
SAM4D: Segment Anything in Camera and LiDAR Streams
Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language
MADrive: Memory-Augmented Driving Scene Modeling
FaSTA^*: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing
Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge
WorldVLA: Towards Autoregressive Action World Model
ReCode: Updating Code API Knowledge with Reinforcement Learning
When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMs
HiWave: Training-Free High-Resolution Image Generation via Wavelet-Based Diffusion Sampling
DualTHOR: A Dual-Arm Humanoid Simulation Platform for Contingency-Aware Planning
MMSearch-R1: Incentivizing LMMs to Search