Papers
Cutting-edge AI research papers, updated daily, to help you keep up with the latest AI trends

MADrive: Memory-Augmented Driving Scene Modeling

FaSTA^*: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing

Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge
WorldVLA: Towards Autoregressive Action World Model
ReCode: Updating Code API Knowledge with Reinforcement Learning
When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMs
HiWave: Training-Free High-Resolution Image Generation via Wavelet-Based Diffusion Sampling
DualTHOR: A Dual-Arm Humanoid Simulation Platform for Contingency-Aware Planning
MMSearch-R1: Incentivizing LMMs to Search
OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling
AlphaGenome: advancing regulatory variant effect prediction with a unified DNA sequence model
OmniDrive: A Holistic Vision-Language Dataset for Autonomous Driving with Counterfactual Reasoning
EcoMapper: Generative Modeling for Climate-Aware Satellite Imagery
JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent
ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing
GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning
Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs
Matrix-Game: Interactive World Foundation Model
AnimaX: Animating the Inanimate in 3D with Joint Video-Pose Diffusion Models
Learning Approach to Efficient Vision-based Active Tracking of a Flying Target by an Unmanned Aerial Vehicle
TritonZ: A Remotely Operated Underwater Rover with Manipulator Arm for Exploration and Rescue Operations
ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs
Phantom-Data: Towards a General Subject-Consistent Video Generation Dataset
RLPR: Extrapolating RLVR to General Domains without Verifiers
LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning
Light of Normals: Unified Feature Representation for Universal Photometric Stereo
Predicting cellular responses to perturbation across diverse contexts with State
CodeDiffuser: Attention-Enhanced Diffusion Policy via VLM-Generated Code for Instruction Ambiguity
Optimizing Multilingual Text-To-Speech with Accents & Emotions
VIKI-R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning
PAROAttention: Pattern-Aware ReOrdering for Efficient Sparse and Quantized Attention in Visual Generation Models
Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding