Papers
Daily updated cutting-edge AI research papers to help you keep up with the latest AI trends

On-the-fly Repulsion in the Contextual Space for Rich Diversity in Diffusion Transformers

EpochX: Building the Infrastructure for an Emergent Agent Civilization

TAPS: Task Aware Proposal Distributions for Speculative Sampling
LongTail Driving Scenarios with Reasoning Traces: The KITScenes LongTail Dataset
RealChart2Code: Advancing Chart-to-Code Generation with Real Data and Multi-Task Evaluation
Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills
PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference
ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling
Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models
BeSafe-Bench: Unveiling Behavioral Safety Risks of Situated Agents in Functional Environments
World Reasoning Arena
MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens
Voxtral TTS
RealRestorer: Towards Generalizable Real-World Image Restoration with Large-Scale Image Editing Models
Calibri: Enhancing Diffusion Transformers via Parameter-Efficient Calibration
Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale
PixelSmile: Toward Fine-Grained Facial Expression Editing
Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs
AutoHarness: Improving LLM Agents by Automatically Synthesizing a Code Harness
GameplayQA: A Benchmarking Framework for Decision-Dense POV-Synced Multi-Video Understanding of 3D Virtual Agents
Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs?
UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience
T-MAP: Red-Teaming LLM Agents with Trajectory-aware Evolutionary Search
CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents
EVA: Efficient Reinforcement Learning for End-to-End Video Agent
Foveated Diffusion: Efficient Spatially Adaptive Image and Video Generation
Ego2Web: A Web Agent Benchmark Grounded in Egocentric Videos
From Static Templates to Dynamic Runtime Graphs: A Survey of Workflow Optimization for LLM Agents
SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning
DA-Flow: Degradation-Aware Optical Flow Estimation with Diffusion Models
PEARL: Personalized Streaming Video Understanding Model
WildWorld: A Large-Scale Dataset for Dynamic World Modeling with Actions and Explicit State toward Generative ARPG