Papers
Cutting-edge AI research papers, updated daily, to help you keep up with the latest AI trends

Refinement via Regeneration: Enlarging Modification Space Boosts Image Refinement in Unified Multimodal Models

AutoResearchBench: Benchmarking AI Agents on Complex Scientific Literature Discovery

Meta-CoT: Enhancing Granularity and Generalization in Image Editing

DV-World: Benchmarking Data Visualization Agents in Real-World Scenarios

Programming with Data: Test-Driven Data Engineering for Self-Improving LLMs from Raw Corpora

Recursive Multi-Agent Systems

Skill Retrieval Augmentation for Agentic AI

SketchVLM: Vision language models can annotate images to explain thoughts and guide users

RSRCC: A Remote Sensing Regional Change Comprehension Benchmark Constructed via Retrieval-Augmented Best-of-N Ranking

LongSpeech: A Scalable Benchmark for Transcription, Translation and Understanding in Long Speech

ClawMark: A Living-World Benchmark for Multi-Turn, Multi-Day, Multimodal Coworker Agents

Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation

Vision-Language-Action Safety: Threats, Challenges, Evaluations, and Mechanisms

ReVSI: Rebuilding Visual Spatial Intelligence Evaluation for Accurate Assessment of VLM 3D Reasoning

From Skills to Talent: Organising Heterogeneous Agents as a Real-World Company

World-R1: Reinforcing 3D Constraints for Text-to-Video Generation

Video Analysis and Generation via a Semantic Progress Function

SmartPhotoCrafter: Unified Reasoning, Generation and Optimization for Automatic Photographic Image Editing

Contexts are Never Long Enough: Structured Reasoning for Scalable Question Answering over Long Document Sets

AgentSearchBench: A Benchmark for AI Agent Search in the Wild

FlowAnchor: Stabilizing the Editing Signal for Inversion-Free Video Editing

LLM Safety From Within: Detecting Harmful Content with Internal Representations

DiffNR: Diffusion-Enhanced Neural Representation Optimization for Sparse-View 3D Tomographic Reconstruction

Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond

Decoupled DiLoCo for Resilient Distributed Pre-training

Event Tensor: A Unified Abstraction for Compiling Dynamic Megakernel

Seeing Fast and Slow: Learning the Flow of Time in Videos

Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Tasks

StyleID: A Perception-Aware Dataset and Metric for Stylization-Agnostic Facial Identity Recognition

UniT: Toward a Unified Physical Language for Human-to-Humanoid Policy Learning and World Modeling

WorldMark: A Unified Benchmark Suite for Interactive Video World Models

LLaTiSA: Towards Difficulty-Stratified Time Series Reasoning from Visual Perception to Semantics
