Papers
Cutting-edge AI research papers, updated daily, to help you keep up with the latest AI trends

XQuant: Breaking the Memory Wall for LLM Inference with KV Cache Rematerialization

BeyondWeb: Lessons from Scaling Synthetic Data for Trillion-scale Pretraining

PaperRegister: Boosting Flexible-grained Paper Search via Hierarchical Register Indexing

DINOv3

SSRL: Self-Search Reinforcement Learning

Thyme: Think Beyond Images

Grounding Multilingual Multimodal LLMs With Cultural Knowledge

HiFiTTS-2: A Large-Scale High Bandwidth Speech Dataset

CryptoScope: Utilizing Large Language Models for Automated Cryptographic Logic Vulnerability Detection

Medical Graph RAG: Towards Safe Medical Large Language Model via Graph Retrieval-Augmented Generation

Puppeteer: Rig and Animate Your 3D Models

STream3R: Scalable Sequential 3D Reconstruction with Causal Transformer

PRELUDE: A Benchmark Designed to Require Global Comprehension and Reasoning over Long Contexts

ToonComposer: Streamlining Cartoon Production with Generative Post-Keyframing

NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale

We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning

COREVQA: A Crowd Observation and Reasoning Entailment Visual Question Answering Benchmark

RelayFormer: A Unified Local-Global Attention Framework for Scalable Image and Video Manipulation Localization

GMF-Drive: Gated Mamba Fusion with Spatial-Aware BEV Representation for End-to-End Autonomous Driving

Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory

Diffusion LLMs Can Do Faster-Than-AR Inference via Discrete Diffusion Forcing

AWorld: Dynamic Multi-Agent System with Stable Maneuvering for Robust GAIA Problem Solving

Story2Board: A Training-Free Approach for Expressive Storyboard Generation

Stand-In: A Lightweight and Plug-and-Play Identity Control for Video Generation

Mol-R1: Towards Explicit Long-CoT Reasoning in Molecule Discovery

Llama-Nemotron: Efficient Reasoning Models

Document Haystack: A Long Context Multimodal Image/Document Understanding Vision LLM Benchmark

Echo-4o: Harnessing the Power of GPT-4o Synthetic Images for Improved Image Generation

Virtual staining of label-free tissue in imaging mass spectrometry

VisCodex: Unified Multimodal Code Generation via Merging Vision and Coding Models

HierSearch: A Hierarchical Enterprise Deep Search Framework Integrating Local and Web Searches

Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models
