Papers
Daily updated cutting-edge AI research papers to help you keep up with the latest AI trends

HiFiTTS-2: A Large-Scale High Bandwidth Speech Dataset

CryptoScope: Utilizing Large Language Models for Automated Cryptographic Logic Vulnerability Detection
Medical Graph RAG: Towards Safe Medical Large Language Model via Graph Retrieval-Augmented Generation
Puppeteer: Rig and Animate Your 3D Models
STream3R: Scalable Sequential 3D Reconstruction with Causal Transformer
PRELUDE: A Benchmark Designed to Require Global Comprehension and Reasoning over Long Contexts
ToonComposer: Streamlining Cartoon Production with Generative Post-Keyframing
NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale
We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning
COREVQA: A Crowd Observation and Reasoning Entailment Visual Question Answering Benchmark
RelayFormer: A Unified Local-Global Attention Framework for Scalable Image and Video Manipulation Localization
GMF-Drive: Gated Mamba Fusion with Spatial-Aware BEV Representation for End-to-End Autonomous Driving
Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory
Diffusion LLMs Can Do Faster-Than-AR Inference via Discrete Diffusion Forcing
AWorld: Dynamic Multi-Agent System with Stable Maneuvering for Robust GAIA Problem Solving
Story2Board: A Training-Free Approach for Expressive Storyboard Generation
Stand-In: A Lightweight and Plug-and-Play Identity Control for Video Generation
Mol-R1: Towards Explicit Long-CoT Reasoning in Molecule Discovery
Llama-Nemotron: Efficient Reasoning Models
Document Haystack: A Long Context Multimodal Image/Document Understanding Vision LLM Benchmark
Echo-4o: Harnessing the Power of GPT-4o Synthetic Images for Improved Image Generation
Virtual staining of label-free tissue in imaging mass spectrometry
VisCodex: Unified Multimodal Code Generation via Merging Vision and Coding Models
HierSearch: A Hierarchical Enterprise Deep Search Framework Integrating Local and Web Searches
Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models
CharacterShot: Controllable and Consistent 4D Character Animation
Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL
Matrix-3D: Omnidirectional Explorable 3D World Generation
WebWatcher: Breaking New Frontiers of Vision-Language Deep Research Agent
Marco-Voice Technical Report
Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning
PyVeritas: On Verifying Python via LLM-Based Transpilation and Bounded Model Checking for C