Papers
Daily updated cutting-edge AI research papers to help you keep up with the latest AI trends

BeyondWeb: Lessons from Scaling Synthetic Data for Trillion-scale Pretraining

PaperRegister: Boosting Flexible-grained Paper Search via Hierarchical Register Indexing

DINOv3
SSRL: Self-Search Reinforcement Learning
Thyme: Think Beyond Images
Grounding Multilingual Multimodal LLMs With Cultural Knowledge
HiFiTTS-2: A Large-Scale High Bandwidth Speech Dataset
CryptoScope: Utilizing Large Language Models for Automated Cryptographic Logic Vulnerability Detection
Medical Graph RAG: Towards Safe Medical Large Language Model via Graph Retrieval-Augmented Generation
Puppeteer: Rig and Animate Your 3D Models
STream3R: Scalable Sequential 3D Reconstruction with Causal Transformer
PRELUDE: A Benchmark Designed to Require Global Comprehension and Reasoning over Long Contexts
ToonComposer: Streamlining Cartoon Production with Generative Post-Keyframing
NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale
We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning
COREVQA: A Crowd Observation and Reasoning Entailment Visual Question Answering Benchmark
RelayFormer: A Unified Local-Global Attention Framework for Scalable Image and Video Manipulation Localization
GMF-Drive: Gated Mamba Fusion with Spatial-Aware BEV Representation for End-to-End Autonomous Driving
Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory
Diffusion LLMs Can Do Faster-Than-AR Inference via Discrete Diffusion Forcing
AWorld: Dynamic Multi-Agent System with Stable Maneuvering for Robust GAIA Problem Solving
Story2Board: A Training-Free Approach for Expressive Storyboard Generation
Stand-In: A Lightweight and Plug-and-Play Identity Control for Video Generation
Mol-R1: Towards Explicit Long-CoT Reasoning in Molecule Discovery
Llama-Nemotron: Efficient Reasoning Models
Document Haystack: A Long Context Multimodal Image/Document Understanding Vision LLM Benchmark
Echo-4o: Harnessing the Power of GPT-4o Synthetic Images for Improved Image Generation
Virtual staining of label-free tissue in imaging mass spectrometry
VisCodex: Unified Multimodal Code Generation via Merging Vision and Coding Models
HierSearch: A Hierarchical Enterprise Deep Search Framework Integrating Local and Web Searches
Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models
CharacterShot: Controllable and Consistent 4D Character Animation