Papers
Cutting-edge AI research papers, updated daily, to help you keep up with the latest AI trends

Docling: An Efficient Open-Source Toolkit for AI-driven Document Conversion

HunyuanOCR Technical Report

PhysToolBench: Benchmarking Physical Tool Understanding for MLLMs
Huxley-Gödel Machine: Human-Level Coding Agent Development by an Approximation of the Optimal Self-Improving Machine
Solving Spatial Supersensing Without Spatial Supersensing
Parrot: Persuasion and Agreement Robustness Rating of Output Truth -- A Sycophancy Robustness Benchmark for LLMs
O-Mem: Omni Memory System for Personalized, Long Horizon Self-Evolving Agents
Unveiling Intrinsic Dimension of Texts: from Academic Abstract to Creative Story
SAM 3: Segment Anything with Concepts
GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe
HiPO: Hybrid Policy Optimization for Dynamic Reasoning in LLMs
SERES: Semantic-Aware Neural Reconstruction from Sparse Views
SDAR: A Synergistic Diffusion-AutoRegression Paradigm for Scalable Sequence Generation
MultiPL-MoE: Multi-Programming-Lingual Extension of Large Language Models through Hybrid Mixture-of-Experts
CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning
Ultra-Fast Language Generation via Discrete Diffusion Divergence Instruct
DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization
QSVD: Efficient Low-rank Approximation for Unified Query-Key-Value Weight Compression in Low-Precision Vision-Language Models
Nested Learning: The Illusion of Deep Learning Architectures
SAM 3D: 3Dfy Anything in Images
Video-as-Answer: Predict and Generate Next Video Event with Joint-GRPO
First Frame Is the Place to Go for Video Content Customization
Scaling Spatial Intelligence with Multimodal Foundation Models
Step-Audio-R1 Technical Report
V-ReasonBench: Toward Unified Reasoning Benchmark Suite for Video Generation Models
Olmo 3
Early science acceleration experiments with GPT-5
Towards objective and systematic evaluation of bias in artificial intelligence for medical imaging
What Does It Take to Be a Good AI Research Agent? Studying the Role of Ideation Diversity
Instruction-Guided Lesion Segmentation for Chest X-rays with Automatically Generated Large-Scale Dataset
VisPlay: Self-Evolving Vision-Language Models from Images