Papers
Cutting-edge AI research papers, updated daily, to help you keep up with the latest AI trends

Flow Map Distillation Without Data

HunyuanOCR Technical Report

PhysToolBench: Benchmarking Physical Tool Understanding for MLLMs
Huxley-Gödel Machine: Human-Level Coding Agent Development by an Approximation of the Optimal Self-Improving Machine
Solving Spatial Supersensing Without Spatial Supersensing
Parrot: Persuasion and Agreement Robustness Rating of Output Truth -- A Sycophancy Robustness Benchmark for LLMs
O-Mem: Omni Memory System for Personalized, Long Horizon, Self-Evolving Agents
Unveiling Intrinsic Dimension of Texts: from Academic Abstract to Creative Story
SAM 3: Segment Anything with Concepts
GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe
HiPO: Hybrid Policy Optimization for Dynamic Reasoning in LLMs
SERES: Semantic-Aware Neural Reconstruction from Sparse Views
SDAR: A Synergistic Diffusion-AutoRegression Paradigm for Scalable Sequence Generation
MultiPL-MoE: Multi-Programming-Lingual Extension of Large Language Models through Hybrid Mixture-of-Experts
CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning
Ultra-Fast Language Generation via Discrete Diffusion Divergence Instruct
DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization
QSVD: Efficient Low-rank Approximation for Unified Query-Key-Value Weight Compression in Low-Precision Vision-Language Models
Nested Learning: The Illusion of Deep Learning Architectures
SAM 3D: 3Dfy Anything in Images
Video-as-Answer: Predict and Generate Next Video Event with Joint-GRPO
First Frame Is the Place to Go for Video Content Customization
Scaling Spatial Intelligence with Multimodal Foundation Models
Step-Audio-R1 Technical Report
V-ReasonBench: Toward Unified Reasoning Benchmark Suite for Video Generation Models
Olmo 3
Early science acceleration experiments with GPT-5
What Does It Take to Be a Good AI Research Agent? Studying the Role of Ideation Diversity
Instruction-Guided Lesion Segmentation for Chest X-rays with Automatically Generated Large-Scale Dataset
VisPlay: Self-Evolving Vision-Language Models from Images
Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks