Papers
Cutting-edge AI research papers, updated daily, to help you keep up with the latest AI trends

GS-Occ3D: Scaling Vision-only Occupancy Reconstruction with Gaussian Splatting

SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution

Multimodal Referring Segmentation: A Survey
3D-R1: Enhancing Reasoning in 3D VLMs for Unified Scene Understanding
SWE-Exp: Experience-Driven Software Issue Resolution
PixNerd: Pixel Neural Field Diffusion
Beyond Fixed: Variable-Length Denoising for Diffusion Large Language Models
Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training
Co-Producing AI: Toward an Augmented, Participatory Lifecycle
iLRM: An Iterative Large 3D Reconstruction Model
villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models
C3: A Bilingual Benchmark for Spoken Dialogue Models Exploring Challenges in Complex Conversations
RecGPT Technical Report
Phi-Ground Tech Report: Advancing Perception in GUI Grounding
Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving
The Outcome of the 2022 Landslide4Sense Competition: Advanced Landslide Detection from Multi-Source Satellite Imagery
Less is More for Synthetic Speech Detection in the Wild
Solution-aware vs global ReLU selection: partial MILP strikes back for DNN verification
CoT-Self-Instruct: Building high-quality synthetic prompts for reasoning and non-reasoning tasks
Towards Omnimodal Expressions and Reasoning in Referring Audio-Visual Segmentation
Adapting Vehicle Detectors for Aerial Imagery to Unseen Domains with Weak Supervision
VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning
Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance
BANG: Dividing 3D Assets via Generative Exploded Dynamics
ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
MIRepNet: A Pipeline and Foundation Model for EEG-Based Motor Imagery Classification
HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale
ChemDFM-R: An Chemical Reasoner LLM Enhanced with Atomized Chemical Knowledge
X-Omni: Reinforcement Learning Makes Discrete Autoregressive Image Generative Models Great Again
HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels
AlphaEarth Foundations: An embedding field model for accurate and efficient global mapping from sparse label data