Papers
Daily updated cutting-edge AI research papers to help you keep up with the latest AI trends

3DCodeBench: Benchmarking Agentic Procedural 3D Modeling Via Code

RadImageNet-VQA: A Large-Scale CT and MRI Dataset for Radiologic Visual Question Answering

Training Software Engineering Agents and Verifiers with SWE-Gym
MAKIEVAL: A Multilingual Automatic WiKIdata-based Framework for Cultural Awareness Evaluation for LLMs
GeneralVLA-2: Geometry-Aware Reconstruction and Governed Memory for Robot Planning
Multi-Turn Reflective Masking Elicits Reasoning in Mask Diffusion Models
BrainG3N: A Dual-Purpose Tokenizer for Controllable 3D Brain MRI Generation
GateMem: Benchmarking Memory Governance in Multi-Principal Shared-Memory Agents
MemSlides: A Hierarchical Memory Driven Agent Framework for Personalized Slide Generation with Multi-turn Local Revision
PerceptionDLM: Parallel Region Perception with Multimodal Diffusion Language Models
Code World Models for General Game Playing
Beyond Static Leaderboards: Predictive Validity for the Evaluation of LLM Agents
S-Agent: Spatial Tool-Use Elicits Reasoning for Spatial Intelligence
Multi-LCB: Extending LiveCodeBench to Multiple Programming Languages
Playful Agentic Robot Learning
DragMesh-2: Physically Plausible Dexterous Hand-Object Interaction with Articulated Objects
Moebius: 0.2B Lightweight Image Inpainting Framework with 10B-Level Performance
EfficientRollout: System-Aware Self-Speculative Decoding for RL Rollouts
Trust the Right Teacher: Quality-Aware Self-Distillation for GUI Grounding
Reinforcing Dual-Path Reasoning in Spatial Vision Language Models
SAE Interventions are Unreliable: Post-Intervention Recovery of Suppressed Behavior
Kairos: A Native World Model Stack for Physical AI
Guava: An Effective and Universal Harness for Embodied Manipulation
Beyond the Current Observation: Evaluating Multimodal Large Language Models in Controllable Non-Markov Games
LifeSciBench: Evaluating Language Models on Realistic, Expert-Level Tasks in the Life Sciences
TRIAGE: Dialectical Reasoning for Explainable Risk Prediction on Irregularly Sampled Medical Time Series with LLMs
LectūraAgents: A Multi-Agent Framework for Adaptive Personalized AI-Assisted Learning and Embodied Teaching
GameCraft-Bench: Can Agents Build Playable Games End-to-End in a Real Game Engine?
Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients
ACE-Ego-0: Unifying Egocentric Human and Robotic Data for VLA Pretraining
LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling
Predicting LLM Safety Before Release by Simulating Deployment