Papers
Daily updated cutting-edge AI research papers to help you keep up with the latest AI trends

Self-Rewarding Vision-Language Model via Reasoning Decomposition

Beyond Transcription: Mechanistic Interpretability in ASR
CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning
WebSight: A Vision-First Architecture for Robust Web Agents
UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning
Hermes 4 Technical Report
OmniHuman-1.5: Instilling an Active Mind in Avatars via Cognitive Simulation
VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space
CMPhysBench: A Benchmark for Evaluating Large Language Models in Condensed Matter Physics
TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling
Nemotron-CC-Math: A 133 Billion-Token-Scale High Quality Math Pretraining Dataset
Understanding Tool-Integrated Reasoning
Spacer: Towards Engineered Scientific Inspiration
Beyond Memorization: Extending Reasoning Depth with Recurrence, Memory and Test-Time Compute Scaling
VibeVoice Technical Report
MMTok: Multimodal Coverage Maximization for Efficient Inference of VLMs
MV-RAG: Retrieval Augmented Multiview Diffusion
Connecting metal-organic framework synthesis to applications using multimodal machine learning
Model Context Protocols in Adaptive Transport Systems: A Survey
Algorithmic Collective Action with Multiple Collectives
OpenCUA: Open Foundations for Computer-Use Agents
Spatial Policy: Guiding Visuomotor Robotic Manipulation with Spatial-Aware Modeling and Reasoning
Jet-Nemotron: Efficient Language Model with Post Neural Architecture Search
CRISP: Persistent Concept Unlearning via Sparse Autoencoders
Selective Contrastive Learning for Weakly Supervised Affordance Grounding
EgoTwin: Dreaming Body and View in First Person
Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR
ODYSSEY: Open-World Quadrupeds Exploration and Manipulation for Long-Horizon Tasks
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs
Constraints-Guided Diffusion Reasoner for Neuro-Symbolic Learning
LLM-Based Agents for Competitive Landscape Mapping in Drug Asset Due Diligence
SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass