Papers
Cutting-edge AI research papers, updated daily, to help you keep up with the latest AI trends

RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents

The User-Centric Geo-Experience: An LLM-Powered Framework for Enhanced Planning, Navigation, and Dynamic Adaptation

PLAME: Leveraging Pretrained Language Models to Generate Enhanced Protein Multiple Sequence Alignments

CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization

StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling

OmniPart: Part-Aware 3D Generation with Semantic Decoupling and Structural Cohesion

SingLoRA: Low Rank Adaptation Using a Single Matrix

A Survey on Latent Reasoning

Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving

ChipSeek-R1: Generating Human-Surpassing RTL with LLM via Hierarchical Reward-Driven Reinforcement Learning

MedGemma Technical Report

BMMR: A Large-Scale Bilingual Multimodal Multi-Discipline Reasoning Dataset

Pre-Trained Policy Discriminators are General Reward Models

DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge

4DSloMo: 4D Reconstruction for High Speed Scene with Asynchronous Capture

Should We Still Pretrain Encoders with Masked Language Modeling?

MemOS: A Memory OS for AI System

OGF: An Online Gradient Flow Method for Optimizing the Statistical Steady-State Time Averages of Unsteady Turbulent Flows

OpenS2S: Advancing Open-Source End-to-End Empathetic Large Speech Language Model

Point3R: Streaming 3D Reconstruction with Explicit Spatial Pointer Memory

StepHint: Multi-level Stepwise Hints Enhance Reinforcement Learning to Reason

Establishing Best Practices for Building Rigorous Agentic Benchmarks

How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks

Eka-Eval : A Comprehensive Evaluation Framework for Large Language Models in Indian Languages

DynamiCare: A Dynamic Multi-Agent Framework for Interactive and Open-Ended Medical Decision-Making

Energy-Based Transformers are Scalable Learners and Thinkers

IntFold: A Controllable Foundation Model for General and Specialized Biomolecular Structure Prediction

Heeding the Inner Voice: Aligning ControlNet Training via Intermediate Features Feedback

Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy

LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion

Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers

WebSailor: Navigating Super-human Reasoning for Web Agent
