Command Palette
Search for a command to run...
Papers
Daily updated cutting-edge AI research papers to help you keep up with the latest AI trends

Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning

Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs































Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning

Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs






























Urban Socio-Semantic Segmentation with Vision-Language Reasoning
STEP3-VL-10B Technical Report
SeedFold: Scaling Biomolecular Structure Prediction
TranslateGemma Technical Report
Fast-ThinkAct: Efficient Vision-Language-Action Reasoning via Verbalizable Latent Planning
SkinFlow: Efficient Information Transmission for Open Dermatological Diagnosis via Dynamic Visual Encoding and Staged RL
A^3-Bench: Benchmarking Memory-Driven Scientific Reasoning via Anchor and Attractor Activation
Controlled Self-Evolution for Algorithmic Code Optimization
MAXS: Meta-Adaptive Exploration with LLM Agents
DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation
The motivic class of the space of genus 0 maps to the flag variety
UniversalRAG: Retrieval-Augmented Generation over Corpora of Diverse Modalities and Granularities
On the Non-decoupling of Supervised Fine-tuning and Reinforcement Learning in Post-training
EpiCaR: Knowing What You Don't Know Matters for Better Reasoning in LLMs
Aligning Text, Code, and Vision: A Multi-Objective Reinforcement Learning Framework for Text-to-Visualization
How Do Large Language Models Learn Concepts During Continual Pre-Training?
JudgeRLVR: Judge First, Generate Second for Efficient Reasoning
SnapGen++: Unleashing Diffusion Transformers for Efficient High-Fidelity Image Generation on Edge Devices
Motion Attribution for Video Generation
VLingNav: Embodied Navigation with Adaptive Reasoning and Visual-Assisted Linguistic Memory
Ministral 3
The Confidence Dichotomy: Analyzing and Mitigating Miscalibration in Tool-Use Agents
ArenaRL: Scaling RL for Open-Ended Agents via Tournament-based Relative Ranking
ShowUI-π: Flow-based Generative Models as GUI Dexterous Hands
Learning Latent Action World Models In The Wild
Dr. Zero: Self-Evolving Search Agents without Training Data
MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head
GlimpRouter: Efficient Collaborative Inference by Glimpsing One Token of Thoughts
X-Coder: Advancing Competitive Programming with Fully Synthetic Tasks, Solutions, and Tests
PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning
Urban Socio-Semantic Segmentation with Vision-Language Reasoning
STEP3-VL-10B Technical Report
SeedFold: Scaling Biomolecular Structure Prediction
TranslateGemma Technical Report
Fast-ThinkAct: Efficient Vision-Language-Action Reasoning via Verbalizable Latent Planning
SkinFlow: Efficient Information Transmission for Open Dermatological Diagnosis via Dynamic Visual Encoding and Staged RL
A^3-Bench: Benchmarking Memory-Driven Scientific Reasoning via Anchor and Attractor Activation
Controlled Self-Evolution for Algorithmic Code Optimization
MAXS: Meta-Adaptive Exploration with LLM Agents
DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation
The motivic class of the space of genus 0 maps to the flag variety
UniversalRAG: Retrieval-Augmented Generation over Corpora of Diverse Modalities and Granularities
On the Non-decoupling of Supervised Fine-tuning and Reinforcement Learning in Post-training
EpiCaR: Knowing What You Don't Know Matters for Better Reasoning in LLMs
Aligning Text, Code, and Vision: A Multi-Objective Reinforcement Learning Framework for Text-to-Visualization
How Do Large Language Models Learn Concepts During Continual Pre-Training?
JudgeRLVR: Judge First, Generate Second for Efficient Reasoning
SnapGen++: Unleashing Diffusion Transformers for Efficient High-Fidelity Image Generation on Edge Devices
Motion Attribution for Video Generation
VLingNav: Embodied Navigation with Adaptive Reasoning and Visual-Assisted Linguistic Memory
Ministral 3
The Confidence Dichotomy: Analyzing and Mitigating Miscalibration in Tool-Use Agents
ArenaRL: Scaling RL for Open-Ended Agents via Tournament-based Relative Ranking
ShowUI-π: Flow-based Generative Models as GUI Dexterous Hands
Learning Latent Action World Models In The Wild
Dr. Zero: Self-Evolving Search Agents without Training Data
MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head
GlimpRouter: Efficient Collaborative Inference by Glimpsing One Token of Thoughts
X-Coder: Advancing Competitive Programming with Fully Synthetic Tasks, Solutions, and Tests
PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning