Papers
Cutting-edge AI research papers, updated daily, to help you keep up with the latest AI trends

Memory-QA: Answering Recall Questions Based on Multimodal Memories

MAPO: Mixed Advantage Policy Optimization
Hyper-Bagel: A Unified Acceleration Framework for Multimodal Understanding and Generation
Reinforcement Learning on Pre-Training Data
Do You Need Proprioceptive States in Visuomotor Policies?
Baseer: A Vision-Language Model for Arabic Document-to-Markdown OCR
GenExam: A Multidisciplinary Text-to-Image Exam
Nav-R1: Reasoning and Navigation in Embodied Scenes
MoEs Are Stronger than You Think: Hyper-Parallel Inference Scaling with RoE
ARE: Scaling Up Agent Environments and Evaluations
DiffusionNFT: Online Diffusion Reinforcement with Forward Process
TempSamp-R1: Effective Temporal Sampling with Reinforcement Fine-Tuning for Video LLMs
OnePiece: Bringing Context Engineering and Reasoning to Industrial Cascade Ranking System
OmniInsert: Mask-Free Video Insertion of Any Reference via Diffusion Transformer Models
LIMI: Less is More for Agency
A Modular Fusion Neural Network Approach to Efficiently Predict Multi-Metal Binding Sites in Protein Sequences
IndexTTS2: A Breakthrough in Emotionally Expressive and Duration-Controlled Auto-Regressive Zero-Shot Text-to-Speech
Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference
A Multi-Scale Graph Neural Process with Cross-Drug Co-Attention for Drug-Drug Interactions Prediction
GenCAD-3D: CAD Program Generation using Multimodal Latent Space Alignment and Synthetic Dataset Balancing
BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent
Lynx: Towards High-Fidelity Personalized Video Generation
SPATIALGEN: Layout-guided 3D Indoor Scene Generation
BaseReward: A Strong Baseline for Multimodal Reward Model
Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification
MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer
Oyster-I: Beyond Refusal - Constructive Safety Alignment for Responsible Language Models
Compute as Teacher: Turning Inference Compute Into Reference-Free Supervision
RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation
Synthetic bootstrapped pretraining
Skilful global seasonal predictions from a machine learning weather model trained on reanalysis data
FinSearchComp: Towards a Realistic, Expert-Level Evaluation of Financial Search and Reasoning