Papers
Cutting-edge AI research papers, updated daily, to help you keep up with the latest AI trends

StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs

SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention

SimpleFold: Folding Proteins is Simpler than You Think
POINTS-Reader: Distillation-Free Adaptation of Vision-Language Models for Document Conversion
Generalizable Geometric Image Caption Synthesis
Benefits and Pitfalls of Reinforcement Learning for Language Model Planning: A Theoretical Perspective
Estimating the Empowerment of Language Model Agents
Language Models Can Learn from Verbal Feedback Without Scalar Rewards
Variational Reasoning for Language Models
EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
Quantile Advantage Estimation for Entropy-Safe Reasoning
LongLive: Real-time Interactive Long Video Generation
Combinatorial Creativity: A New Frontier in Generalization Abilities
Causal Spatio-Temporal Prediction: An Effective and Efficient Multi-Modal Approach
Hunyuan3D-Omni: A Unified Framework for Controllable Generation of 3D Assets
Seedream 4.0: Toward Next-generation Multimodal Image Generation
Tree Search for LLM Agent Reinforcement Learning
SciReasoner: Laying the Scientific Reasoning Ground Across Disciplines
MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources
VCRL: Variance-based Curriculum Reinforcement Learning for Large Language Models
MultiEdit: Advancing Instruction-based Image Editing on Diverse and Challenging Tasks
BRISC: Annotated Dataset for Brain Tumor Segmentation and Classification with Swin-HAFNet
EmoBench-M: Benchmarking Emotional Intelligence for Multimodal Large Language Models
FDABench: A Benchmark for Data Agents on Analytical Queries over Heterogeneous Data
Easier Painting Than Thinking: Can Text-to-Image Models Set the Stage, but Not Direct the Play?
UniVerse-1: Unified Audio-Video Generation via Stitching of Experts
How Good are Foundation Models in Step-by-Step Embodied Reasoning?
SpikingBrain Technical Report: Spiking Brain-inspired Large Models
SAGE: A Realistic Benchmark for Semantic Understanding
WAVECLIP: Wavelet Tokenization for Adaptive-Resolution CLIP
EmbeddingGemma: Powerful and Lightweight Text Representations