Papers
Cutting-edge AI research papers, updated daily, to help you keep up with the latest AI trends

Kwai Keye-VL Technical Report

A Survey on Vision-Language-Action Models for Autonomous Driving

MoCa: Modality-aware Continual Pre-training Makes Better Bidirectional Multimodal Embeddings
FreeLong++: Training-Free Long Video Generation via Multi-band SpectralFusion
Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive Foundations for Artificial General Intelligence and its Societal Impact
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning
SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks
Holistic Artificial Intelligence in Medicine; improved performance and explainability
Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning
Listener-Rewarded Thinking in VLMs for Image Preferences
Calligrapher: Freestyle Text Image Customization
VMoBA: Mixture-of-Block Attention for Video Diffusion Models
SMMILE: An Expert-Driven Benchmark for Multimodal Medical In-Context Learning
The Automated LLM Speedrunning Benchmark: Reproducing NanoGPT Improvements
Shape-for-Motion: Precise and Consistent Video Editing with 3D Proxy
From Ideal to Real: Unified and Data-Efficient Dense Prediction for Real-World Scenarios
ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models
XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation
Zero-shot antibody design in a 24-well plate
KinFormer: Generalizable Dynamical Symbolic Regression for Catalytic Organic Reaction Kinetics
MiCo: Multi-image Contrast for Reinforcement Visual Reasoning
Fine-Grained Preference Optimization Improves Spatial Reasoning in VLMs
Ark: An Open-source Python-based Framework for Robot Learning
Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity
LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs
BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing
UniMate: A Unified Model for Mechanical Metamaterial Generation, Property Prediction, and Condition Confirmation
Learning to Skip the Middle Layers of Transformers
SAM4D: Segment Anything in Camera and LiDAR Streams
Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language