Papers
Cutting-edge AI research papers, updated daily, to help you keep up with the latest AI trends

Yume: An Interactive World Generation Model

Pixels, Patterns, but No Poetry: To See The World like Humans

MedChatZH: a Better Medical Adviser Learns from Better Instructions
Constructing Ophthalmic MLLM for Positioning-diagnosis Collaboration Through Clinical Cognitive Chain Reasoning
HySafe-AI: Hybrid Safety Architectural Analysis Framework for AI Systems: A Case Study
Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning
Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning
Upsample What Matters: Region-Adaptive Latent Sampling for Accelerated Diffusion Transformers
MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning
Step-Audio 2 Technical Report
Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning
Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report
Uncertainty-Aware Knowledge Transformers for Peer-to-Peer Energy Trading with Multi-Agent Reinforcement Learning
NoHumansRequired: Autonomous High-Quality Image Editing Triplet Mining
Robust 3D-Masked Part-level Editing in 3D Gaussian Splatting with Regularized Score Distillation Sampling
WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization
The Invisible Leash: Why RLVR May Not Escape Its Origin
GUI-G^2: Gaussian Reward Modeling for GUI Grounding
MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization
Design of intrinsically disordered region binding proteins
An All-Atom Generative Model for Designing Protein Complexes
RedOne: Revealing Domain-specific LLM Post-Training in Social Networking Services
CSD-VAR: Content-Style Decomposition in Visual Autoregressive Models
Mono-InternVL-1.5: Towards Cheaper and Faster Monolithic Multimodal Large Language Models
Franca: Nested Matryoshka Clustering for Scalable Visual Representation Learning
A Data-Centric Framework for Addressing Phonetic and Prosodic Challenges in Russian Speech Generative Models
The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs
PrefPalette: Personalized Preference Modeling with Latent Attributes
CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning
AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning
Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models
The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner