Papers
Daily updated cutting-edge AI research papers to help you keep up with the latest AI trends

WEAVE: Unleashing and Benchmarking the In-context Interleaved Comprehension and Generation

DoPE: Denoising Rotary Position Embedding

BRFL: A Blockchain-based Byzantine-Robust Federated Learning Model
Multi-Granularity Distribution Modeling for Video Watch Time Prediction via Exponential-Gaussian Mixture Network
SAC Flow: Sample-Efficient Reinforcement Learning of Flow-Based Policies via Velocity-Reparameterized Sequential Modeling
Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment
Hail to the Thief: Exploring Attacks and Defenses in Decentralised GRPO
Black-Box On-Policy Distillation of Large Language Models
UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist
PAN: A World Model for General, Interactable, and Long-Horizon World Simulation
One Small Step in Latent, One Giant Leap for Pixels: Fast Latent Upscale Adapter for Your Diffusion Models
YOLOv13: Real-Time Object Detection with Hypergraph-Enhanced Adaptive Visual Perception
MonkeyOCR: Document Parsing with a Structure-Recognition-Relation Triplet Paradigm
Consensus Sampling for Safer Generative AI
Argus: Resilience-Oriented Safety Assurance Framework for End-to-End ADSs
WMPO: World Model-based Policy Optimization for Vision-Language-Action Models
LoopTool: Closing the Data-Training Loop for Robust LLM Tool Calls
Beyond Fact Retrieval: Episodic Memory for RAG with Generative Semantic Workspaces
TiDAR: Think in Diffusion, Talk in Autoregression
Time-to-Move: Training-Free Motion Controlled Video Generation via Dual-Clock Denoising
Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds
Generating an Image From 1,000 Words: Enhancing Text-to-Image With Structured Captions
KLASS: KL-Guided Fast Inference in Masked Diffusion Models
Grounding Computer Use Agents on Human Demonstrations
Wasm: A Pipeline for Constructing Structured Arabic Interleaved Multimodal Corpora
Adaptive Multi-Agent Response Refinement in Conversational Systems
SPAN: Spatial-Projection Alignment for Monocular 3D Object Detection
Efficient Approximation of Volterra Series for High-Dimensional Systems
SofT-GRPO: Surpassing Discrete-Token LLM Reinforcement Learning via Gumbel-Reparameterized Soft-Thinking Policy Optimization
RedOne 2.0: Rethinking Domain-specific LLM Post-Training in Social Networking Services
The Station: An Open-World Environment for AI-Driven Discovery
DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation