Papers
Daily updated cutting-edge AI research papers to help you keep up with the latest AI trends

ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning

OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows

The Era of Agentic Organization: Learning to Organize with Language Models
SPICE: Self-Play In Corpus Environments Improves Reasoning
Surfer 2: The Next Generation of Cross-Platform Computer Use Agents
Exploring Conditions for Diffusion models in Robotic Control
Can Agent Conquer Web? Exploring the Frontiers of ChatGPT Atlas Agent in Web Games
Kimi Linear: An Expressive, Efficient Attention Architecture
Emu3.5: Native Multimodal Models are World Learners
The End of Manual Decoding: Towards Truly End-to-End Language Models
Human-AI Complementarity: A Goal for Amplified Oversight
GPTOpt: Towards Efficient LLM-Based Black-Box Optimization
VFXMaster: Unlocking Dynamic Visual Effect Generation via In-Context Learning
Reasoning-Aware GRPO using Process Mining
Scaling Latent Reasoning via Looped Language Models
ReForm: Reflective Autoformalization with Prospective Bounded Sequence Optimization
Video-Thinker: Sparking "Thinking with Videos" via Reinforcement Learning
JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence
MCP-Flow: Facilitating LLM Agents to Master Real-World, Diverse and Scaling MCP Tools
OmniCast: A Masked Latent Diffusion Model for Weather Forecasting Across Time Scales
Uniform Discrete Diffusion with Metric Path for Video Generation
Game-TARS: Pretrained Foundation Models for Scalable Generalist Multimodal Game Agents
RoboOmni: Proactive Robot Manipulation in Omni-modal Context
AgentFold: Long-Horizon Web Agents with Proactive Context Management
Tongyi DeepResearch Technical Report
InteractComp: Evaluating Search Agents With Ambiguous Queries
VLM-SlideEval: Evaluating VLMs on Structured Comprehension and Perturbation Sensitivity in PPT
TeraSim-World: Worldwide Safety-Critical Data Synthesis for End-to-End Autonomous Driving
Lookahead Anchoring: Preserving Character Identity in Audio-Driven Human Animation
VITA-E: Natural Embodied Interaction with Concurrent Seeing, Hearing, Speaking, and Acting
FARMER: Flow AutoRegressive Transformer over Pixels
A Survey of Data Agents: Emerging Paradigm or Overstated Hype?