Papers
Cutting-edge AI research papers, updated daily, to help you keep up with the latest AI trends

Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens

VeriGUI: Verifiable Long-Chain GUI Dataset

Qwen2.5-VL Technical Report
The GAN is dead; long live the GAN! A Modern GAN Baseline
MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition
NVILA: Efficient Frontier Visual Language Models
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling
VisionZip: Longer is Better but Not Necessary in Vision Language Models
Baichuan-Omni Technical Report
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning
Emu3: Next-Token Prediction is All You Need
CogVLM2: Visual Language Models for Image and Video Understanding
Qwen2 Technical Report
An Image is Worth 32 Tokens for Reconstruction and Generation
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
FIFO-Diffusion: Generating Infinite Videos from Text without Training
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
OmniFusion Technical Report
Machine learning prediction errors better than DFT accuracy
SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience
AMix-1: A Pathway to Test-Time Scalable Protein Foundation Model
CRINN: Contrastive Reinforcement Learning for Approximate Nearest Neighbor Search
Representation Shift: Unifying Token Compression with FlashAttention
CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward
LongVie: Multimodal-Guided Controllable Ultra-Long Video Generation
Skywork UniPic: Unified Autoregressive Modeling for Visual Understanding and Generation
Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference
Agent Lightning: Train ANY AI Agents with Reinforcement Learning
Automated Algorithmic Discovery for Gravitational-Wave Detection Guided by LLM-Informed Evolutionary Monte Carlo Tree Search