Papers
Cutting-edge AI research papers, updated daily, to help you keep up with the latest AI trends

Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition

NVILA: Efficient Frontier Visual Language Models

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling
VisionZip: Longer is Better but Not Necessary in Vision Language Models
Baichuan-Omni Technical Report
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning
Emu3: Next-Token Prediction is All You Need
CogVLM2: Visual Language Models for Image and Video Understanding
Qwen2 Technical Report
An Image is Worth 32 Tokens for Reconstruction and Generation
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
FIFO-Diffusion: Generating Infinite Videos from Text without Training
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
OmniFusion Technical Report
Machine learning prediction errors better than DFT accuracy
SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience
AMix-1: A Pathway to Test-Time Scalable Protein Foundation Model
CRINN: Contrastive Reinforcement Learning for Approximate Nearest Neighbor Search
Representation Shift: Unifying Token Compression with FlashAttention
CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward
LongVie: Multimodal-Guided Controllable Ultra-Long Video Generation
Skywork UniPic: Unified Autoregressive Modeling for Visual Understanding and Generation
Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference
Agent Lightning: Train ANY AI Agents with Reinforcement Learning
Automated Algorithmic Discovery for Gravitational-Wave Detection Guided by LLM-Informed Evolutionary Monte Carlo Tree Search
Beyond the Trade-off: Self-Supervised Reinforcement Learning for Reasoning Models' Instruction Following
Llama-3.1-FoundationAI-SecurityLLM-8B-Instruct Technical Report
CellForge: Agentic Design of Virtual Cell Models
SitEmb-v1.5: Improved Context-Aware Dense Retrieval for Semantic Association and Long Story Comprehension
RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization