Papers
Cutting-edge AI research papers, updated daily, to help you keep up with the latest AI trends

4DSloMo: 4D Reconstruction for High Speed Scene with Asynchronous Capture

Should We Still Pretrain Encoders with Masked Language Modeling?
MemOS: A Memory OS for AI System
OGF: An Online Gradient Flow Method for Optimizing the Statistical Steady-State Time Averages of Unsteady Turbulent Flows
OpenS2S: Advancing Open-Source End-to-End Empathetic Large Speech Language Model
Point3R: Streaming 3D Reconstruction with Explicit Spatial Pointer Memory
StepHint: Multi-level Stepwise Hints Enhance Reinforcement Learning to Reason
Establishing Best Practices for Building Rigorous Agentic Benchmarks
How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks
Eka-Eval : A Comprehensive Evaluation Framework for Large Language Models in Indian Languages
DynamiCare: A Dynamic Multi-Agent Framework for Interactive and Open-Ended Medical Decision-Making
Energy-Based Transformers are Scalable Learners and Thinkers
IntFold: A Controllable Foundation Model for General and Specialized Biomolecular Structure Prediction
Heeding the Inner Voice: Aligning ControlNet Training via Intermediate Features Feedback
Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy
LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion
Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers
WebSailor: Navigating Super-human Reasoning for Web Agent
AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-bench
Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation
FreeMorph: Tuning-Free Generalized Image Morphing with Diffusion Model
A Survey on Vision-Language-Action Models: An Action Tokenization Perspective
Depth Anything at Any Condition
LongAnimation: Long Animation Generation with Dynamic Global-Local Memory
Kwai Keye-VL Technical Report
A Survey on Vision-Language-Action Models for Autonomous Driving
MoCa: Modality-aware Continual Pre-training Makes Better Bidirectional Multimodal Embeddings
FreeLong++: Training-Free Long Video Generation via Multi-band SpectralFusion
Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive Foundations for Artificial General Intelligence and its Societal Impact
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning
SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks
Holistic Artificial Intelligence in Medicine; improved performance and explainability