Papers
Daily updated cutting-edge AI research papers to help you keep up with the latest AI trends

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs

What matters when building vision-language models?
DDOS: The Drone Depth and Obstacle Segmentation Dataset
Deep learning-based framework for the on-demand inverse design of metamaterials with arbitrary target band gap
PRefLexOR: preference-based recursive language modeling for exploratory optimization of reasoning and agentic thinking
Auto-Regressive vs Flow-Matching: a Comparative Study of Modeling Paradigms for Text-to-Music Generation
SeerAttention-R: Sparse Attention Adaptation for Long Reasoning
PlayerOne: Egocentric World Simulator
ComfyUI-R1: Exploring Reasoning Models for Workflow Generation
Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models
On the Use of Self-Supervised Speech Representations in Spontaneous Speech Synthesis
Efficient Machine Learning Force Field for Large-Scale Molecular Simulations of Organic Systems
vLLM Hook v0: A Plug-in for Programming Model Internals on vLLM
MIRAGE: Retrieval and Generation of Multimodal Images and Texts for Medical Education
ComfyGPT: A Self-Optimizing Multi-Agent System for Comprehensive ComfyUI Workflow Generation
Sequence Model Design for Code Completion in the Modern IDE
ACE-Step: A Step Towards Music Generation Foundation Model
Zero-shot text-to-speech synthesis conditioned using self-supervised speech representation model
Can Small and Reasoning Large Language Models Score Journal Articles for Research Quality and Do Averaging and Few-shot Help?
A Flexible and Secure Deployment Framework for Distributed Applications
Multimodal Pretraining and Generation for Recommendation: A Tutorial
A Theoretical Limit to Physicalism: A Non-Technical Explanation of the Gemini Theorem
EmoSSLSphere: Multilingual Emotional Speech Synthesis with Spherical Vectors and Discrete Speech Tokens
Propagation dynamics of the circular Airy Gaussian vortex beams in the fractional nonlinear Schrödinger equation
VASP on a GPU: application to exact-exchange calculations of the stability of elemental boron
Information quantity in a pixel of digital image
AR-RAG: Autoregressive Retrieval Augmentation for Image Generation
Recognition of Handwritten Roman Script Using Tesseract Open Source OCR Engine
TimeSenCLIP: A Time Series Vision-Language Model for Remote Sensing
Learning Temporal Evolution of Spatial Dependence with Generalized Spatiotemporal Gaussian Process Models