HyperAI


Papers

Daily updated cutting-edge AI research papers to help you keep up with the latest AI trends

Build the Future of Artificial Intelligence


Links

© HyperAI

GitHub Discord X (formerly Twitter)


InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Multimodal Representation

Pan Zhang, Xiaoyi Dong, Yuhang Zang, et al.

MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs

Ziyu Liu, Tao Chu, Yuhang Zang, et al.

What matters when building vision-language models?

Hugo Laurençon, Léo Tronchon, Matthieu Cord, et al.

DDOS: The Drone Depth and Obstacle Segmentation Dataset

Depth Estimation

Semantic Segmentation

Benedikt Kolbeinsson, Krystian Mikolajczyk

Deep learning-based framework for the on-demand inverse design of metamaterials with arbitrary target band gap

Convolutional Neural Network

Than V. Tran, S. S. Nanthakumar, Xiaoying Zhuang

PRefLexOR: preference-based recursive language modeling for exploratory optimization of reasoning and agentic thinking

Preference Modeling

Reinforcement Learning

Markus J. Buehler

Auto-Regressive vs Flow-Matching: a Comparative Study of Modeling Paradigms for Text-to-Music Generation

Diffusion Model

Tal, Or, Kreuk, et al.

SeerAttention-R: Sparse Attention Adaptation for Long Reasoning

Natural Language Processing

Gao, Yizhao, Guo, et al.

PlayerOne: Egocentric World Simulator

Video Generation

Yuanpeng Tu, Hao Luo, Xi Chen, et al.

ComfyUI-R1: Exploring Reasoning Models for Workflow Generation

Zhenran Xu, Yiyu Wang, Xue Yang, et al.

Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation

Video Generation

Diffusion Model

Shanchuan Lin, Ceyuan Yang, Hao He, et al.

Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models

Reinforcement Learning

Supervised Fine-Tuning

Li, Pengyi, Skripkin, et al.

On the Use of Self-Supervised Speech Representations in Spontaneous Speech Synthesis

Audio and Speech Processing

Siyang Wang, Gustav Eje Henter, Joakim Gustafson, et al.

Efficient Machine Learning Force Field for Large-Scale Molecular Simulations of Organic Systems

Junbao Hu, Liyang Zhou, Jian Jiang

vLLM Hook v0: A Plug-in for Programming Model Internals on vLLM

Ching-Yun Ko, Pin-Yu Chen

MIRAGE: Retrieval and Generation of Multimodal Images and Texts for Medical Education

Medical Imaging

Miguel Díaz Benito, Cecilia Diana-Albelda, Álvaro García-Martín, et al.

ComfyGPT: A Self-Optimizing Multi-Agent System for Comprehensive ComfyUI Workflow Generation

Image Generation

Oucheng Huang, Yuhang Ma, Zeng Zhao, et al.

Sequence Model Design for Code Completion in the Modern IDE

Code Generation

ACE-Step: A Step Towards Music Generation Foundation Model

Diffusion Model

Junmin Gong, Sean Zhao, Sen Wang, et al.

Zero-shot text-to-speech synthesis conditioned using self-supervised speech representation model

Kenichi Fujita, Takanori Ashihara, Hiroki Kanagawa, et al.

Can Small and Reasoning Large Language Models Score Journal Articles for Research Quality and Do Averaging and Few-shot Help?

Mike Thelwall, Ehsan Mohammadi

A Flexible and Secure Deployment Framework for Distributed Applications

Alan Dearle, Graham Kirby, Andrew McCarthy, et al.

Multimodal Pretraining and Generation for Recommendation: A Tutorial

Jieming Zhu, Rui Zhang, Chuhan Wu, et al.

A Theoretical Limit to Physicalism: A Non-Technical Explanation of the Gemini Theorem

Catherine M Reason

EmoSSLSphere: Multilingual Emotional Speech Synthesis with Spherical Vectors and Discrete Speech Tokens

Audio and Speech Processing

Joonyong Park, Kenichi Nakamura

Propagation dynamics of the circular Airy Gaussian vortex beams in the fractional nonlinear Schrödinger equation

Shangling He, Kangzhu Zhou, Xi Peng, et al.

VASP on a GPU: application to exact-exchange calculations of the stability of elemental boron

M. Hutchinson, M. Widom

Information quantity in a pixel of digital image

Image Processing

AR-RAG: Autoregressive Retrieval Augmentation for Image Generation

Jingyuan Qi, Zhiyang Xu, Qifan Wang, et al.

Recognition of Handwritten Roman Script Using Tesseract Open Source OCR Engine

Image Recognition

Sandip Rakshit, Subhadip Basu

TimeSenCLIP: A Time Series Vision-Language Model for Remote Sensing

Multimodal Representation

Pallavi Jain, Diego Marcos, Dino Ienco, et al.

Learning Temporal Evolution of Spatial Dependence with Generalized Spatiotemporal Gaussian Process Models

Diffusion Model


