HyperAI

Main

GPU

Console
Studio
Docs
Pricing

Pulse

News

Resources

Papers
Notebooks
Datasets
Wiki

Benchmarks

SOTA
LLM Models
GPU Leaderboard

Community

Events

Utility

About Terms of Service Privacy Policy
English

Papers

Daily updated cutting-edge AI research papers to help you keep up with the latest AI trends

Build the Future of Artificial Intelligence

About

About Us Support Dataset Help

Products

News Papers Notebooks Datasets Wiki

Links

© HyperAI

GitHub Discord X (formerly Twitter)

Align-Then-stEer: Adapting the Vision-Language Action Models through Unified Latent Guidance

Multi-Task Learning

Yang Zhang, Chenwei Wang, Ouyang Lu, et al.

SubLIME: Subset Selection via Rank Correlation Prediction for Data-Efficient LLM Evaluation

Gayathri Saranathan, Cong Xu, Mahammad Parwez Alam, et al.

Mixture of Contexts for Long Video Generation

Video Generation

Shengqu Cai, Ceyuan Yang, Lvmin Zhang, et al.

MusicSwarm: Biologically Inspired Intelligence for Music Composition

Markus J. Buehler

LEGO: Spatial Accelerator Generation and Optimization for Tensor Applications

Yujun Lin, Zhekai Zhang, Song Han

LazyDrag: Enabling Stable Drag-Based Editing on Multi-Modal Diffusion Transformers via Explicit Correspondence

Diffusion Model

Zixin Yin, Xili Dai, Duomin Wang, et al.

SearchInstruct: Enhancing Domain Adaptation via Retrieval-Based Instruction Dataset Creation

Supervised Fine-Tuning

Retrieval-Augmented Generation

Iman Barati, Mostafa Amiri, Heshaam Faili

Interpretable Physics Reasoning and Performance Taxonomy in Vision-Language Models

Pranav Pawar, Kavish Shah, Akshat Bhalani, et al.

InternScenes: A Large-scale Simulatable Indoor Scene Dataset with Realistic Layouts

Weipeng Zhong, Peizhou Cao, Yichen Jin, et al.

UI-S1: Advancing GUI Automation via Semi-online Reinforcement Learning

Reinforcement Learning

Zhengxi Lu, Jiabo Ye, Fei Tang, et al.

OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling

Video Understanding

Yang Zhou, Yifan Wang, Jianjun Zhou, et al.

LAVa: Layer-wise KV Cache Eviction with Dynamic Budget Allocation

Yiqun Shen, Song Yuan, Zhengze Zhang, et al.

World Modeling with Probabilistic Structure Integration

Video Understanding

Klemen Kotar, Wanhee Lee, Rahul Venkatesh, et al.

VStyle: A Benchmark for Voice Style Adaptation with Spoken Instructions

Jun Zhan, Mingyang Han, Yuxuan Xie, et al.

HANRAG: Heuristic Accurate Noise-resistant Retrieval-Augmented Generation for Multi-hop Question Answering

Retrieval-Augmented Generation

Duolin Sun, Dan Yang, Yue Shen, et al.

InfGen: A Resolution-Agnostic Paradigm for Scalable Image Synthesis

Diffusion Model

Image Generation

Tao Han, Wanghan Xu, Junchao Gong, et al.

X-Part: high fidelity and structure coherent shape decomposition

Semantic Segmentation

Xinhao Yan, Jiachen Xu, Yang Li, et al.

The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs

Akshit Sinha, Arvindh Arun, Shashwat Goel, et al.

IntrEx: A Dataset for Modeling Engagement in Educational Conversations

Xingwei Tan, Mahathi Parvatham, Chiara Gambi, et al.

Youtu-GraphRAG: Vertically Unified Agents for Graph Retrieval-Augmented Complex Reasoning

Retrieval-Augmented Generation

Junnan Dong, Siyu An, Yifei Yu, et al.

SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining

3D Machine Vision

Multimodal Representation

Yue Li, Qi Ma, Runyi Yang, et al.

Virtual Agent Economies

Preference Modeling

Nenad Tomasev, Matija Franklin, Joel Z. Leibo, et al.

Towards Understanding Visual Grounding in Visual Language Models

Multimodal Representation

Georgios Pantazopoulos, Eda B. Özyiğit

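Kling-Avatar: Grounding Multimodal Instructions for Cascaded Long-Duration Avatar Animation Synthesis

Multimodal Representation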

Yikang Ding, Jiwen Liu, Wenyuan Zhang, et al.

MachineLearningLM: Continued Pretraining Language Models on Millions of Synthetic Tabular Prediction Tasks Scales In-Context ML

Machine Learning

Haoyu Dong, Pengkun Zhang, Mingzhe Lu, et al.

EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs

Yuhao Zhang, Yuhao Du, Zhanchen Dai, et al.

SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning

Reinforcement Learning

Supervised Fine-Tuning

Haozhan Li, Yuxin Zuo, Jiale Yu, et al.

VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model

Yihao Wang, Pengxiang Ding, Lingxiao Li, et al.

scSiameseClu: A Siamese Clustering Framework for Interpreting single-cell RNA Sequencing Data

Ping Xu, Zhiyuan Ning, Pengjiang Li, et al.

ST-Raptor: LLM-Powered Semi-Structured Table Question Answering

Intelligent Question Answering

Zirui Tang, Boyu Niu, Xuanhe Zhou, et al.

OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models

Mengdi Jia, Zekun Qi, Shaochen Zhang, et al.

Understanding Economic Tradeoffs Between Human and AI Agents in Bargaining Games

Preference Modeling

Crystal Qian, Kehang Zhu, John Horton, et al.
