HyperAI

Main

GPU

Console
Studio
Docs
Pricing

Pulse

News

Resources

Papers
Notebooks
Datasets
Wiki

Benchmarks

SOTA
LLM Models
GPU Leaderboard

Community

Events

Utility

About Terms of Service Privacy Policy
English

Papers

Cutting-edge AI research papers, updated daily, to help you keep up with the latest AI trends

Build the Future of Artificial Intelligence

About

About Us Support Dataset Help

Products

News Papers Notebooks Datasets Wiki

Links

© HyperAI

GitHub Discord X (formerly Twitter)

GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training

Reinforcement Learning

Tong Wei, Yijun Yang, Junliang Xing, et al.

Agent Skills for Large Language Models: Architecture, Acquisition, Security, and the Path Forward

Ranjun Xu, Yang Yan

Theory of Space: Can Foundation Models Construct Spatial Beliefs through Active Exploration?

Embodied Intelligence

Pingyue Zhang, Zihan Huang, Yue Wang, et al.

Memory Transfer Learning: How Memories are Transferred Across Domains in Coding Agents

Code Generation

Kangsan Kim, Minki Kang, Taeil Kim, et al.

OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language World Models

Xiaomeng Hu, Yinger Zhang, Fei Huang, et al.

SpatialEvo: Self-Evolving Spatial Intelligence via Deterministic Geometric Environments

3D Machine Vision

Visual Question Answering

Dinging Li, Yingxiu Zhao, Xinrui Cheng, et al.

RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time

Image Generation

Haozhe Wang, Cong Wei, Weiming Ren, et al.

Seedance 2.0: Advancing Video Generation for World Complexity

Video Generation

Team Seedance, De Chen, Liyang Chen, et al.

GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents

Mingyu Ouyang, Siyuan Hu, Kevin Qinghong Lin, et al.

Cross-Scale Pansharpening via ScaleFormer and the PanScale Benchmark

Image Generation

Ke Cao, Xuanhua He, Xueheng Li, et al.

ParseBench: A Document Parsing Benchmark for AI Agents

Document Understanding

Boyang Zhang, Sebastián G. Acosta, Preston Carlson, et al.

Memory Intelligence Agent

Jingyang Qiao, Weicheng Meng, Yu Cheng, et al.

PROPELLA-1: Multi-Property Document Annotation for LLM Data Curation at Scale

Maximilian Idahl, Benedikt Droste, Björn Plüster, et al.

Internalized Reasoning for Long-Context Visual Document Understanding

Document Understanding

Visual Question Answering

TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate

Amir Zandieh, Majid Daliri, Majid Hadian, et al.

BERT-as-a-Judge: A Robust Alternative to Lexical Methods for Efficient Reference-Based LLM Evaluation

Text Generation

Hippolyte Gisserot-Boukhlef, Nicolas Boizard, Emmanuel Malherbe, et al.

SPPO: Sequence-Level PPO for Long-Horizon Reasoning Tasks

Reinforcement Learning

Tianyi Wang, Yixia Li, Long Li, et al.

Turing Test on Screen: A Benchmark for Mobile GUI Agent Humanization

Jiachen Zhu, Lingyu Yang, Rong Shan, et al.

Audio-Omni: Extending Multi-modal Understanding to Versatile Audio Generation and Editing

Zeyue Tian, Binxin Yang, Zhaoyang Liu, et al.

Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe

Yaxuan Li, Yuxin Zuo, Bingxiang He, et al.

KnowRL: Boosting LLM Reasoning via Reinforcement Learning with Minimal-Sufficient Knowledge Guidance

Reinforcement Learning

Linhao Yu, Tianmeng Yang, Siyu Ding, et al.

Uni-ViGU: Towards Unified Video Generation and Understanding via A Diffusion-Based Video Generator

Video Generation

Video Understanding

Luozheng Qin, Jia Gong, Qian Qiao, et al.

ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents

Fei Tang, Zhiqiong Lu, Boxuan Zhang, et al.

Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation

Zunhai Su, Hengyuan Zhang, Wei Wu, et al.

OmniShow: Unifying Multimodal Conditions for Human-Object Interaction Video Generation

Video Generation

Donghao Zhou, Guisheng Liu, Hao Yang, et al.

The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping

Reinforcement Learning

Yang Liu, Enxi Wang, Yufei Gao, et al.

QuanBench+: A Unified Multi-Framework Benchmark for LLM-Based Quantum Code Generation

Code Generation

Ali Slim, Haydar Hamieh, Jawad Kotaich, et al.

ELT: Elastic Looped Transformers for Visual Generation

Image Generation

Video Generation

Sahil Goyal, Swayam Agrawal, Gautham Govind, et al.

ECHO: Efficient Chest X-ray Report Generation with One-step Block Diffusion

Diffusion Model

Text Generation

Lifeng Chen, Tianqi You, Hao Liu, et al.

Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory

Zile Wang, Zexiang Liu, Jaixing Li, et al.

EXAONE 4.5 Technical Report

Eunbi Choi, Kibong Choi, Sehyun Chun, et al.

RefineAnything: Multimodal Region-Specific Refinement for Perfect Local Details

Diffusion Model

Dewei Zhou, You Li, Zongxin Yang, et al.
