HyperAI

Main

GPU

Console
Studio
Docs
Pricing

Pulse

News

Resources

Papers
Notebooks
Datasets
Wiki

Benchmarks

SOTA
LLM Models
GPU Leaderboard

Community

Events

Utility

About Terms of Service Privacy Policy
English

Papers

Daily updated cutting-edge AI research papers to help you keep up with the latest AI trends

Build the Future of Artificial Intelligence

About

About Us Support Dataset Help

Products

News Papers Notebooks Datasets Wiki

Links

© HyperAI

GitHub Discord X (formerly Twitter)

Geometry-Aware Representation Denoising for Robust Multi-view 3D Reconstruction

Diffusion Model

3D Machine Vision

Jin Hyeon Kim, Jaeeun Lee, Claire Kim, et al.

EvalVerse: Pipeline-Aware and Expert-Calibrated Benchmarking for Professional Cinematic Video Generation

Video Generation

Songlin Yang, Haobin Zhong, Ruilin Zhang, et al.

MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research

Reinforcement Learning

Dingbang Wu, Rui Hao, Haiyang Wang, et al.

SpatialBench: Is Your Spatial Foundation Model an All-Round Player?

Haosong Peng, Hao Li, Jiaqi Chen, et al.

LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding

Object Detection

Shihao Wang, Shilong Liu, Yuanguo Kuang, et al.

Gemini Embedding 2: A Native Multimodal Embedding Model from Gemini

Multimodal Representation

Madhuri Shanbhogue, Zhe Li, Shanfeng Zhang, et al.

Language Models Need Sleep

Sangyun Lee, Sean McLeish, Tom Goldstein, et al.

ECHO: Terminal Agents Learn World Models for Free

Reinforcement Learning

Vaishnavi Shrivastava, Piero Kauffmann, Ahmed Awadallah, et al.

ParaVT: Taming the Tool Prior Paradox for Parallel Tool Use in Agentic Video Reinforcement Learning

Video Understanding

Zuhao Yang, Kaichen Zhang, Sudong Wang, et al.

TriSplat: Simulation-Ready Feed-Forward 3D Scene Reconstruction

Computer Vision

Weijie Wang, Zimu Li, Jinchuan Shi, et al.

Foundation Protocol: A Coordination Layer for Agentic Society

Bang Liu, Yongfeng Gu, Jiayi Zhang, et al.

WBench: A Comprehensive Multi-turn Benchmark for Interactive Video World Model Evaluation

Video Generation

Kaining Ying, Hengrui Hu, Siyu Ren, et al.

Macaron-A2UI: A Model for Generative UI in Personal Agents

Fancy Kong, Congjie Zheng, Murphy Zhuang, et al.

DVAO: Dynamic Variance-adaptive Advantage Optimization for Multi-reward Reinforcement Learning

Reinforcement Learning

Multi-Task Learning

Guochao Jiang, Jingyi Song, Guofeng Quan, et al.

ViMU: Benchmarking Video Metaphorical Understanding

Video Understanding

Emotion Recognition

Qi Li, Xinchao Wang

SMOL: Professionally translated parallel data for 115 under-represented languages

Supervised Fine-Tuning

Isaac Caswell, Elizabeth Nielsen, Jiaming Luo, et al.

Chi-Bench: Can AI Agents Automate End-to-End, Long-Horizon, Policy-Rich Healthcare Workflows?

Haolin Chen, Deon Metelski, Leon Qi, et al.

Combining On-Policy Optimization and Distillation for Long-Context Reasoning in Large Language Models

Miguel Moura Ramos, Duarte M. Alves, André F. T. Martins

Through the Lens of Contrast: Self-Improving Visual Reasoning in VLMs

Visual Question Answering

Zhiyu Pan, Yizheng Wu, Jiashen Hua, et al.

HRM-Text: Efficient Pretraining Beyond Scaling

Guan Wang, Changling Liu, Chenyu Wang, et al.

See What I Mean: Aligning Vision and Language Representations for Video Fine-grained Object Understanding

Multimodal Representation

Boyuan Sun, Bowen Yin, Yuanming Li, et al.

StepAudio 2.5 Technical Report

Audio Recognition

Bin Lin, Bo Zhao, Boyong Wu, et al.

SciAtlas: A Large-Scale Knowledge Graph for Automated Scientific Research

Shuofei Qiao, Yunxiang Wei, Jiazheng Fan, et al.

Rethinking Cross-Layer Information Routing in Diffusion Transformers

Diffusion Model

Chao Xu, Maohua Li, Qirui Li, et al.

Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models

Diffusion Model

Dong Chen, Fangyun Wei, Ziyu Wan, et al.

SkillOpt: Executive Strategy for Self-Evolving Agent Skills

Yifan Yang, Ziyang Gong, Weiquan Huang, et al.

CVEvolve: Autonomous Algorithm Discovery for Unstructured Scientific Data Processing

Image Processing

Ming Du, Xiangyu Yin, Yanqi Luo, et al.

Poly-EPO: Training Exploratory Reasoning Models

Reinforcement Learning

Ifdita Hasan Orney, Jubayer Ibn Hamid, Shreya S Ramanujam, et al.

MEMO: Memory as a Model

Retrieval-Augmented Generation

Ryan Wei Heng Quek, Sanghyuk Lee, Alfred Wei Lun Leong, et al.

ACC: Compiling Agent Trajectories for Long-Context Training

Supervised Fine-Tuning

Qisheng Su, Zhen Fang, Shiting Huang, et al.

Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps

Yanke Zhou, Yiduo Li, Hanlin Tang, et al.

$π$-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows

Haoran Zhang, Luxin Xu, Zhilin Wang, et al.
