HyperAI

Main

GPU

Console
Studio
Docs
Pricing

Pulse

News

Resources

Papers
Notebooks
Datasets
Wiki

Benchmarks

SOTA
LLM Models
GPU Leaderboard

Community

Events

Utility

About Terms of Service Privacy Policy
English

Papers

Daily updated cutting-edge AI research papers to help you keep up with the latest AI trends

Build the Future of Artificial Intelligence

About

About Us Support Dataset Help

Products

News Papers Notebooks Datasets Wiki

Links

© HyperAI

GitHub Discord X (formerly Twitter)

Algorithmic Thinking Theory

MohammadHossein Bateni, Vincent Cohen-Addad, Yuzhou Gu, et al.

Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics

Reinforcement Learning

Chenhao Li, Andreas Krause, Marco Hutter

Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation

Video Generation

Diffusion Model

Yunhong Lu, Yanhong Zeng, Haobo Li, et al.

Semantics Lead the Way: Harmonizing Semantic and Texture Modeling with Asynchronous Latent Diffusion

Diffusion Model

Image Generation

Yueming Pan, Ruoyu Feng, Qi Dai, et al.

ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning

Preference Modeling

Shengyuan Ding, Xinyu Fang, Ziyu Liu, et al.

Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction

Nex-AGI Team, Yuxuan Cai, Lu Chen, et al.

DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle

Fangyu Lei, Jinxiang Meng, Yiming Huang, et al.

Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length

Diffusion Model

Yubo Huang, Hailong Guo, Fangtai Wu, et al.

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

Yushen Chen, Zhikang Niu, Ziyang Ma, et al.

VOccl3D: A Video Benchmark Dataset for 3D Human Pose and Shape Estimation under real Occlusions

Video Understanding

Object Detection

Yash Garg, Saketh Bachu, Arindam Dutta, et al.

Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail

Reinforcement Learning

NVIDIA, Yulong Cao, Tong Che, et al.

It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization

Neural Networks

Ali Behrouz, Meisam Razaviyayn, Peilin Zhong, et al.

Rethinking Prompt Design for Inference-time Scaling in Text-to-Visual Generation

Diffusion Model

Subin Kim, Sangwoo Mo, Mamshad Nayeem Rizve, et al.

Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach

Supervised Fine-Tuning

Siyuan Yang, Yang Zhang, Haoran He, et al.

OneThinker: All-in-one Reasoning Model for Image and Video

Visual Question Answering

Multi-Task Learning

Kaituo Feng, Manyuan Zhang, Hongyu Li, et al.

ViDiC: Video Difference Captioning

Video Captioning

Jiangtao Wu, Shihao Li, Zhaozhou Bian, et al.

PretrainZero: Reinforcement Active Pretraining

Reinforcement Learning

Xingrun Xing, Zhiyuan Fan, Jie Lou, et al.

Every Token Counts: Generalizing 16M Ultra-Long Context in Large Language Models

Xiang Hu, Zhanchao Zhou, Ruiqi Liang, et al.

SimScale: Learning to Drive via Real-World Simulation at Scale

Autonomous Driving

Haochen Tian, Tianyu Li, Haochen Liu, et al.

Skywork-R1V4: Toward Agentic Multimodal Intelligence through Interleaved Thinking with Images and DeepResearch

Retrieval-Augmented Generation

Yifan Zhang, Liang Hu, Haofeng Sun, et al.

Guided Self-Evolving LLMs with Minimal Human Supervision

Wenhao Yu, Zhenwen Liang, Chengsong Huang, et al.

MultiShotMaster: A Controllable Multi-Shot Video Generation Framework

Video Generation

Qinghe Wang, Xiaoyu Shi, Baolu Li, et al.

MG-Nav: Dual-Scale Visual Navigation via Sparse Spatial Memory

Computer Vision

Object Detection

Bo Wang, Jiehong Lin, Chenzhi Liu, et al.

The Consistency Critic: Correcting Inconsistencies in Generated Images via Reference-Guided Attentive Alignment

Image Generation

Ziheng Ouyang, Yiren Song, Yaoli Liu, et al.

How Far Are We from Genuinely Useful Deep Research Agents?

Dingling Zhang, He Zhu, Jincheng Ren, et al.

Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

Reinforcement Learning

Chujie Zheng, Kai Dang, Bowen Yu, et al.

Envision: Benchmarking Unified Understanding & Generation for Causal World Process Insights

Juanxi Tian, Siyuan Li, Conghui He, et al.

LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling

Video Understanding

Visual Question Answering

Zuhao Yang, Sudong Wang, Kaichen Zhang, et al.

From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence

Supervised Fine-Tuning

Jian Yang, Wei Zhang, Shark Liu, et al.

Physics-Driven Spatiotemporal Modeling for AI-Generated Video Detection

Video Understanding

Video Generation

Shuhai Zhang, ZiHao Lian, Jiahao Yang, et al.

Mem-α: Learning Memory Construction via Reinforcement Learning

Reinforcement Learning

Yu Wang, Ryuichi Takanobu, Zhiqi Liang, et al.

Search Self-play: Pushing the Frontier of Agent Capability without Supervision

Reinforcement Learning

Hongliang Lu, Yuhang Wen, Pengyu Cheng, et al.
