HyperAI

Main

GPU

Console
Studio
Docs
Pricing

Pulse

News

Resources

Papers
Notebooks
Datasets
Wiki

Benchmarks

SOTA
LLM Models
GPU Leaderboard

Community

Events

Utility

About Terms of Service Privacy Policy
English

Papers

Daily updated cutting-edge AI research papers to help you keep up with the latest AI trends

Build the Future of Artificial Intelligence

About

About Us Support Dataset Help

Products

News Papers Notebooks Datasets Wiki

Links

© HyperAI

GitHub Discord X (formerly Twitter)

SEAR: Schema-Based Evaluation and Routing for LLM Gateways

Text Generation

Zecheng Zhang, Han Zheng, Yue Xu

On-the-fly Repulsion in the Contextual Space for Rich Diversity in Diffusion Transformers

Diffusion Model

Omer Dahary, Benaya Koren, Daniel Garibi, et al.

EpochX: Building the Infrastructure for an Emergent Agent Civilization

Huacan Wang, Chaofa Yuan, Xialie Zhuang, et al.

TAPS: Task Aware Proposal Distributions for Speculative Sampling

Text Generation

Mohamad Zbib, Mohamad Bazzi, Ammar Mohanna, et al.

LongTail Driving Scenarios with Reasoning Traces: The KITScenes LongTail Dataset

Autonomous Driving

Royden Wagner, Omer Sahin Tas, Jaime Villa, et al.

RealChart2Code: Advancing Chart-to-Code Generation with Real Data and Multi-Task Evaluation

Code Generation

Jiajun Zhang, Yuying Li, Zhixun Li, et al.

Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills

Jingwei Ni, Yihao Liu, Xinpeng Liu, et al.

PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference

Video Generation

Diffusion Model

Xiaofeng Mao, Shaohao Rui, Kaining Ying, et al.

ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling

Video Generation

Yawen Luo, Xiaoyu Shi, Junhao Zhuang, et al.

Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models

Video Generation

Object Tracking

Kaijin Chen, Dingkang Liang, Xin Zhou, et al.

BeSafe-Bench: Unveiling Behavioral Safety Risks of Situated Agents in Functional Environments

Yuxuan Li, Yi Lin, Peng Wang, et al.

World Reasoning Arena

Qiyue Gao, Kun Zhou, Jiannan Xiang, et al.

MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens

Retrieval-Augmented Generation

Yu Chen, Runkai Chen, Sheng Yi, et al.

Voxtral TTS

Alexander H. Liu, Alexis Tacnet, Andy Ehrenberg, et al.

RealRestorer: Towards Generalizable Real-World Image Restoration with Large-Scale Image Editing Models

Diffusion Model

Yufeng Yang, Xianfang Zeng, Zhangqi Jiang, et al.

Calibri: Enhancing Diffusion Transformers via Parameter-Efficient Calibration

Diffusion Model

Danil Tokhchukov, Aysel Mirzoeva, Andrey Kuznetsov, et al.

Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale

Yicheng Zou, Dongsheng Zhu, Lin Zhu, et al.

PixelSmile: Toward Fine-Grained Facial Expression Editing

Diffusion Model

Jiabin Hua, Hengyuan Xu, Aojie Li, et al.

Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs

Alexander Panfilov, Peter Romov, Igor Shilov, et al.

AutoHarness: Improving LLM Agents by Automatically Synthesizing a Code Harness

Code Generation

Xinghua Lou, Miguel Lázaro-Gredilla, Antoine Dedieu, et al.

GameplayQA: A Benchmarking Framework for Decision-Dense POV-Synced Multi-Video Understanding of 3D Virtual Agents

Video Understanding

Visual Question Answering

Yunzhe Wang, Runhui Xu, Kexin Zheng, et al.

Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs?

Jeonghye Kim, Xufang Luo, Minbeom Kim, et al.

UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience

Zichuan Lin, Feiyu Liu, Yijun Yang, et al.

T-MAP: Red-Teaming LLM Agents with Trajectory-aware Evolutionary Search

Hyomin Lee, Sangwoo Park, Yumin Choi, et al.

CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents

Video Understanding

Xiangru Jian, Shravan Nayak, Kevin Qinghong Lin, et al.

EVA: Efficient Reinforcement Learning for End-to-End Video Agent

Video Understanding

Yaolun Zhang, Ruohui Wang, Jiahao Wang, et al.

Foveated Diffusion: Efficient Spatially Adaptive Image and Video Generation

Diffusion Model

Image Inpainting

Brian Chao, Lior Yariv, Howard Xiao, et al.

Ego2Web: A Web Agent Benchmark Grounded in Egocentric Videos

Video Understanding

Shoubin Yu, Lei Shu, Antoine Yang, et al.

From Static Templates to Dynamic Runtime Graphs: A Survey of Workflow Optimization for LLM Agents

Ling Yue, Kushal Raj Bhandari, Ching-Yun Ko, et al.

SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning

Haoyu Huang, Jinfa Huang, Zhongwei Wan, et al.

DA-Flow: Degradation-Aware Optical Flow Estimation with Diffusion Models

Diffusion Model

Video Processing

Jaewon Min, Jaeeun Lee, Yeji Choi, et al.

PEARL: Personalized Streaming Video Understanding Model

Video Understanding

Yuanhong Zheng, Ruichuan An, Xiaopeng Lin, et al.
