HyperAI

Main

GPU

Console
Studio
Docs
Pricing

Pulse

News

Resources

Papers
Notebooks
Datasets
Wiki

Benchmarks

SOTA
LLM Models
GPU Leaderboard

Community

Events

Utility

About Terms of Service Privacy Policy
English

Papers

Cutting-edge AI research papers, updated daily, to help you keep up with the latest AI trends

Build the Future of Artificial Intelligence

About

About Us Support Dataset Help

Products

News Papers Notebooks Datasets Wiki

Links

© HyperAI

GitHub Discord X (formerly Twitter)

TongSIM: A General Platform for Simulating Intelligent Machines

Embodied Intelligence

Zhe Sun, Kunlun Wu, Chuanjian Fu, et al.

Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition

Diffusion Model

Image Generation

Shengming Yin, Zekai Zhang, Zecheng Tang, et al.

RoboSafe: Safeguarding Embodied Agents via Executable Safety Logic

Le Wang, Zonghao Ying, Xiao Yang, et al.

A Real-World Evaluation of LLM Medication Safety Reviews in NHS Primary Care

Natural Language Processing

Oliver Normand, Esther Borsi, Mitch Fruin, et al.

Multi-LLM Thematic Analysis with Dual Reliability Metrics: Combining Cohen's Kappa and Semantic Similarity for Qualitative Research Validation

Natural Language Processing

Nilesh Jain, Seyi Adeyinka, Leor Roseman, et al.

Active Intelligence in Video Avatars via Closed-loop World Modeling

Embodied Intelligence

Reinforcement Learning

Xuanhua He, Tianyu Yang, Ke Cao, et al.

FaithLens: Detecting and Explaining Faithfulness Hallucination

Retrieval-Augmented Generation

Supervised Fine-Tuning

Shuzheng Si, Qingyi Wang, Haozhe Zhao, et al.

SAM Audio: Segment Anything in Audio

Bowen Shi, Andros Tjandra, John Hoffman, et al.

Step-DeepResearch Technical Report

Supervised Fine-Tuning

Chen Hu, Haikuo Du, Heng Wang, et al.

SpatialTree: How Spatial Abilities Branch Out in MLLMs

Yuxi Xiao, Longfei Li, Shen Yan, et al.

SemanticGen: Video Generation in Semantic Space

Video Generation

Jianhong Bai, Xiaoshi Wu, Xintao Wang, et al.

Automated stereotactic radiosurgery planning using a human-in-the-loop reasoning large language model agent

Humza Nusrat, Luke Francisco, Bing Luo, et al.

LongVideoAgent: Multi-Agent Reasoning with Long Videos

Visual Question Answering

Runtao Liu, Ziyi Liu, Jiaqi Tang, et al.

GenEnv: Difficulty-Aligned Co-Evolution Between LLM Agents and Environment Simulators

Jiacheng Guo, Ling Yang, Peter Chen, et al.

WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion

Diffusion Model

Hanyang Kong, Xingyi Yang, Xiaoxu Zheng, et al.

LoGoPlanner: Localization Grounded Navigation Policy with Metric-aware Visual Geometry

Embodied Intelligence

Depth Estimation

Jiaqi Peng, Wenzhe Cai, Yuqiang Yang, et al.

Can LLMs Estimate Student Struggles? Human-AI Difficulty Alignment with Proficiency Simulation for Item Difficulty Prediction

Ming Li, Han Chen, Yunze Xiao, et al.

QuCo-RAG: Quantifying Uncertainty from the Pre-training Corpus for Dynamic Retrieval-Augmented Generation

Retrieval-Augmented Generation

Intelligent Question Answering

Dehai Min, Kailin Zhang, Tongtong Wu, et al.

The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding

Multimodal Representation

Weichen Fan, Haiwen Diao, Quan Wang, et al.

Med-Banana-50K: A Cross-modality Large-Scale Dataset for Text-guided Medical Image Editing

Zhihui Chen, Mengling Feng

Kascade: A Practical Sparse Attention Method for Long-Context LLM Inference

Dhruv Deshmukh, Saurabh Goyal, Nipun Kwatra, et al.

GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models

GroundingME: Exposing the Visual Grounding Gap in MLLMs through Multi-Dimensional Evaluation

Rang Li, Lei Li, Shuhuai Ren, et al.

Both Semantics and Reconstruction Matter: Making Representation Encoders Ready for Text-to-Image Generation and Editing

Diffusion Model

Shilong Zhang, He Zhang, Zhifei Zhang, et al.

4D-RGPT: Toward Region-level 4D Understanding via Perceptual Distillation

Visual Question Answering

Multimodal Representation

Chiao-An Yang, Ryo Hachiuma, Sifei Liu, et al.

Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience

Code Generation

Jiangjie Chen, Wenxiang Chen, Jiacheng Du, et al.

When Reasoning Meets Its Laws

Junyu Zhang, Yifan Sun, Tianang Leng, et al.

Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows

Wanghan Xu, Yuhao Zhou, Yifan Zhou, et al.

K2-V2: A 360-Open, Reasoning-Enhanced LLM

Supervised Fine-Tuning

Zhengzhong Liu, Liping Tang, Linghao Jin, et al.

VenusBench-GD: A Comprehensive Multi-Platform GUI Benchmark for Diverse Grounding Tasks

Human-Computer Interaction

Beitong Zhou, Zhexiao Huang, Yuan Guo, et al.

MCIF: Multimodal Crosslingual Instruction-Following Benchmark from Scientific Talks

Sara Papi, Maike Züfle, Marco Gaido, et al.

NitroGen: An Open Foundation Model for Generalist Gaming Agents

Computer Vision

Video Understanding

Loic Magne, Anas Awadalla, Guanzhi Wang, et al.
