HyperAI

Main

GPU

Console
Studio
Docs
Pricing

Pulse

News

Resources

Papers
Notebooks
Datasets
Wiki

Benchmarks

SOTA
LLM Models
GPU Leaderboard

Community

Events

Utility

About Terms of Service Privacy Policy
English

Papers

Daily-updated cutting-edge AI research papers to help you keep up with the latest AI trends

Build the Future of Artificial Intelligence

About

About Us Support Dataset Help

Products

News Papers Notebooks Datasets Wiki

Links

© HyperAI

GitHub Discord X (formerly Twitter)

Multi-Ontology Integration with Dual-Axis Propagation for Medical Concept Representation

Retrieval-Augmented Generation

Mohsen Nayebi Kerdabadi, Arya Hadizadeh Moghaddam, Dongjie Wang, Zijun Yao

Automated Clinical Problem Detection from SOAP Notes using a Collaborative Multi-Agent LLM Architecture

Yeawon Lee, Xiaoyang Wang, Christopher C. Yang

SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion

Document Understanding

Ahmed Nassar, Andres Marafioti, Matteo Omenetti, et al.

olmOCR: Unlocking Trillions of Tokens in PDFs with Vision Language Models

Document Understanding

Luca Soldaini, Kyle Lo, Christopher Wilhelm, et al.

How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on τ-bench

Venkatesh Mishra, Amir Saeidi, Satyam Raj, et al.

UI-Level Evaluation of ALLaM 34B: Measuring an Arabic-Centric LLM via HUMAIN Chat

Natural Language Processing

From reactive to cognitive: brain-inspired spatial intelligence for embodied agents

Embodied Intelligence

Shouwei Ruan, Liyuan Wang, Caixin Kang, et al.

No Label Left Behind: A Unified Surface Defect Detection Model for all Supervision Regimes

Computer Vision

Object Detection

Blaž Rolih, Matic Fučka, Danijel Skočaj

T2R-bench: A Benchmark for Generating Article-Level Reports from Real World Industrial Tables

Jie Zhang, Changzai Pan, Kaiwen Wei, et al.

PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning

Reinforcement Learning

Wenfeng Feng, Penghong Zhao, Guochao Jiang, et al.

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

Preference Modeling

Reinforcement Learning

Yuntao Bai, Andy Jones, Kamal Ndousse, et al.

UQ: Assessing Language Models on Unsolved Questions

Fan Nie, Ken Ziyu Liu, Zihao Wang, et al.

CARJAN: Agent-Based Generation and Simulation of Traffic Scenarios with AJAN

Autonomous Driving

Leonard Frank Neis, Andre Antakli, Matthias Klusch

TiKMiX: Take Data Influence into Dynamic Mixture for Language Model Pre-training

Yifan Wang, Binbin Liu, Fengze Liu, et al.

TalkVid: A Large-Scale Diversified Dataset for Audio-Driven Talking Head Synthesis

Shunian Chen, Hejin Huang, Yexin Liu, et al.

Droplet3D: Commonsense Priors from Videos Facilitate 3D Generation

Video Understanding

Xiaochuan Li, Guoguang Du, Runze Zhang, et al.

A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code

Code Generation

Keke Lian, Bin Wang, Lei Zhang, et al.

EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control

Embodied Intelligence

Delin Qu, Haoming Song, Qizhi Chen, et al.

R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning

Jie Jiang, Qi Yang, Bolin Ni, et al.

Igniting Creative Writing in Small Language Models: LLM-as-a-Judge versus Multi-Agent Refined Rewards

Supervised Fine-Tuning

Preference Modeling

Xiaolong Wei, Bo Lu, Xingyu Zhang, et al.

TMUAD: Enhancing Logical Capabilities in Unified Anomaly Detection Models with a Text Memory Bank

Computer Vision

Image Understanding

Jiawei Liu, Jiahe Hou, Wei Wang, et al.

Analysing Chain of Thought Dynamics: Active Guidance or Unfaithful Post-hoc Rationalisation?

Supervised Fine-Tuning

Samuel Lewis-Lim, Xingwei Tan, Zhixue Zhao, et al.

AWorld: Orchestrating the Training Recipe for Agentic AI

Chengyue Yu, Siyuan Lu, Chenyi Zhuang, et al.

MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers

Zhenting Wang, Qi Chang, Hemani Patel, et al.

rStar2-Agent: Agentic Reasoning Technical Report

Reinforcement Learning

Ning Shang, Yifei Liu, Yi Zhu, et al.

Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning

Preference Modeling

Yibin Wang, Zhimin Li, Yuhang Zang, et al.

MobileCLIP2: Improving Multi-Modal Reinforced Training

Image Captioning

Fartash Faghri, Pavan Kumar Anasosalu Vasu, Cem Koc, et al.

AI-AI Esthetic Collaboration with Explicit Semiotic Awareness and Emergent Grammar Development

Artificial Intelligence

Natural Language Processing

Nicanor I. Moldovan

Gaze into the Heart: A Multi-View Video Dataset for rPPG and Health Biomarkers Estimation

Computer Vision

Video Understanding

Konstantin Egorov, Stepan Botman, Pavel Blinov, et al.

Predicting the Order of Upcoming Tokens Improves Language Modeling

Zayd M. K. Zuhri, Erland Hilman Fuadi, Alham Fikri Aji

MIDAS: Multimodal Interactive Digital-human Synthesis via Real-time Autoregressive Video Generation

Ming Chen, Liyuan Cui, Wenyuan Zhang, et al.

Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies

Diffusion Model

Zhixuan Liang, Yizhuo Li, Tianshuo Yang, et al.
