HyperAI

Main

GPU

Console
Studio
Docs
Pricing

Pulse

News

Resources

Papers
Notebooks
Datasets
Wiki

Benchmarks

SOTA
LLM Models
GPU Leaderboard

Community

Events

Utility

About Terms of Service Privacy Policy
English

Papers

Daily updated cutting-edge AI research papers to help you keep up with the latest AI trends

Build the Future of Artificial Intelligence

About

About Us Support Dataset Help

Products

News Papers Notebooks Datasets Wiki

Links

© HyperAI

GitHub Discord X (formerly Twitter)

Orion-MSP: Multi-Scale Sparse Attention for Tabular In-Context Learning

Mohamed Bouadi, Pratinav Seth, Aditya Tanna, et al.

TabTune: A Unified Library for Inference and Fine-Tuning Tabular Foundation Models

Supervised Fine-Tuning

Aditya Tanna, Pratinav Seth, Mohamed Bouadi, et al.

Step-Audio-EditX Technical Report

Chao Yan, Boyong Wu, Peng Yang, et al.

LEGO-Eval: Towards Fine-Grained Evaluation on Synthesizing 3D Embodied Environments with Tool Augmentation

Gyeom Hwangbo, Hyungjoo Chae, Minseok Kang, et al.

UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions

Guozhen Zhang, Zixiang Zhou, Teng Hu, et al.

Diffusion Language Models are Super Data Learners

Natural Language Processing

Jinjie Ni, Qian Liu, Longxu Dou, et al.

UNO-Bench: A Unified Benchmark for Exploring the Compositional Law Between Uni-modal and Omni-modal in Omni Models

Chen Chen, ZeYang Hu, Fengjiao Chen, et al.

Dynamic Population Distribution Aware Human Trajectory Generation with Diffusion Model

Diffusion Model

Qingyue Long, Can Rong, Tong Li, et al.

Text to Robotic Assembly of Multi Component Objects using 3D Generative AI and Vision Language Models

Visual Question Answering

Alexander Htet Kyaw, Richa Gupta, Dhruv Shah, et al.

Kosmos: An AI Scientist for Autonomous Discovery

Ludovico Mitchener, Angela Yiu, Benjamin Chang, et al.

Shorter but not Worse: Frugal Reasoning via Easy Samples as Length Regularizers in Math RLVR

Abdelaziz Bounhar, Hadi Abdine, Evan Dufraisse, et al.

Brain-IT: Image Reconstruction from fMRI via Brain-Interaction Transformer

Computer Vision

Roman Beliy, Amit Zalcher, Jonathan Kogman, et al.

When Modalities Conflict: How Unimodal Reasoning Uncertainty Governs Preference Dynamics in MLLMs

Visual Question Answering

Zhuoran Zhang, Tengyue Wang, Xilin Gong, et al.

Don't Blind Your VLA: Aligning Visual Representations for OOD Generalization

Multimodal Representation

Nikita Kachaev, Mikhail Kolosov, Daniil Zelezetsky, et al.

When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought

Yiyang Zhou, Haoqin Tu, Zijun Wang, et al.

VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation

Code Generation

Kevin Qinghong Lin, Yuhao Zheng, Hangyu Ran, et al.

The AI Productivity Index (APEX)

Bertie Vidgen, Abby Fennelly, Evan Pinnix, et al.

Chain-of-Frames: Advancing Video Understanding in Multimodal LLMs via Frame-Aware Reasoning

Video Understanding

Visual Question Answering

Sara Ghazanfari, Francesco Croce, Nicolas Flammarion, et al.

Towards Robust Mathematical Reasoning

Thang Luong, Dawsen Hwang, Hoang H. Nguyen, et al.

Towards a future space-based, highly scalable AI infrastructure system design

High-Performance Computing

Blaise Agüera y Arcas, Travis Beals, Maria Biggs, et al.

PHUMA: Physically-Grounded Humanoid Locomotion Dataset

Kyungmin Lee, Sibeen Kim, Minho Park, et al.

UniREditBench: A Unified Reasoning-based Image Editing Benchmark

Feng Han, Yibin Wang, Chenglin Li, et al.

Generalizing Test-time Compute-optimal Scaling as an Optimizable Graph

Fali Wang, Jihai Chen, Shuhua Yang, et al.

UniLumos: Fast and Unified Image and Video Relighting with Physics-Plausible Feedback

Diffusion Model

Depth Estimation

Ropeway Liu, Hangjie Yuan, Bo Dong, et al.

The Underappreciated Power of Vision Models for Graph Structural Understanding

Computer Vision

Xinjian Zhao, Wei Pang, Zhongkai Xue, et al.

Every Activation Boosted: Scaling General Reasoner to 1 Trillion Open Language Foundation

Ling-Team, Ang Li, Ben Liu, et al.

NOBLE - Neural Operator with Biologically-informed Latent Embeddings to Capture Experimental Variability in Biological Neuron Models

Luca Ghafourpour, Valentin Duruisseaux, Bahareh Tolooshams, et al.

Glia: A Human-Inspired AI for Automated Systems Design and Optimization

Pouya Hamadanian, Pantea Karimi, Arash Nasr-Esfahany, et al.

Context Engineering 2.0: The Context of Context Engineering

Artificial Intelligence

Qishuo Hua, Lyumanshan Ye, Dayuan Fu, et al.

Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning

Image Understanding

Computer Vision

Yuhong Liu, Beichen Zhang, Yuhang Zang, et al.

Continuous Autoregressive Language Models

Text Generation

Chenze Shao, Darren Li, Fandong Meng, et al.

π𝚁𝙻: Online RL Fine-tuning for Flow-based Vision-Language-Action Models

Reinforcement Learning

Supervised Fine-Tuning

Kang Chen, Zhihao Liu, Tonghe Zhang, et al.
