HyperAI

Main

GPU

Console
Studio
Docs
Pricing

Pulse

News

Resources

Papers
Notebooks
Datasets
Wiki

Benchmarks

SOTA
LLM Models
GPU Leaderboard

Community

Events

Utility

About Terms of Service Privacy Policy
English

Papers

Daily updated cutting-edge AI research papers to help you keep up with the latest AI trends

Build the Future of Artificial Intelligence

About

About Us Support Dataset Help

Products

News Papers Notebooks Datasets Wiki

Links

© HyperAI

GitHub Discord X (formerly Twitter)

Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights

Machine Learning

Liang, Zhiyuan, Tang, et al.

Evolutionary Caching to Accelerate Your Off-the-Shelf Diffusion Model

Diffusion Model

Aggarwal, Anirud, Shrivastava, et al.

RE-IMAGINE: Symbolic Benchmark Synthesis for Reasoning Evaluation

Xu, Xinnuo, Lawrence, et al.

SonicVerse: Multi-Task Learning for Music Feature-Informed Captioning

Chopra, Anuradha, Roy, et al.

All is Not Lost: LLM Recovery without Checkpoints

Blagoev, Nikolay, Ersoy, et al.

Sundial: A Family of Highly Capable Time Series Foundation Models

Yong Liu, Guo Qin, Zhiyuan Shi, et al.

ADRD: LLM-Driven Autonomous Driving Based on Rule-based Decision Systems

Fanzhi Zeng, Siqi Wang, Chuzhao Zhu, et al.

Improved Iterative Refinement for Chart-to-Code Generation via Structured Instruction

Code Generation

Chengzhi Xu, Yuyang Wang, Lai Wei, et al.

Show-o2: Improved Native Unified Multimodal Models

Multimodal Representation

Jinheng Xie, Zhenheng Yang, Mike Zheng Shou

Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective

Reinforcement Learning

Zhoujun Cheng, Shibo Hao, Tianyang Liu, et al.

Raptor: Scalable Train-Free Embeddings for 3D Medical Volumes Leveraging Pretrained 2D Foundation Models

Medical Imaging

Ulzee An, Moonseong Jeong, Simon Austin Lee, et al.

EmoNet-Voice: A Fine-Grained, Expert-Verified Benchmark for Speech Emotion Detection

Emotion Recognition

Christoph Schuhmann, Robert Kaczmarczyk, Gollam Rabby, et al.

s1: Simple test-time scaling

Supervised Fine-Tuning

Niklas Muennighoff, Zitong Yang, Weijia Shi, et al.

Search-o1: Agentic Search-Enhanced Large Reasoning Models

Retrieval-Augmented Generation

Xiaoxi Li, Guanting Dong, Jiajie Jin, et al.

LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token

Shaolei Zhang, Qingkai Fang, Zhe Yang, et al.

MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale

Jarvis Guo, Tuney Zheng, Yuelin Bai, et al.

ShowUI: One Vision-Language-Action Model for GUI Visual Agent

Image Understanding

Kevin Qinghong Lin, Linjie Li, Difei Gao, et al.

OS-ATLAS: A Foundation Action Model for Generalist GUI Agents

Zhiyong Wu, Zhenyu Wu, Fangzhi Xu, et al.

GPT-4o System Card

OpenAI, Aaron Hurst, Adam Lerer, et al.

SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree

Semantic Segmentation

Object Tracking

Shuangrui Ding, Rui Qian, Xiaoyi Dong, et al.

Aria: An Open Multimodal Native Mixture-of-Experts Model

Dongxu Li, Yudong Liu, Haoning Wu, et al.

Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

Peng Wang, Shuai Bai, Sinan Tan, et al.

VGGT: Visual Geometry Grounded Transformer

3D Machine Vision

Depth Estimation

Jianyuan Wang, Minghao Chen, Nikita Karaev, et al.

Multi-Turn Code Generation Through Single-Step Rewards

Code Generation

Reinforcement Learning

Arnav Kumar Jain, Gonzalo Gonzalez-Pumariega, Wayne Chen, et al.

Revisiting Compositional Generalization Capability of Large Language Models Considering Instruction Following Ability

Yusuke Sakai, Hidetaka Kamigaito, Taro Watanabe

Embodied Web Agents: Bridging Physical-Digital Realms for Integrated Agent Intelligence

Embodied Intelligence

Yining Hong, Rui Sun, Bingxuan Li, et al.

Semantically-Aware Rewards for Open-Ended R1 Training in Free-Form Generation

Preference Modeling

Text Generation

Zongxia Li, Yapei Chang, Yuhang Zhou, et al.

BUT System for the MLC-SLM Challenge

Audio and Speech Processing

Multi-Task Learning

Alexander Polok, Jiangyu Han, Dominik Klement, et al.

GenRecal: Generation after Recalibration from Large to Small Vision-Language Models

Byung-Kwan Lee, Ryo Hachiuma, Yong Man Ro, et al.

ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs

Feng He, Zijun Chen, Xinnian Liang, et al.

Sekai: A Video Dataset towards World Exploration

Video Understanding

Video Captioning

Zhen Li, Chuanhao Li, Xiaofeng Mao, et al.

Data-driven material screening of secondary and natural cementitious precursors

Soroush Mahjoubi, Vineeth Venugopal, Ipek Bensu Manav, et al.
