HyperAI超神经
Natural Language Visual Grounding On
Evaluation metric: Accuracy (%)
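The page does not say how accuracy is computed. In GUI visual grounding benchmarks, it is commonly the percentage of instructions for which the model's predicted click point falls inside the target element's bounding box. The sketch below assumes that definition and uses pixel coordinates. The names `Box` and `grounding_accuracy` are hypothetical and do not come from any of the listed papers or their code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned bounding box of a target UI element (pixels)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        # A point lying exactly on an edge counts as inside the box.
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def grounding_accuracy(predictions, targets):
    """Percentage of predicted click points that land inside their target boxes.

    Assumption: `predictions` is a list of (x, y) points and `targets` is a
    list of Box objects of the same length, paired by index.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        raise ValueError("need at least one example")
    # Each hit contributes True (== 1) to the sum.
    hits = sum(box.contains(x, y) for (x, y), box in zip(predictions, targets))
    return 100.0 * hits / len(targets)


if __name__ == "__main__":
    targets = [Box(10, 10, 50, 30), Box(100, 200, 180, 240),
               Box(0, 0, 20, 20), Box(300, 40, 360, 70)]
    # The second prediction misses: y = 250 lies below its box (bottom = 240).
    predictions = [(25, 20), (150, 250), (5, 5), (330, 55)]
    print(f"{grounding_accuracy(predictions, targets):.1f}")  # prints 75.0
```

Under this definition, a score of 86.34 would mean that about 86 of every 100 predicted clicks hit the intended element. Some benchmarks use normalized coordinates instead, which changes only the box and point units, not the counting.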
Evaluation results: accuracy of each model on this benchmark, listed from highest to lowest.
| Model | Accuracy (%) | Paper Title |
|---|---|---|
| UGround-V1-7B | 86.34 | Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents |
| Aguvis-7B | 83.0 | Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction |
| OS-Atlas-Base-7B | 82.47 | OS-ATLAS: A Foundation Action Model for Generalist GUI Agents |
| Aria-UI | 81.1 | Aria-UI: Visual Grounding for GUI Instructions |
| Aguvis-G-7B | 81.0 | Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction |
| UGround-V1-2B | 77.67 | Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents |
| ShowUI | 75.1 | ShowUI: One Vision-Language-Action Model for GUI Visual Agent |
| ShowUI-G | 75.0 | ShowUI: One Vision-Language-Action Model for GUI Visual Agent |
| UGround | 73.3 | Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents |
| OmniParser | 73.0 | OmniParser for Pure Vision Based GUI Agent |
| OS-Atlas-Base-4B | 68.0 | OS-ATLAS: A Foundation Action Model for Generalist GUI Agents |
| SeeClick | 53.4 | SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents |
| CogAgent | 47.4 | CogAgent: A Visual Language Model for GUI Agents |
| Qwen2-VL-7B | 42.1 | Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution |
| Qwen-GUI | 28.6 | GUICourse: From General Vision Language Models to Versatile GUI Agents |
| MiniGPT-v2 | 5.7 | MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning |
| Groma | 5.2 | Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models |
| Qwen-VL | 5.2 | Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond |