4 months ago

General Object Foundation Model for Images and Videos at Scale

Abstract

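In this work we present GLEE, an object-level foundation model for locating and identifying objects in images and videos. Through a unified framework, GLEE handles a range of object perception tasks in open-world scenarios: detection, segmentation, tracking, grounding, and identification of arbitrary objects. Using a cohesive learning strategy, GLEE acquires knowledge from diverse data sources with varying levels of supervision to form general object representations, and it performs well in zero-shot transfer to new data and new tasks. Specifically, we employ an image encoder, a text encoder, and a visual prompter to handle multi-modal inputs, which lets the model solve various object-centric downstream tasks simultaneously while maintaining state-of-the-art performance. Trained extensively on more than five million images from multiple benchmarks, GLEE shows strong versatility and improved generalization, tackling downstream tasks efficiently without task-specific adaptation. By integrating large volumes of automatically labeled data, we further enhance its zero-shot generalization. In addition, GLEE can be integrated into large language models, serving as a foundation model that provides universal object-level information for multi-modal tasks. We hope that the versatility and universality of our method will mark a significant step toward efficient visual foundation models for AGI systems. The model and code will be released at https://glee-vision.github.io.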

Code repositories

FoundationVision/GLEE (official, pytorch, mentioned in GitHub)

Benchmarks

Benchmark | Method | Metrics
instance-segmentation-on-coco | GLEE-Lite | mask AP: 48.3
instance-segmentation-on-coco | GLEE-Plus | mask AP: 53.3
instance-segmentation-on-coco | GLEE-Pro | mask AP: 54.5
instance-segmentation-on-coco-minival | GLEE-Pro | mask AP: 54.2
instance-segmentation-on-coco-minival | GLEE-Plus | mask AP: 53.0
instance-segmentation-on-coco-minival | GLEE-Lite | mask AP: 48.4
instance-segmentation-on-lvis-v1-0-val | GLEE-Pro | mask AP: 49.9
long-tail-video-object-segmentation-on-burst | GLEE-Lite | HOTA (all): 22.6, HOTA (com): 36.4, HOTA (unc): 19.1, mAP (all): 12.6, mAP (com): 18.9, mAP (unc): 11.0
long-tail-video-object-segmentation-on-burst-1 | GLEE-Lite | HOTA (all): 22.6, HOTA (com): 36.4, HOTA (unc): 19.1, mAP (all): 12.6, mAP (com): 18.9, mAP (unc): 11.0
long-tail-video-object-segmentation-on-burst-1 | GLEE-Pro | HOTA (all): 31.2, HOTA (com): 48.7, HOTA (unc): 26.9, mAP (all): 19.2, mAP (com): 24.8, mAP (unc): 17.7
long-tail-video-object-segmentation-on-burst-1 | GLEE-Plus | HOTA (all): 26.9, HOTA (com): 38.8, HOTA (unc): 23.9, mAP (all): 17.2, mAP (com): 23.7, mAP (unc): 15.5
multi-object-tracking-on-tao | GLEE-Lite | AssocA: 39.9, ClsA: 24.1, LocA: 56.3, TETA: 40.1
multi-object-tracking-on-tao | GLEE-Plus | AssocA: 40.9, ClsA: 30.8, LocA: 52.9, TETA: 41.5
multi-object-tracking-on-tao | GLEE-Pro | AssocA: 46.2, ClsA: 29.1, LocA: 66.2, TETA: 47.2
object-detection-on-coco | GLEE-Lite | box mAP: 54.7
object-detection-on-coco | GLEE-Pro | box mAP: 62.3
object-detection-on-coco | GLEE-Plus | box mAP: 60.6
object-detection-on-coco-minival | GLEE-Pro | box AP: 62.0
object-detection-on-coco-minival | GLEE-Lite | box AP: 55.0
object-detection-on-coco-minival | GLEE-Plus | box AP: 60.4
object-detection-on-lvis-v1-0-val | GLEE-Pro | box AP: 55.7
open-world-instance-segmentation-on-uvo | GLEE-Pro | ARmask: 72.6
referring-expression-segmentation-on-refcoco | GLEE-Pro | Overall IoU: 80.0
referring-expression-segmentation-on-refcoco-3 | GLEE-Pro | Overall IoU: 69.6
referring-expression-segmentation-on-refcoco-6 | GLEE-Pro | IoU: 80.0
referring-expression-segmentation-on-refcocog | GLEE-Pro | Overall IoU: 72.9
referring-expression-segmentation-on-refer-1 | GLEE-Pro | F: 72.9, J: 68.2, J&F: 70.6
referring-video-object-segmentation-on-refer | GLEE-Plus | F: 69.7, J: 65.6, J&F: 67.7
referring-video-object-segmentation-on-refer | GLEE-Pro | F: 72.9, J: 68.2, J&F: 70.6
video-instance-segmentation-on-ovis-1 | GLEE-Pro | AP75: 55.5, mask AP: 50.4
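
Two of the composite metrics listed above can be cross-checked against their components. The sketch below assumes the usual definitions, which this page does not state: J&F as the arithmetic mean of region similarity J and contour accuracy F, and TETA as the arithmetic mean of LocA, AssocA, and ClsA. The helper names `j_and_f` and `teta` are hypothetical and not part of the GLEE codebase. Because the listed inputs are already rounded to one decimal, the comparison uses a small tolerance rather than exact equality.

```python
# Consistency check for composite metrics, using values copied from the
# benchmark listing. The metric definitions are assumptions (see lead-in).

def j_and_f(j: float, f: float) -> float:
    """Assumed definition: mean of region similarity J and contour accuracy F."""
    return (j + f) / 2.0

def teta(loc_a: float, assoc_a: float, cls_a: float) -> float:
    """Assumed definition: mean of localization, association, classification scores."""
    return (loc_a + assoc_a + cls_a) / 3.0

# (J, F, reported J&F) for the referring video object segmentation rows.
jf_rows = {
    "GLEE-Plus": (65.6, 69.7, 67.7),
    "GLEE-Pro": (68.2, 72.9, 70.6),
}

# (LocA, AssocA, ClsA, reported TETA) for the TAO tracking rows.
teta_rows = {
    "GLEE-Lite": (56.3, 39.9, 24.1, 40.1),
    "GLEE-Plus": (52.9, 40.9, 30.8, 41.5),
    "GLEE-Pro": (66.2, 46.2, 29.1, 47.2),
}

TOLERANCE = 0.1  # inputs are rounded to one decimal place

for name, (j, f, reported) in jf_rows.items():
    recomputed = j_and_f(j, f)
    ok = abs(recomputed - reported) < TOLERANCE
    print(f"J&F  {name}: recomputed={recomputed:.2f} reported={reported} consistent={ok}")

for name, (loc_a, assoc_a, cls_a, reported) in teta_rows.items():
    recomputed = teta(loc_a, assoc_a, cls_a)
    ok = abs(recomputed - reported) < TOLERANCE
    print(f"TETA {name}: recomputed={recomputed:.2f} reported={reported} consistent={ok}")
```

Under these assumed definitions, every listed row agrees with its components to within rounding.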
