Tracking by Natural Language Specification

Arnold W. M. Smeulders, Efstratios Gavves, Zhenyang Li, Ran Tao, Cees G. M. Snoek

Abstract

This paper strives to track a target object in a video. Rather than specifying the target in the first frame of a video by a bounding box, we propose to track the object based on a natural language specification of the target, which provides a more natural human-machine interaction as well as a means to improve tracking results. We define three variants of tracking by language specification: one relying on lingual target specification only, one relying on visual target specification based on language, and one leveraging their joint capacity. To show the potential of tracking by natural language specification we extend two popular tracking datasets with lingual descriptions and report experiments. Finally, we also sketch new tracking scenarios in surveillance and other live video streams that become feasible with a lingual specification of the target.

Benchmarks

| Metric | referring-expression-segmentation-on-a2d | referring-expression-segmentation-on-j-hmdb |
| --- | --- | --- |
| Methodology | Li et al. | Li et al. |
| AP | 0.163 | 0.173 |
| IoU mean | 0.354 | 0.491 |
| IoU overall | 0.515 | 0.529 |
| Precision@0.5 | 0.387 | 0.578 |
| Precision@0.6 | 0.290 | 0.335 |
| Precision@0.7 | 0.175 | 0.103 |
| Precision@0.8 | 0.066 | 0.060 |
| Precision@0.9 | 0.001 | 0.000 |
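
The page does not say how the reported metrics are computed. The following is a minimal sketch, assuming the definitions commonly used for referring-expression segmentation: "IoU mean" averages the per-sample IoU, "IoU overall" divides the total intersection by the total union accumulated over all samples, and "Precision@K" is the fraction of samples whose IoU exceeds K. The function names, the nested-list mask format, and the strict `>` comparison are illustrative assumptions, not the evaluation code behind the numbers above. AP is omitted.

```python
from typing import Dict, Sequence, Tuple

# Assumed mask format: a 2D grid of 0/1 values, where 1 marks a target pixel.
Mask = Sequence[Sequence[int]]


def intersection_and_union(pred: Mask, gt: Mask) -> Tuple[int, int]:
    """Count intersecting and united pixels of two equally sized binary masks."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                inter += 1
            if p or g:
                union += 1
    return inter, union


def evaluate(
    preds: Sequence[Mask],
    gts: Sequence[Mask],
    thresholds: Sequence[float] = (0.5, 0.6, 0.7, 0.8, 0.9),
) -> Dict[str, float]:
    """Compute IoU mean, IoU overall, and Precision@K under the assumed definitions."""
    if len(preds) == 0 or len(preds) != len(gts):
        raise ValueError("preds and gts must be non-empty and of equal length")

    ious = []
    total_inter = 0
    total_union = 0
    for pred, gt in zip(preds, gts):
        inter, union = intersection_and_union(pred, gt)
        total_inter += inter
        total_union += union
        # Two empty masks have an undefined IoU; this sketch scores them as 0.
        ious.append(inter / union if union else 0.0)

    n = len(ious)
    results = {
        "IoU mean": sum(ious) / n,
        "IoU overall": total_inter / total_union if total_union else 0.0,
    }
    for t in thresholds:
        # Assumption: a sample counts as correct when its IoU is strictly above t.
        results[f"Precision@{t}"] = sum(1 for iou in ious if iou > t) / n
    return results


if __name__ == "__main__":
    # Two toy samples: a perfect prediction and a prediction covering 1 of 4 pixels.
    preds = [[[1, 1], [0, 0]], [[1, 0], [0, 0]]]
    gts = [[[1, 1], [0, 0]], [[1, 1], [1, 1]]]
    for name, value in evaluate(preds, gts).items():
        print(f"{name}: {value:.3f}")
```

Under these definitions the two IoU figures can differ noticeably: "IoU overall" weights each sample by its pixel count, so large objects dominate it, while "IoU mean" treats every sample equally.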
