8 months ago

Computer Vision

Object Detection

Image Understanding

Liu Chang ; Zhong Yujie ; Zisserman Andrew ; Xie Weidi

Abstract

In this paper, we consider the problem of generalised visual object counting, with the goal of developing a computational model for counting the number of objects from arbitrary semantic categories, using an arbitrary number of "exemplars", i.e. zero-shot or few-shot counting. To this end, we make the following four contributions: (1) We introduce a novel transformer-based architecture for generalised visual object counting, termed Counting Transformer (CounTR), which explicitly captures the similarity between image patches, or between image patches and the given "exemplars", through the attention mechanism; (2) We adopt a two-stage training regime that first pre-trains the model with self-supervised learning and then applies supervised fine-tuning; (3) We propose a simple, scalable pipeline for synthesizing training images that contain a large number of instances, or instances from different semantic categories, explicitly forcing the model to make use of the given "exemplars"; (4) We conduct thorough ablation studies on the large-scale counting benchmark, e.g. FSC-147, and demonstrate state-of-the-art performance in both zero-shot and few-shot settings.
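To make the attention-based similarity in contribution (1) concrete, the following is a minimal sketch of scaled dot-product cross-attention in plain Python, in which image patch features act as queries and exemplar features act as keys and values. It is an illustration only and not the CounTR implementation: the function names `softmax` and `cross_attention`, the two-dimensional toy features, and the absence of learned projections are all assumptions made for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(patches, exemplars):
    """Scaled dot-product attention with no learned projections.

    Each patch vector is a query; the exemplar vectors serve as both
    keys and values. Returns (attended_features, attention_weights).
    Hypothetical helper for illustration, not taken from the paper.
    """
    dim = len(exemplars[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in patches:
        # Similarity of this patch to every exemplar.
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in exemplars
        ]
        row = softmax(scores)
        # Weighted sum of exemplar vectors, one value per feature dimension.
        attended = [
            sum(w * value[d] for w, value in zip(row, exemplars))
            for d in range(dim)
        ]
        outputs.append(attended)
        weights.append(row)
    return outputs, weights


# Toy features (assumed values): three image patches and two exemplars.
patches = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
exemplars = [[1.0, 0.0], [0.0, 1.0]]
features, attn = cross_attention(patches, exemplars)
```

In this sketch, a patch that resembles an exemplar places more attention weight on it. Calling `cross_attention(patches, patches)` would instead compare patches with one another, which is one way to picture the setting without exemplars, although the abstract does not specify how the zero-shot case is handled.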

