Liu Chang; Zhong Yujie; Zisserman Andrew; Xie Weidi

Abstract
In this paper, we consider the problem of generalised visual object counting, with the goal of developing a computational model for counting the number of objects from arbitrary semantic categories using an arbitrary number of "exemplars", i.e. zero-shot or few-shot counting. To this end, we make the following four contributions: (1) we introduce a novel transformer-based architecture for generalised visual object counting, termed Counting Transformer (CounTR), which explicitly captures the similarity between image patches, or between image patches and the given "exemplars", with the attention mechanism; (2) we adopt a two-stage training regime that first pre-trains the model with self-supervised learning, followed by supervised fine-tuning; (3) we propose a simple, scalable pipeline for synthesizing training images with a large number of instances, or with instances from different semantic categories, explicitly forcing the model to make use of the given "exemplars"; (4) we conduct thorough ablation studies on the large-scale counting benchmark FSC-147, and demonstrate state-of-the-art performance in both zero-shot and few-shot settings.
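The abstract states that CounTR uses attention to capture the similarity between image patches and the given exemplars. As a hedged illustration only, the sketch below shows generic scaled dot-product cross-attention in pure Python, with image-patch features as queries and exemplar features serving as both keys and values. The function names (`cross_attention`, `softmax`, `dot`), the toy 2-D features, and the absence of learned projections, multiple heads, and any counting head are simplifying assumptions, not details taken from the paper.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_attention(
    patches: List[Vector], exemplars: List[Vector]
) -> Tuple[List[Vector], List[List[float]]]:
    """Hypothetical sketch: each patch (query) attends over the exemplars,
    which act as both keys and values (no learned projections)."""
    dim = len(patches[0])
    scale = 1.0 / math.sqrt(dim)
    outputs: List[Vector] = []
    weights: List[List[float]] = []
    for query in patches:
        # Similarity of this patch to every exemplar, normalised to sum to 1.
        attn = softmax([dot(query, key) * scale for key in exemplars])
        # Weighted sum of exemplar features, one output dimension at a time.
        out = [sum(w * value[j] for w, value in zip(attn, exemplars)) for j in range(dim)]
        outputs.append(out)
        weights.append(attn)
    return outputs, weights


# Toy 2-D features (assumed for illustration): two exemplars in different directions.
exemplars = [[1.0, 0.0], [0.0, 1.0]]
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = cross_attention(patches, exemplars)
for row in weights:
    print([round(w, 3) for w in row])
```

Each patch puts more attention weight on the exemplar it most resembles, and a patch equally similar to both exemplars splits its weight evenly. In a typical transformer, the queries, keys, and values would come from learned projections of encoder features; those parts are omitted here.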
Benchmarks
| Benchmark | Method | Split | MAE | RMSE |
|---|---|---|---|---|
| Exemplar-free counting on FSC-147 | CounTR | val | 18.07 | 71.84 |
| Exemplar-free counting on FSC-147 | CounTR | test | 14.71 | 106.87 |
| Object counting on CARPK | CounTR | not stated | 5.75 | 7.45 |
| Object counting on FSC-147 | CounTR | val | 13.13 | 49.83 |
| Object counting on FSC-147 | CounTR | test | 11.95 | 91.23 |
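The table reports MAE and RMSE. Assuming the standard definitions used for counting benchmarks (errors computed on per-image counts, then averaged over the split), the two metrics can be sketched as follows. The function name `counting_errors` and the three example counts are hypothetical and are not data from the paper.

```python
import math
from typing import Sequence, Tuple


def counting_errors(predicted: Sequence[float], ground_truth: Sequence[float]) -> Tuple[float, float]:
    """Return (MAE, RMSE) over per-image predicted and ground-truth counts."""
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [p - g for p, g in zip(predicted, ground_truth)]
    # MAE: mean absolute count error per image.
    mae = sum(abs(e) for e in errors) / len(errors)
    # RMSE: square root of the mean squared count error, which weights large misses more heavily.
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


# Hypothetical counts for three images.
mae, rmse = counting_errors([10, 20, 30], [12, 20, 26])
print(f"MAE={mae:.2f} RMSE={rmse:.2f}")  # MAE=2.00 RMSE=2.58
```

RMSE is always at least as large as MAE. A large gap between them, as in the FSC-147 test rows, typically indicates that a small number of images carry large count errors.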