Self-Supervised Pre-Training for Transformer-Based Person Re-Identification
Hao Luo, Pichao Wang, Yi Xu, Feng Ding, Yanxin Zhou, Fan Wang, Hao Li, Rong Jin

Abstract
Transformer-based supervised pre-training achieves strong performance in person re-identification (ReID). However, because of the domain gap between ImageNet and ReID datasets, and because transformers fit data so strongly, it usually needs a larger pre-training dataset (e.g., ImageNet-21K) to boost performance. To address this challenge, this work aims to mitigate the gap between the pre-training and ReID datasets from the perspectives of data and model structure. We first investigate self-supervised learning (SSL) methods with a Vision Transformer (ViT) pre-trained on unlabelled person images (the LUPerson dataset), and empirically find that it significantly surpasses ImageNet supervised pre-training models on ReID tasks. To further reduce the domain gap and accelerate pre-training, we propose the Catastrophic Forgetting Score (CFS) to evaluate the gap between pre-training and fine-tuning data. Based on CFS, a subset is selected by sampling relevant data close to the downstream ReID data and filtering irrelevant data out of the pre-training dataset. For the model structure, we propose a ReID-specific module named the IBN-based convolution stem (ICS), which bridges the domain gap by learning more invariant features. Extensive experiments fine-tune the pre-trained models under supervised learning, unsupervised domain adaptation (UDA), and unsupervised learning (USL) settings. We successfully downscale the LUPerson dataset to 50% of its size with no performance degradation. Finally, we achieve state-of-the-art performance on Market-1501 and MSMT17. For example, our ViT-S/16 achieves 91.3%/89.9%/89.6% mAP on Market-1501 for supervised/UDA/USL ReID. Code and models will be released at https://github.com/michuanhaohao/TransReID-SSL.
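The score-based data selection mentioned in the abstract can be pictured with a minimal sketch. The abstract does not say how CFS is computed or whether a high or low value means "closer to the downstream data", so the snippet below assumes a hypothetical per-image relevance score where higher means more relevant, and simply keeps the top fraction. The function name `select_subset`, the file names, and the score values are illustrative assumptions and do not come from the paper or its repository.

```python
def select_subset(scores, keep_fraction=0.5):
    """Keep the highest-scoring fraction of a pre-training set.

    scores: dict mapping an image identifier to a relevance score.
            ASSUMPTION: higher means closer to the downstream ReID data;
            this is a stand-in for CFS, not the paper's definition.
    keep_fraction: share of the dataset to keep (0.5 mirrors the 50%
            downscaling reported in the abstract).
    """
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")
    if not scores:
        return []
    # Keep at least one image, rounding the subset size down.
    keep_count = max(1, int(len(scores) * keep_fraction))
    # Sort identifiers by descending score and take the top slice.
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:keep_count]


# Hypothetical scores for four unlabelled images.
example_scores = {"a.jpg": 0.9, "b.jpg": 0.1, "c.jpg": 0.7, "d.jpg": 0.3}
print(select_subset(example_scores, keep_fraction=0.5))
```

With the hypothetical scores above, the two highest-scoring images are kept and the other two are filtered out.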
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| person-re-identification-on-market-1501 | TransReID-SSL (ViT-B w/o RK) | Rank-1: 96.7 mAP: 93.2 |
| person-re-identification-on-msmt17 | TransReID-SSL (w/o RK) | Rank-1: 89.6 |
| person-re-identification-on-msmt17 | TransReID-SSL (ViT-B w/o RK) | Rank-1: 89.5 mAP: 75.0 |
| unsupervised-person-re-identification-on-12 | TransReID-SSL (ViT-S) | Rank-1: 66.4 mAP: 40.9 |
| unsupervised-person-re-identification-on-12 | TransReID-SSL (ViTi-S) | Rank-1: 75 mAP: 50.6 |
| unsupervised-person-re-identification-on-4 | TransReID-SSL (ViTi-S) | Rank-1: 95.3 mAP: 89.6 |
| unsupervised-person-re-identification-on-4 | TransReID-SSL (ViT-S) | Rank-1: 94.2 mAP: 88.2 |
| unsupervised-person-re-identification-on-4 | TransReID-SSL (ViT-S w/o RK) | Rank-1: 95.3 |
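
For readers unfamiliar with the two metrics in the table, the sketch below shows how Rank-1 and mAP are commonly computed from ranked retrieval results. It assumes each query already comes with its gallery sorted by descending similarity and marked as match or non-match. The function names and the toy data are illustrative. Standard ReID evaluation protocols typically also exclude gallery images taken by the same camera as the query, and the sketch omits that step.

```python
def rank1_and_ap(ranked_match_flags):
    """Return (rank-1 hit, average precision) for one query.

    ranked_match_flags: list of bools for the gallery sorted by
    descending similarity; True marks the same identity as the query.
    """
    if not any(ranked_match_flags):
        raise ValueError("query has no matching gallery image")
    # Rank-1: is the top-ranked gallery image a correct match?
    rank1 = 1.0 if ranked_match_flags[0] else 0.0
    # Average precision: mean of precision@k over positions k of true matches.
    hits = 0
    precisions = []
    for position, is_match in enumerate(ranked_match_flags, start=1):
        if is_match:
            hits += 1
            precisions.append(hits / position)
    return rank1, sum(precisions) / len(precisions)


def evaluate(queries):
    """Average Rank-1 and AP over all queries, giving (Rank-1, mAP)."""
    results = [rank1_and_ap(flags) for flags in queries]
    rank1 = sum(r for r, _ in results) / len(results)
    mean_ap = sum(ap for _, ap in results) / len(results)
    return rank1, mean_ap


# Toy example with two queries and a three-image gallery each.
rank1, mean_ap = evaluate([[True, False, True], [False, True, False]])
print(f"Rank-1 = {rank1:.3f}, mAP = {mean_ap:.3f}")  # Rank-1 = 0.500, mAP = 0.667
```

In the toy example, the first query has an AP of (1/1 + 2/3) / 2 ≈ 0.833 and the second an AP of 1/2, so mAP is about 0.667. Only the first query has a correct top match, so Rank-1 is 0.5.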