Yuhui Yuan, Rao Fu, Lang Huang, Weihong Lin, Chao Zhang, Xilin Chen, Jingdong Wang

Abstract
We present a High-Resolution Transformer (HRFormer) that learns high-resolution representations for dense prediction tasks, in contrast to the original Vision Transformer that produces low-resolution representations and has high memory and computational cost. We take advantage of the multi-resolution parallel design introduced in high-resolution convolutional networks (HRNet), along with local-window self-attention that performs self-attention over small non-overlapping image windows, for improving the memory and computation efficiency. In addition, we introduce a convolution into the FFN to exchange information across the disconnected image windows. We demonstrate the effectiveness of the High-Resolution Transformer on both human pose estimation and semantic segmentation tasks, e.g., HRFormer outperforms Swin transformer by $1.3$ AP on COCO pose estimation with $50\%$ fewer parameters and $30\%$ fewer FLOPs. Code is available at: https://github.com/HRNet/HRFormer.
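The two mechanisms named in the abstract, self-attention restricted to small non-overlapping windows and a convolution that lets information cross window boundaries, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names are hypothetical, the attention is single-head with identity query/key/value projections, and the 3x3 depthwise convolution with zero padding is an assumed choice for the convolution placed in the FFN.

```python
import numpy as np

def window_partition(x, window):
    """Split an (H, W, C) map into non-overlapping windows: (num_windows, window*window, C)."""
    h, w, c = x.shape
    assert h % window == 0 and w % window == 0, "H and W must be divisible by window"
    x = x.reshape(h // window, window, w // window, window, c)
    x = x.transpose(0, 2, 1, 3, 4)  # group the two window-index axes together
    return x.reshape(-1, window * window, c)

def window_merge(windows, h, w, window):
    """Inverse of window_partition: rebuild the (H, W, C) map."""
    c = windows.shape[-1]
    x = windows.reshape(h // window, w // window, window, window, c)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(h, w, c)

def local_window_attention(x, window):
    """Single-head self-attention computed independently inside each window.

    Assumption: identity Q/K/V projections, to keep the sketch short.
    """
    h, w, c = x.shape
    tokens = window_partition(x, window)
    scores = tokens @ tokens.transpose(0, 2, 1) / np.sqrt(c)
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return window_merge(weights @ tokens, h, w, window)

def depthwise_conv3x3(x, kernel):
    """3x3 depthwise convolution with zero padding; x: (H, W, C), kernel: (3, 3, C)."""
    h, w, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w, :] * kernel[dy, dx, :]
    return out
```

With window size K on an H x W map, each token attends to K*K tokens rather than H*W, which is where the memory and compute savings come from. The cost is that windows never exchange information inside the attention step; a spatial convolution whose receptive field straddles window borders restores that exchange.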
Benchmarks
| Benchmark | Model | Metrics |
|---|---|---|
| image-classification-on-imagenet | HRFormer-B | GFLOPs: 13.7; Params: 50.3M; Top-1 Accuracy: 82.8% |
| image-classification-on-imagenet | HRFormer-T | GFLOPs: 1.8; Params: 8.0M; Top-1 Accuracy: 78.5% |
| multi-person-pose-estimation-on-crowdpose | HRFormer-B | AP Easy: 80.0; AP Medium: 73.5; AP Hard: 62.4; mAP @0.5:0.95: 72.4 |
| multi-person-pose-estimation-on-ochuman | HRFormer-B | AP50: 81.4; AP75: 67.1; Validation AP: 62.1 |
| pose-estimation-on-aic | HRFormer-S | AP: 31.6; AP75: 20.9; AR: 35.8; AR50: 78.0 |
| pose-estimation-on-aic | HRFormer-B | AP: 34.4; AP50: 78.3; AP75: 24.8; AR: 38.7; AR50: 80.9 |
| pose-estimation-on-coco-test-dev | HRFormer-B | AP: 76.2; AP50: 92.7; AP75: 83.8; APM: 72.5; APL: 82.3; AR: 81.2 |