Integrative Feature and Cost Aggregation with Transformers for Dense Correspondence
Sunghwan Hong, Seokju Cho, Seungryong Kim, Stephen Lin

Abstract
We present a novel architecture for dense correspondence. The current state-of-the-art approaches are Transformer-based and focus on either feature descriptors or cost volume aggregation. However, they generally aggregate one or the other but not both, though joint aggregation would benefit each by providing information that one has but the other lacks, i.e., the structural or semantic information of an image, or pixel-wise matching similarity. In this work, we propose a novel Transformer-based network that interleaves both forms of aggregation in a way that exploits their complementary information. Specifically, we design a self-attention layer that leverages the descriptor to disambiguate the noisy cost volume and that also utilizes the cost volume to aggregate features in a manner that promotes accurate matching. A subsequent cross-attention layer performs further aggregation conditioned on the descriptors of both images and aided by the aggregated outputs of earlier layers. We further boost performance with hierarchical processing, in which coarser-level aggregations guide those at finer levels. We evaluate the effectiveness of the proposed method on dense matching tasks and achieve state-of-the-art performance on all the major benchmarks. Extensive ablation studies are also provided to validate our design choices.
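To make the interleaved aggregation concrete, here is a minimal PyTorch sketch of one level of the scheme the abstract describes: a self-attention layer that treats each source pixel's descriptor concatenated with its cost-volume row as a single token (so semantics and matching similarity inform each other), followed by a cross-attention layer conditioned on both images' descriptors. All module names, shapes, and the single-level, single-block setting are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class JointSelfAttention(nn.Module):
    """Self-attention over tokens that concatenate each source pixel's
    descriptor with its row of the cost volume, so the descriptor helps
    disambiguate noisy costs while the costs guide feature aggregation.
    (Hypothetical block for illustration.)"""
    def __init__(self, feat_dim, n_tgt, heads=4):
        super().__init__()
        dim = feat_dim + n_tgt  # joint token dimension
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.feat_dim = feat_dim

    def forward(self, feat, cost):
        # feat: (B, N_src, feat_dim); cost: (B, N_src, N_tgt)
        x = torch.cat([feat, cost], dim=-1)
        x = self.norm(x + self.attn(x, x, x, need_weights=False)[0])
        # Split the refined token back into descriptor and cost parts.
        return x[..., :self.feat_dim], x[..., self.feat_dim:]

class CrossAttention(nn.Module):
    """Cross-attention conditioned on both images' descriptors: the
    refined source features query the target features."""
    def __init__(self, feat_dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(feat_dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(feat_dim)

    def forward(self, src_feat, tgt_feat):
        out = self.attn(src_feat, tgt_feat, tgt_feat, need_weights=False)[0]
        return self.norm(src_feat + out)

# Toy usage: 16x16 feature maps with 128-dim descriptors.
B, N, C = 2, 16 * 16, 128
src, tgt = torch.randn(B, N, C), torch.randn(B, N, C)
cost = torch.einsum('bic,bjc->bij', src, tgt) / C ** 0.5  # raw cost volume
sa = JointSelfAttention(feat_dim=C, n_tgt=N)
ca = CrossAttention(feat_dim=C)
src_ref, cost_ref = sa(src, cost)          # joint aggregation
src_ref = ca(src_ref, tgt)                 # cross-image aggregation
print(src_ref.shape, cost_ref.shape)       # (2, 256, 128) and (2, 256, 256)
```

In the hierarchical variant described above, the refined cost volume from a coarser level would be upsampled and used to initialize or guide the aggregation at the next finer level; that wiring is omitted here for brevity.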
Benchmarks
| Benchmark | Method | Metric |
|---|---|---|
| geometric-matching-on-hpatches | IFCAT (Ours) | Average End-Point Error (AEPE): 17.59 |
| semantic-correspondence-on-spair-71k | IFCAT (Ours) | PCK: 64.4 |
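For reference, the two reported metrics can be computed as below. This is a hedged sketch with made-up prediction tensors; exact thresholds and normalization conventions vary by benchmark (SPair-71k, for instance, scales the PCK threshold by the object bounding-box size).

```python
import torch

def aepe(flow_pred, flow_gt):
    """Average End-Point Error: mean L2 distance (in pixels) between
    predicted and ground-truth flow vectors, as used on HPatches."""
    return torch.norm(flow_pred - flow_gt, dim=-1).mean()

def pck(kp_pred, kp_gt, threshold):
    """Percentage of Correct Keypoints: fraction of predicted keypoints
    within `threshold` pixels of the ground truth, reported as a percent."""
    dist = torch.norm(kp_pred - kp_gt, dim=-1)
    return (dist <= threshold).float().mean() * 100
```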