Integrative Feature and Cost Aggregation with Transformers for Dense Correspondence
Sunghwan Hong, Seokju Cho, Seungryong Kim, Stephen Lin

Abstract
We present a novel architecture for dense correspondence. The current state-of-the-art approaches are Transformer-based and focus on either feature descriptors or cost volume aggregation. However, they generally aggregate one or the other but not both, though joint aggregation would benefit each by providing information that one has but the other lacks, i.e., the structural or semantic information of an image, or pixel-wise matching similarity. In this work, we propose a novel Transformer-based network that interleaves both forms of aggregation in a way that exploits their complementary information. Specifically, we design a self-attention layer that leverages the descriptor to disambiguate the noisy cost volume and that also utilizes the cost volume to aggregate features in a manner that promotes accurate matching. A subsequent cross-attention layer performs further aggregation conditioned on the descriptors of both images and aided by the aggregated outputs of earlier layers. We further boost performance with hierarchical processing, in which coarser-level aggregations guide those at finer levels. We evaluate the effectiveness of the proposed method on dense matching tasks and achieve state-of-the-art performance on all the major benchmarks. Extensive ablation studies are also provided to validate our design choices.
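To make the interleaved aggregation concrete, here is a minimal PyTorch sketch of one level of the scheme the abstract describes: a self-attention layer that treats each source pixel's descriptor concatenated with its cost-volume row as a single token (so semantics and matching similarity inform each other), followed by a cross-attention layer conditioned on both images' descriptors. All module names, shapes, and the single-level, single-block setting are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class JointSelfAttention(nn.Module):
    """Self-attention over tokens that concatenate each source pixel's
    descriptor with its row of the cost volume, so the descriptor helps
    disambiguate noisy costs while the costs guide feature aggregation.
    (Hypothetical block for illustration.)"""
    def __init__(self, feat_dim, n_tgt, heads=4):
        super().__init__()
        dim = feat_dim + n_tgt  # joint token dimension
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.feat_dim = feat_dim

    def forward(self, feat, cost):
        # feat: (B, N_src, feat_dim); cost: (B, N_src, N_tgt)
        x = torch.cat([feat, cost], dim=-1)
        x = self.norm(x + self.attn(x, x, x, need_weights=False)[0])
        # Split the refined token back into descriptor and cost parts.
        return x[..., :self.feat_dim], x[..., self.feat_dim:]

class CrossAttention(nn.Module):
    """Cross-attention conditioned on both images' descriptors: the
    refined source features query the target features."""
    def __init__(self, feat_dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(feat_dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(feat_dim)

    def forward(self, src_feat, tgt_feat):
        out = self.attn(src_feat, tgt_feat, tgt_feat, need_weights=False)[0]
        return self.norm(src_feat + out)

# Toy usage: 16x16 feature maps with 128-dim descriptors.
B, N, C = 2, 16 * 16, 128
src, tgt = torch.randn(B, N, C), torch.randn(B, N, C)
cost = torch.einsum('bic,bjc->bij', src, tgt) / C ** 0.5  # raw cost volume
sa = JointSelfAttention(feat_dim=C, n_tgt=N)
ca = CrossAttention(feat_dim=C)
src_ref, cost_ref = sa(src, cost)          # joint aggregation
src_ref = ca(src_ref, tgt)                 # cross-image aggregation
print(src_ref.shape, cost_ref.shape)       # (2, 256, 128) and (2, 256, 256)
```

In the hierarchical variant described above, the refined cost volume from a coarser level would be upsampled and used to initialize or guide the aggregation at the next finer level; that wiring is omitted here for brevity.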
Benchmarks
| Benchmark | Method | Metric |
|---|---|---|
| geometric-matching-on-hpatches | IFCAT (Ours) | Average End-Point Error (AEPE): 17.59 |
| semantic-correspondence-on-spair-71k | IFCAT (Ours) | PCK: 64.4 |
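For reference, the two reported metrics can be computed as below. This is a hedged sketch with made-up prediction tensors; exact thresholds and normalization conventions vary by benchmark (SPair-71k, for instance, scales the PCK threshold by the object bounding-box size).

```python
import torch

def aepe(flow_pred, flow_gt):
    """Average End-Point Error: mean L2 distance (in pixels) between
    predicted and ground-truth flow vectors, as used on HPatches."""
    return torch.norm(flow_pred - flow_gt, dim=-1).mean()

def pck(kp_pred, kp_gt, threshold):
    """Percentage of Correct Keypoints: fraction of predicted keypoints
    within `threshold` pixels of the ground truth, reported as a percent."""
    dist = torch.norm(kp_pred - kp_gt, dim=-1)
    return (dist <= threshold).float().mean() * 100
```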