
FlexAttention

Date

a year ago

FlexAttention is an API released by the PyTorch team in July 2024. It provides a flexible interface that allows many attention variants to be implemented in a few lines of idiomatic PyTorch code; with torch.compile, these are lowered to a single fused FlashAttention kernel, providing flexibility without sacrificing performance. Separately, the paper "FlexAttention for Efficient High-Resolution Vision-Language Models" has been accepted by ECCV 2024.

The FlexAttention described in that paper is a flexible attention mechanism designed to improve the efficiency of high-resolution vision-language models. It encodes both high-resolution and low-resolution image tokens, but computes the attention map using only the low-resolution tokens together with a small number of selected high-resolution tokens, which substantially reduces computational cost. The high-resolution tokens are chosen by a high-resolution selection module, which retrieves tokens from relevant regions based on the input attention map. The selected high-resolution tokens are then fed, together with the low-resolution tokens and text tokens, into a hierarchical self-attention layer, and the attention map produced by this layer guides the next round of high-resolution token selection. This process repeats at each attention layer. Experiments show that FlexAttention outperforms existing high-resolution vision-language models on multimodal benchmarks while reducing computational cost by nearly 40%.
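The selection step can be sketched roughly as follows. This is a hypothetical illustration, not the paper's implementation: the function name, the fixed per-patch `ratio`, and the use of a simple top-k over attention mass are all assumptions.

```python
# Hypothetical sketch of high-resolution token selection (not the paper's code).
import torch

def select_high_res_tokens(attn_map, hi_tokens, ratio, k):
    """attn_map: (L,) attention mass each low-res token received last layer.
    hi_tokens: (L * ratio, D) high-res tokens, `ratio` per low-res patch.
    Returns the k high-res tokens from the most-attended regions."""
    scores = attn_map.repeat_interleave(ratio)  # spread scores to high-res grid
    idx = scores.topk(k).indices                # most relevant positions
    return hi_tokens[idx]

L, ratio, D, k = 16, 4, 32, 8
attn = torch.rand(L)                 # e.g. attention over low-res patches
hi = torch.randn(L * ratio, D)       # high-res image tokens
selected = select_high_res_tokens(attn, hi, ratio, k)
```

Only the `k` selected tokens (here 8 of 64) enter the next attention layer, which is where the computational saving comes from.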
