Command Palette
Search for a command to run...
REFRAG Decoding Framework
REFRAG was proposed by Meta Superintelligence Labs in collaboration with the National University of Singapore and Rice University in September 2025. The relevant research results were published in the paper “REFRAG: Rethinking RAG based Decoding".
REFRAG is an efficient decoding framework that improves latency for Retrieval-Augmented Generation (RAG) applications through compression, perception, and expansion. REFRAG introduces several innovative improvements to the decoding process: instead of using tokens from retrieved passages as input, it leverages pre-computed and compressed segment embeddings as approximate representations, feeding these embeddings directly into the decoder. As a result, REFRAG minimizes reliance on computationally intensive token embeddings, allowing most query blocks to be compressed in the RAG setting.
Build AI with AI
From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.