Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners
Ningyu Zhang Luoqiu Li Xiang Chen Shumin Deng Zhen Bi Chuanqi Tan Fei Huang Huajun Chen

Abstract
Large-scale pre-trained language models have contributed significantly to natural language processing by demonstrating remarkable abilities as few-shot learners. However, their effectiveness depends mainly on scaling the model parameters and prompt design, hindering their implementation in most real-world applications. This study proposes a novel pluggable, extensible, and efficient approach named DifferentiAble pRompT (DART), which can convert small language models into better few-shot learners without any prompt engineering. The main principle behind this approach involves reformulating potential natural language processing tasks into the tasks of a pre-trained language model and differentiably optimizing the prompt template as well as the target label with backpropagation. Furthermore, the proposed approach can be: (i) plugged into any pre-trained language model; (ii) extended to widespread classification tasks. A comprehensive evaluation on standard NLP tasks demonstrates that the proposed approach achieves better few-shot performance. Code is available at https://github.com/zjunlp/DART.
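The sketch below illustrates the core idea described in the abstract, not the authors' released implementation (see the repository above for that): the input is wrapped with learnable pseudo-token embeddings and a [MASK] slot, the task is cast as masked-token prediction, and the continuous template is optimized with backpropagation together with the backbone. The model name, template length, and label words are illustrative assumptions.

```python
# Minimal sketch of differentiable prompt tuning on top of a masked language model.
# Assumptions: "bert-base-uncased" backbone, a 4-token continuous template, and the
# label words "terrible"/"great" for binary sentiment. None of these come from the paper.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
mlm = AutoModelForMaskedLM.from_pretrained(model_name)

n_prompt_tokens = 4                                  # length of the learnable template
hidden = mlm.config.hidden_size
# Continuous prompt embeddings replace hand-written template words.
prompt_embeds = nn.Parameter(torch.randn(n_prompt_tokens, hidden) * 0.02)

# Illustrative label words; their embeddings could likewise be treated as trainable.
label_word_ids = torch.tensor([
    tokenizer.convert_tokens_to_ids("terrible"),     # class 0
    tokenizer.convert_tokens_to_ids("great"),        # class 1
])

def forward_example(sentence: str, label: int) -> torch.Tensor:
    """Return the cross-entropy loss for one (sentence, label) pair."""
    enc = tokenizer(sentence, return_tensors="pt")
    embed_layer = mlm.get_input_embeddings()
    word_embeds = embed_layer(enc["input_ids"])                       # (1, L, H)
    mask_embed = embed_layer(torch.tensor([[tokenizer.mask_token_id]]))  # (1, 1, H)
    # [CLS] sentence <prompt tokens> [MASK] [SEP]
    inputs_embeds = torch.cat(
        [word_embeds[:, :-1], prompt_embeds.unsqueeze(0), mask_embed, word_embeds[:, -1:]],
        dim=1,
    )
    attention_mask = torch.ones(inputs_embeds.shape[:2], dtype=torch.long)
    logits = mlm(inputs_embeds=inputs_embeds, attention_mask=attention_mask).logits
    mask_pos = word_embeds.shape[1] - 1 + n_prompt_tokens             # index of the [MASK] slot
    label_logits = logits[0, mask_pos, label_word_ids]                # scores of the label words
    return nn.functional.cross_entropy(label_logits.unsqueeze(0), torch.tensor([label]))

# Gradients flow into both the backbone and the continuous prompt.
optimizer = torch.optim.AdamW(list(mlm.parameters()) + [prompt_embeds], lr=1e-5)
loss = forward_example("the film is a delight .", label=1)
loss.backward()
optimizer.step()
```

Because the template lives in embedding space rather than in the vocabulary, no discrete prompt search or manual prompt engineering is needed; the same wrapper can in principle be attached to any MLM-style backbone and any classification task with a suitable set of label words.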
Benchmarks
| Benchmark | Method | Metric (mean (std. dev.)) |
|---|---|---|
| few-shot-learning-on-cr | DART | Accuracy: 91.8 (0.5) |
| few-shot-learning-on-glue-qqp | DART | F1: 67.8 (3.2) |
| few-shot-learning-on-mr | DART | Accuracy: 88.2 (1.0) |
| few-shot-learning-on-mrpc | DART | F1: 78.3 (4.5) |
| few-shot-learning-on-sst-2-binary | DART | Accuracy: 93.5 (0.5) |