Continual Pre-training of Language Models

Zixuan Ke Yijia Shao Haowei Lin Tatsuya Konishi Gyuhak Kim Bing Liu

Abstract

Language models (LMs) have been instrumental in the rapid advance of natural language processing. This paper studies continual pre-training of LMs, in particular continual domain-adaptive pre-training (continual DAP-training). Existing research has shown that further pre-training an LM on a domain corpus to adapt it to that domain improves end-task performance in the domain. This paper proposes a novel method to continually DAP-train an LM with a sequence of unlabeled domain corpora, adapting the LM to these domains to improve their end-task performances. The key novelty of the method is a soft-masking mechanism that directly controls the update to the LM. A novel proxy is also proposed to preserve the general knowledge in the original LM. Additionally, the method contrasts the representations of the previously learned domain knowledge (including the general knowledge in the pre-trained LM) with the knowledge learned by the current full network to achieve knowledge integration. The method not only overcomes catastrophic forgetting but also achieves knowledge transfer to improve end-task performances. Empirical evaluation demonstrates the effectiveness of the proposed method.
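The soft-masking mechanism can be pictured as scaling each parameter's gradient by the importance of its unit to previously acquired knowledge, rather than hard-freezing parameters. The following PyTorch sketch is illustrative only, not the paper's implementation (the official code is in zixuanke/pycontinual); the `importance` dictionary of per-parameter scores in [0, 1] is an assumed input.

```python
import torch

def soft_masked_step(model, loss, importance, lr=1e-4):
    # One DAP-training step with soft-masking: gradients are attenuated in
    # proportion to each unit's importance to previously learned domains
    # (and to the general knowledge in the original LM), so important
    # knowledge is protected without freezing anything outright.
    # `importance` (assumed): dict mapping parameter names to tensors in
    # [0, 1], where 1 = critical to past knowledge, 0 = free to change.
    model.zero_grad()
    loss.backward()
    with torch.no_grad():
        for name, param in model.named_parameters():
            if param.grad is None:
                continue
            mask = importance.get(name)
            if mask is not None:
                param.grad.mul_(1.0 - mask)  # soft mask: scale down, never zero out hard
            param.add_(param.grad, alpha=-lr)  # plain SGD update, for brevity
```

The knowledge-integration step that contrasts previously learned knowledge with the current full network could take an InfoNCE-like form: the two representations of the same input form a positive pair, with other examples in the batch as negatives. Again a minimal sketch under assumed names, where `z_full` and `z_prev` are [batch, dim] representations from the full network and from the previously learned (masked) knowledge, respectively.

```python
import torch
import torch.nn.functional as F

def integration_loss(z_full, z_prev, temperature=0.1):
    # Contrast each input's full-network representation (z_full) with its
    # representation under previously learned knowledge (z_prev). Matching
    # rows are positives; all other rows in the batch serve as negatives.
    z_full = F.normalize(z_full, dim=-1)
    z_prev = F.normalize(z_prev, dim=-1)
    logits = z_full @ z_prev.t() / temperature  # [batch, batch] similarity matrix
    labels = torch.arange(z_full.size(0), device=z_full.device)
    return F.cross_entropy(logits, labels)
```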

Code Repositories

zixuanke/pycontinual (PyTorch)

Benchmarks

Benchmark                          Methodology   Metrics
continual-pretraining-on-acl-arc   DAS           F1 (macro): 0.6936
continual-pretraining-on-scierc    DAS           F1 (macro): 0.7093
