4 months ago

Deep Equilibrium Models

Shaojie Bai; J. Zico Kolter; Vladlen Koltun

Abstract

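We present a new approach to modeling sequential data: the deep equilibrium model (DEQ). Motivated by the observation that the hidden layers of many existing deep sequence models converge towards some fixed point, we propose the DEQ approach, which finds these equilibrium points directly via root-finding. Such a method is equivalent to running an infinite-depth (weight-tied) feedforward network, but has the notable advantage that we can analytically backpropagate through the equilibrium point using implicit differentiation. Using this approach, training and prediction require only constant memory, regardless of the effective "depth" of the network. We demonstrate how DEQs can be applied to two state-of-the-art deep sequence models: self-attention transformers and trellis networks. On large-scale language modeling tasks, such as the WikiText-103 benchmark, we show that DEQs 1) often improve performance over these state-of-the-art models (for similar parameter counts); 2) have similar computational requirements to existing models; and 3) vastly reduce memory consumption (often the bottleneck for training large sequence models), with up to 88% memory reduction in our experiments. The code is available at https://github.com/locuslab/deq.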
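The idea of solving for the equilibrium directly and differentiating through it implicitly can be illustrated with a toy scalar example. The sketch below is not the authors' implementation: it assumes a hypothetical one-dimensional layer `f(z, x, w) = tanh(w * z + x)`, uses Newton's method as the root-finder (the official code works on full sequence models with a different solver), and all function names are invented for illustration.

```python
import math


def f(z, x, w):
    """One weight-tied 'layer' (illustrative assumption): z_next = tanh(w * z + x)."""
    return math.tanh(w * z + x)


def unrolled_forward(x, w, depth):
    """Apply the same layer `depth` times, like a deep weight-tied network."""
    z = 0.0
    for _ in range(depth):
        z = f(z, x, w)
    return z


def solve_equilibrium(x, w, tol=1e-14, max_iter=100):
    """Find z* with f(z*, x, w) = z* via Newton's method on g(z) = f(z, x, w) - z."""
    z = 0.0
    for _ in range(max_iter):
        fz = f(z, x, w)
        g = fz - z
        if abs(g) < tol:
            break
        # d/dz [tanh(w*z + x) - z] = w * (1 - tanh^2) - 1
        dg_dz = w * (1.0 - fz * fz) - 1.0
        z -= g / dg_dz
    return z


def implicit_grad_w(z_star, w):
    """dz*/dw from the implicit function theorem.

    Differentiating z* = f(z*, x, w) gives
        dz*/dw = (df/dw) / (1 - df/dz),
    evaluated at z*. Only z* is needed, not the intermediate iterates,
    which is why memory does not grow with the effective depth.
    """
    s = 1.0 - z_star * z_star  # tanh'(w*z* + x), because tanh(w*z* + x) = z*
    df_dz = w * s
    df_dw = z_star * s
    return df_dw / (1.0 - df_dz)


if __name__ == "__main__":
    x, w = 1.0, 0.5
    z_star = solve_equilibrium(x, w)
    print("equilibrium z*:", z_star)
    print("200-layer unrolled output:", unrolled_forward(x, w, 200))
    print("implicit dz*/dw:", implicit_grad_w(z_star, w))
```

For `|w| < 1` the toy layer is a contraction, so the unrolled network and the root-finder reach the same point. In this example the implicit gradient can be checked against a finite-difference estimate of how `z*` changes with `w`.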

Code repositories

locuslab/impsq

pytorch

Mentioned in GitHub

cgoemaere/hamdeq

pytorch

Mentioned in GitHub

martaskrt/qdeq

pytorch

Mentioned in GitHub

prolearner/hypertorch

pytorch

Mentioned in GitHub

reacho/deep-equilibrium-vs-bilevel

pytorch

Mentioned in GitHub

lufanma/ifr

pytorch

Mentioned in GitHub

locuslab/deq

Official

pytorch

Mentioned in GitHub

cgoemaere/hopdeq

pytorch

Mentioned in GitHub

locuslab/monotone_op_net

pytorch

Mentioned in GitHub

SinclairHudson/DeepEquilibrium

pytorch

Mentioned in GitHub

sciml/fastdeq.jl

Mentioned in GitHub

Benchmarks

Benchmark	Method	Metrics
language-modelling-on-penn-treebank-word	DEQ-TrellisNet	Params: 24M; Test perplexity: 57.1
language-modelling-on-wikitext-103	DEQ-TrellisNet	Params: 180M; Test perplexity: 29.0
language-modelling-on-wikitext-103	DEQ-Transformer (medium, adaptive embed)	Params: 110M; Test perplexity: 23.2
language-modelling-on-wikitext-103	DEQ-Transformer (small)	Params: 138M; Test perplexity: 32.4
