2 months ago

BERT got a Date: Introducing Transformers to Temporal Tagging

Satya Almasian, Dennis Aumiller, Michael Gertz

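Abstract

Temporal expressions play an important role in text understanding, and identifying them correctly is fundamental to many information retrieval and natural language processing systems. Previous work has gradually shifted from rule-based approaches to neural architectures, which tag temporal expressions with higher accuracy. However, current neural models still cannot distinguish between different types of temporal expressions as well as their rule-based counterparts. This work aims to identify the most suitable Transformer architecture for joint temporal tagging and type classification, and to investigate the effect of semi-supervised training on system performance. Based on a study of several token classification variants and encoder-decoder architectures, we present a Transformer encoder-decoder model built on the RoBERTa language model as our best-performing system. By supplementing the training resources with weakly labeled data produced by rule-based systems, our model surpasses previous work in both temporal tagging and type classification, with notable gains on rare classes. The code and pre-trained experiments are publicly available at: https://github.com/satya77/Transformer_Temporal_Tagger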
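To make the idea of joint tagging and type classification concrete, the following is a minimal sketch of the token classification view: each token receives a BIO label that also carries the expression type, and labeled spans are rendered as inline TIMEX3 tags. The four types (DATE, TIME, DURATION, SET) come from the TimeML TIMEX3 standard; the label scheme, the helper names `bio_to_spans` and `render_timex`, and the sample sentence are assumptions for illustration and are not taken from the authors' repository.

```python
from typing import List, Tuple

# TIMEX3 types as defined in TimeML. The "B-"/"I-" label scheme is an
# illustrative assumption, not necessarily the scheme used in the paper.
TIMEX_TYPES = {"DATE", "TIME", "DURATION", "SET"}


def bio_to_spans(labels: List[str]) -> List[Tuple[int, int, str]]:
    """Collect (start, end_exclusive, type) spans from labels such as 'B-DATE'."""
    spans = []
    start, current = None, None
    for i, label in enumerate(labels):
        prefix, _, kind = label.partition("-")
        continues = prefix == "I" and kind == current
        if not continues:
            # Close the span that was open, if any.
            if current is not None:
                spans.append((start, i, current))
                start, current = None, None
            # Open a new span on 'B-X' (or leniently on a stray 'I-X').
            if prefix in ("B", "I") and kind in TIMEX_TYPES:
                start, current = i, kind
    if current is not None:
        spans.append((start, len(labels), current))
    return spans


def render_timex(tokens: List[str], labels: List[str]) -> str:
    """Wrap each labeled span in an inline TIMEX3 tag carrying its type."""
    spans = {start: (end, kind) for start, end, kind in bio_to_spans(labels)}
    out, i = [], 0
    while i < len(tokens):
        if i in spans:
            end, kind = spans[i]
            text = " ".join(tokens[i:end])
            out.append(f'<TIMEX3 type="{kind}">{text}</TIMEX3>')
            i = end
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)


tokens = ["We", "met", "on", "March", "3", "for", "two", "hours", "."]
labels = ["O", "O", "O", "B-DATE", "I-DATE", "O", "B-DURATION", "I-DURATION", "O"]
print(render_timex(tokens, labels))
```

An encoder-decoder model, by contrast, would generate the tagged sentence directly as its output sequence rather than predicting one label per token; the sketch above only covers the token classification side.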

Code repository

satya77/Transformer_Temporal_Tagger

Official

pytorch

Mentioned in GitHub

Benchmarks

Benchmark	Method	Strict Detection (Pr.)	Strict Detection (Re.)	Strict Detection (F1)	Relaxed Detection (Pr.)	Relaxed Detection (Re.)	Relaxed Detection (F1)	Type
temporal-tagging-on-tempeval-3	BERT-base	81.83	79.56	80.67	91.37	88.84	90.08	82.00
temporal-tagging-on-tempeval-3	B2B	94.11	81.01	87.07	100	86.09	92.52	83.79
temporal-tagging-on-tempeval-3	DateBERT	82.72	85.79	84.21	90.95	94.35	92.60	86.21
temporal-tagging-on-tempeval-3	R2R	96.37	96.37	96.37	100	100	100	90.43
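
The strict and relaxed detection columns differ in what counts as a match: strict requires a predicted span to coincide exactly with a gold span, while relaxed accepts any overlap. The sketch below shows one way to compute precision, recall, and F1 under both criteria. The function names `overlaps` and `prf` and the toy spans are assumptions for illustration; the official TempEval-3 scorer may differ in details.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end_exclusive) offsets; hypothetical representation


def overlaps(a: Span, b: Span) -> bool:
    """True if the two half-open spans share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def prf(gold: List[Span], pred: List[Span], relaxed: bool) -> Tuple[float, float, float]:
    """Precision, recall, and F1 under exact (strict) or overlap (relaxed) matching."""
    match = overlaps if relaxed else (lambda a, b: a == b)
    # Precision counts predictions that match some gold span;
    # recall counts gold spans matched by some prediction.
    pred_hits = sum(any(match(p, g) for g in gold) for p in pred)
    gold_hits = sum(any(match(g, p) for p in pred) for g in gold)
    precision = pred_hits / len(pred) if pred else 0.0
    recall = gold_hits / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data: one exact match, one partial overlap, one miss on each side.
gold = [(3, 5), (6, 8), (10, 11)]
pred = [(3, 5), (7, 8), (12, 13)]

strict = prf(gold, pred, relaxed=False)
relaxed = prf(gold, pred, relaxed=True)
print("strict  P/R/F1:", strict)
print("relaxed P/R/F1:", relaxed)
```

In this toy case the partially overlapping prediction `(7, 8)` counts only under relaxed matching, which is why relaxed scores are never lower than strict ones for the same system.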
