
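Abstract
Solving math word problems depends on how the problem is articulated, that is, on the lens through which a model views human linguistic expressions. This matters even more in real-world settings, where the same mathematical operation can be practiced in many different forms. Earlier work constrains the available thinking processes through limited prediction strategies and does not consider the significance of those processes in acquiring mathematical knowledge. This paper introduces the Attention-based THought Expansion Network Architecture (ATHENA), which tackles the challenges of real-world practice by mimicking the human thought expansion mechanism in the form of neural network propagation. A thought expansion recurrently generates candidates that carry thoughts of possible math expressions and, by selecting valid pathways, arrives step by step at reasonable and coherent solution thoughts. Experiments show that ATHENA outperforms existing methods on varied questions, reaching state-of-the-art performance, and that it generalizes well even when the information in training examples is limited, moving toward the ideal model of mathematical reasoning.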
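The thought-expansion loop described in the abstract (generate candidate expressions, keep the valid ones, repeat until a goal is reached) can be illustrated with a toy sketch. This is not the ATHENA implementation: the names `Thought`, `expand`, `solve`, `keep`, and `is_goal` are hypothetical, and hand-written rules stand in for the attention-based network that the model uses to generate and select thoughts. In particular, the goal test here simply compares against a known answer.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, FrozenSet, List, Optional


@dataclass(frozen=True)
class Thought:
    expr: str             # human-readable expression
    value: float          # numeric value of the expression
    used: FrozenSet[int]  # indices of the problem quantities it consumes


def expand(thoughts: List[Thought]) -> List[Thought]:
    """Combine every pair of thoughts that share no quantity into new candidates."""
    candidates = []
    for a, b in combinations(thoughts, 2):
        if a.used & b.used:
            continue  # assumption: each quantity appears at most once per expression
        used = a.used | b.used
        candidates.append(Thought(f"({a.expr} + {b.expr})", a.value + b.value, used))
        candidates.append(Thought(f"({a.expr} * {b.expr})", a.value * b.value, used))
        for x, y in ((a, b), (b, a)):  # non-commutative operators, both orders
            candidates.append(Thought(f"({x.expr} - {y.expr})", x.value - y.value, used))
            if y.value != 0:
                candidates.append(Thought(f"({x.expr} / {y.expr})", x.value / y.value, used))
    return candidates


def solve(
    quantities: List[float],
    keep: Callable[[Thought], bool],     # stand-in for the learned selection step
    is_goal: Callable[[Thought], bool],  # stand-in for the learned goal decision
    max_steps: int = 3,
) -> Optional[Thought]:
    """Expand thoughts step by step until one satisfies the goal test."""
    thoughts = [Thought(str(q), float(q), frozenset([i])) for i, q in enumerate(quantities)]
    for _ in range(max_steps):
        for t in thoughts:
            if is_goal(t):
                return t
        seen = {t.expr for t in thoughts}
        selected = [c for c in expand(thoughts) if c.expr not in seen and keep(c)]
        if not selected:
            return None  # nothing new to explore
        thoughts.extend(selected)
    return next((t for t in thoughts if is_goal(t)), None)


# Toy usage: find an expression over the quantities 3, 4, 2 whose value is 10.
result = solve(
    [3, 4, 2],
    keep=lambda t: t.value >= 0,               # discard negative intermediate results
    is_goal=lambda t: abs(t.value - 10) < 1e-9,
)
print(result.expr if result else None)
```

The sketch keeps all earlier thoughts alongside the newly selected ones, so later steps can combine expressions built at different depths. A rule-based `keep` explores far more candidates than a learned selector would; it only shows where such a selector would plug in.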
Code repositories
the-jb/athena-math (official, PyTorch, mentioned on GitHub)
Benchmarks
| Benchmark | Method | Metric |
|---|---|---|
| math-word-problem-solving-on-asdiv-a | ATHENA (roberta-base) | Execution Accuracy: 86.4 |
| math-word-problem-solving-on-asdiv-a | ATHENA (roberta-large) | Execution Accuracy: 91 |
| math-word-problem-solving-on-math23k | ATHENA (roberta-base) | Accuracy (training-test): 84.4 |
| math-word-problem-solving-on-math23k | ATHENA (roberta-large) | Accuracy (training-test): 86.5 |
| math-word-problem-solving-on-mawps | ATHENA (roberta-base) | Accuracy (%): 92.2 |
| math-word-problem-solving-on-mawps | ATHENA (roberta-large) | Accuracy (%): 93 |
| math-word-problem-solving-on-svamp | ATHENA (roberta-large) | Execution Accuracy: 54.8 |
| math-word-problem-solving-on-svamp | ATHENA (roberta-base) | Execution Accuracy: 45.6 |
| math-word-problem-solving-on-svamp-1-n | ATHENA (roberta-large) | Execution Accuracy: 67.8 |
| math-word-problem-solving-on-svamp-1-n | ATHENA (roberta-base) | Execution Accuracy: 52.5 |