
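Abstract
We propose an accurate and efficient scene text detection framework, termed FAST (faster arbitrarily-shaped text detector). Recent state-of-the-art text detectors rely on complicated post-processing and hand-crafted network architectures, which results in low inference speed. FAST instead introduces two new designs: (1) a minimalist kernel representation (with only a 1-channel output) to model text of arbitrary shape, together with a GPU-parallel post-processing step that assembles text lines with negligible time overhead; and (2) a network architecture searched specifically for text detection, which yields more powerful features than most networks designed for image classification. Benefiting from these two designs, FAST achieves an excellent trade-off between accuracy and efficiency on several challenging datasets, including Total-Text, CTW1500, ICDAR 2015, and MSRA-TD500. For example, on Total-Text, FAST-T reaches an F-measure of 81.6% at 152 FPS, outperforming the previous fastest method by 1.7 points in accuracy and 70 FPS in speed. With TensorRT optimization, the inference speed can be further increased to over 600 FPS. Code and models will be released at https://github.com/czczup/FAST.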
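To make the idea of assembling text regions from a 1-channel kernel map more concrete, the following is a minimal CPU sketch, not the authors' implementation. It assumes that post-processing amounts to thresholding the kernel map, labelling connected components, and then growing each labelled kernel with a max filter. The function names `label_kernels` and `dilate_labels`, the threshold of 0.5, and the 3x3 window are illustrative assumptions; the abstract only states that the real step runs in parallel on the GPU.

```python
from collections import deque


def label_kernels(prob, threshold=0.5):
    """Label 4-connected components of the thresholded 1-channel kernel map.

    Hypothetical helper: `prob` is a list of rows of kernel probabilities.
    Returns (label map, number of kernels); label 0 means background.
    """
    h, w = len(prob), len(prob[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if prob[y][x] <= threshold or labels[y][x]:
                continue
            count += 1
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:  # breadth-first flood fill of one kernel
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and prob[ny][nx] > threshold and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


def dilate_labels(labels, size=3):
    """Grow each labelled kernel with a size x size max filter (stride 1).

    Assumption: where two grown kernels overlap, the larger label wins.
    """
    h, w = len(labels), len(labels[0])
    r = size // 2
    return [
        [
            max(
                labels[ny][nx]
                for ny in range(max(0, y - r), min(h, y + r + 1))
                for nx in range(max(0, x - r), min(w, x + r + 1))
            )
            for x in range(w)
        ]
        for y in range(h)
    ]


# Toy kernel map with two separate text kernels.
prob = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.9, 0.8, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.7, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.9, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]
labels, count = label_kernels(prob)
dilated = dilate_labels(labels)
for row in dilated:
    print(row)
```

A max filter is used here because it maps directly onto a max-pooling operation, which is one plausible way such a step could be run in parallel on a GPU.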
Code repositories
- whai362/pan_pp.pytorch (PyTorch; mentioned on GitHub)
- czczup/FAST (official; PyTorch; mentioned on GitHub)
Benchmarks
| Benchmark | Method | F-measure (%) | Precision (%) | Recall (%) | FPS |
|---|---|---|---|---|---|
| scene-text-detection-on-icdar-2015 | FAST-B-1280 | 87.1 | 89.7 | 84.6 | 15.7 |
| scene-text-detection-on-icdar-2015 | FAST-B-736 | 84.7 | 88.0 | 81.7 | 42.7 |
| scene-text-detection-on-icdar-2015 | FAST-T-736 | 81.7 | 86 | 77.9 | 60.9 |
| scene-text-detection-on-icdar-2015 | FAST-B-896 | 86.3 | 89.2 | 83.6 | 31.8 |
| scene-text-detection-on-icdar-2015 | FAST-S-736 | 82.9 | 86.3 | 79.8 | 53.9 |
| scene-text-detection-on-msra-td500 | FAST-T-512 | 84.5 | 91.1 | 78.8 | 137.2 |
| scene-text-detection-on-msra-td500 | FAST-B-736 | 87.3 | 92.1 | 83 | 56.8 |
| scene-text-detection-on-msra-td500 | FAST-T-736 | 84.9 | 88.1 | 81.9 | 79.6 |
| scene-text-detection-on-msra-td500 | FAST-S-736 | 86.4 | 91.6 | 81.7 | 72 |
| scene-text-detection-on-scut-ctw1500 | FAST-S-512 | 82 | 85.6 | 78.7 | 112.9 |
| scene-text-detection-on-scut-ctw1500 | FAST-B-640 | 84.2 | 87.8 | 80.9 | 66.5 |
| scene-text-detection-on-scut-ctw1500 | FAST-T-512 | 81.5 | 85.5 | 77.9 | 129.1 |
| scene-text-detection-on-scut-ctw1500 | FAST-B-512 | 82.9 | 85.7 | 80.2 | 92.6 |
| scene-text-detection-on-total-text | FAST-T-448 | 81.6 | 86.5 | 77.2 | 152.8 |
| scene-text-detection-on-total-text | FAST-B-512 | 85.8 | 89.6 | 82.4 | 93.2 |
| scene-text-detection-on-total-text | FAST-S-512 | 84.9 | 88.3 | 81.7 | 115.5 |
| scene-text-detection-on-total-text | FAST-B-800 | 87.5 | 90.0 | 85.2 | 46 |
| scene-text-detection-on-total-text | FAST-B-640 | 86.4 | 89.9 | 83.2 | 67.5 |
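The F-measure values reported above can be cross-checked against the precision and recall columns, since the F-measure is conventionally the harmonic mean of the two. The short sketch below does this for three rows and also converts FPS into per-image latency. The helper names `f_measure` and `latency_ms` are illustrative, and the listing assumes the standard F1 definition; because the reported precision and recall are themselves rounded, agreement to within about 0.05 points is all that should be expected.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def latency_ms(fps):
    """Average time per image in milliseconds for a given throughput."""
    return 1000.0 / fps


# (precision, recall, reported F-measure, FPS), copied from the benchmark rows.
rows = {
    "FAST-T-448 on Total-Text": (86.5, 77.2, 81.6, 152.8),
    "FAST-B-1280 on ICDAR 2015": (89.7, 84.6, 87.1, 15.7),
    "FAST-B-736 on MSRA-TD500": (92.1, 83.0, 87.3, 56.8),
}

for name, (p, r, reported, fps) in rows.items():
    computed = f_measure(p, r)
    consistent = abs(computed - reported) < 0.05
    print(f"{name}: F={computed:.2f} (reported {reported}, consistent={consistent}), "
          f"latency={latency_ms(fps):.2f} ms")
```

Expressing speed as latency makes the trade-off easier to read: the fastest and slowest of these three configurations differ by roughly a factor of ten in time per image.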