Abstract
Mainstream scene text detection algorithms fail to make full use of high- and low-level information during multi-scale feature fusion, which leads to missed text, and they often detect the boundaries of long text incorrectly. To address these problems, this paper proposes a scene text detection algorithm that combines attention-based multi-scale feature fusion with residual coordinate attention. The algorithm embeds an attention feature fusion module into the feature pyramid, correcting the inconsistency between features at different scales to extract more detail and reduce missed text detections. After fusion, a residual coordinate attention module captures direction-aware and position-sensitive information along the vertical and horizontal axes, refining boundary information to improve long-text detection. Experiments on the public datasets ICDAR 2015 and Total-Text show that the algorithm reaches F-measures of 85.5% and 83.6% and inference speeds of 22.4 FPS and 40 FPS, respectively. Compared with DBNet, inference speed decreases slightly, while the F-measure improves by 3.2% and 0.8%, respectively.
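The coordinate attention mechanism referenced above pools features along each spatial axis separately, producing one gate that is position-sensitive along the vertical direction and one along the horizontal direction. A minimal NumPy sketch of that idea follows; it is illustrative only, not the paper's implementation. The real module uses shared 1×1 convolutions, normalization, and (in this paper) a residual connection, all omitted here, and the weight matrices `w_h` and `w_w` are hypothetical stand-ins for the learned 1×1 convolutions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def coordinate_attention(x, w_h, w_w):
    """Sketch of coordinate attention for one feature map x of shape (C, H, W).

    w_h, w_w: (C, C) channel-mixing matrices standing in for learned 1x1 convs.
    """
    C, H, W = x.shape
    # Direction-aware pooling: average over width -> (C, H), over height -> (C, W)
    pool_h = x.mean(axis=2)          # encodes position along the vertical axis
    pool_w = x.mean(axis=1)          # encodes position along the horizontal axis
    # Channel mixing followed by sigmoid gates in (0, 1)
    attn_h = sigmoid(w_h @ pool_h)   # (C, H)
    attn_w = sigmoid(w_w @ pool_w)   # (C, W)
    # Recalibrate: broadcast each gate over the orthogonal spatial axis
    return x * attn_h[:, :, None] * attn_w[:, None, :]
```

Because the two gates are one-dimensional, each output location is modulated by its row and column statistics, which is what lets the module sharpen the elongated boundaries of long text regions.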
Source
Computer Science and Application (《计算机科学与应用》)
2022, No. 11, pp. 2608-2618 (11 pages)