Abstract
In recent years, great progress in handwritten mathematical expression recognition has been made with encoder-decoder models. However, these models still have two shortcomings: the encoder extracts insufficient image feature information, and the decoder is inefficient at processing long sequences. To address these shortcomings, this paper proposes an improved encoder-decoder model. The model uses a multi-scale densely connected convolutional network (DenseNet) as the encoder to extract multi-resolution features from images of handwritten mathematical expressions, and a Transformer model based entirely on the attention mechanism as the decoder, which converts the two-dimensional handwritten expression into a one-dimensional LaTeX sequence according to the image features. Image position information and LaTeX symbol position information are embedded through relative position encoding. Experimental results show that the model achieves excellent performance on the official CROHME 2014 dataset, improving expression recognition accuracy by 3.55% and reducing the word error rate by 1.41% compared with current state-of-the-art methods.
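To make the described pipeline concrete, the following is a minimal PyTorch sketch of the kind of architecture the abstract outlines: a multi-scale DenseNet encoder feeding a Transformer decoder that emits LaTeX tokens. The DenseNet-121 backbone, the feature tap points, the vocabulary size, and the use of learned absolute position embeddings (in place of the paper's relative position encoding) are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only; layer choices and sizes are assumptions.
import torch
import torch.nn as nn
import torchvision


class MultiScaleDenseNetEncoder(nn.Module):
    """Produce 1/16- and 1/32-scale feature maps from a formula image."""

    def __init__(self, d_model=256):
        super().__init__()
        blocks = list(torchvision.models.densenet121(weights=None).features.children())
        self.stem = nn.Sequential(*blocks[:-3])    # up to dense block 3: 1024 ch, 1/16 scale
        self.tail = nn.Sequential(*blocks[-3:-1])  # transition 3 + dense block 4: 1024 ch, 1/32 scale
        self.proj_high = nn.Conv2d(1024, d_model, kernel_size=1)
        self.proj_low = nn.Conv2d(1024, d_model, kernel_size=1)

    def forward(self, image):                      # image: (B, 3, H, W)
        high = self.stem(image)                    # (B, 1024, H/16, W/16)
        low = self.tail(high)                      # (B, 1024, H/32, W/32)
        return self.proj_high(high), self.proj_low(low)


class LatexTransformerDecoder(nn.Module):
    """Decode flattened multi-scale image features into a LaTeX token sequence."""

    def __init__(self, vocab_size=120, d_model=256, nhead=8, num_layers=3, max_len=200):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)  # simplification: absolute positions
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, tokens, memory):             # tokens: (B, T), memory: (B, S, d_model)
        t = tokens.size(1)
        positions = torch.arange(t, device=tokens.device)
        tgt = self.token_emb(tokens) + self.pos_emb(positions)
        # Causal mask so each LaTeX token attends only to earlier tokens.
        mask = torch.triu(torch.full((t, t), float("-inf"), device=tokens.device), diagonal=1)
        return self.out(self.decoder(tgt, memory, tgt_mask=mask))


def flatten(feature_map):
    """Turn a (B, C, H, W) feature map into a (B, H*W, C) memory sequence."""
    return feature_map.flatten(2).transpose(1, 2)


# Usage: concatenate both scales into one memory sequence for the decoder.
encoder = MultiScaleDenseNetEncoder()
decoder = LatexTransformerDecoder()
image = torch.randn(1, 3, 128, 512)                # dummy formula image
high, low = encoder(image)
memory = torch.cat([flatten(high), flatten(low)], dim=1)
tokens = torch.zeros(1, 10, dtype=torch.long)      # dummy LaTeX token ids
logits = decoder(tokens, memory)                   # (1, 10, vocab_size)
```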
Authors
DU Yongtao; YU Yuanhui (College of Computer Engineering, Jimei University, Xiamen 361021, China)
Source
Journal of Jimei University (Natural Science), 2022, No. 6, pp. 570-576 (7 pages)
Funding
Xiamen Science and Technology Subsidy Project (2022CXY0301).