Abstract
To increase the data labeling rate, this paper adopts a semi-automatic labeling method that combines a lightweight model with post-processing, enabling rapid development and deployment of layout disassembly for test papers. A lightweight pre-trained PicoDet model with an LCNet-improved backbone is trained on a small set of already labeled samples to obtain a base model. The network output is modified so that the base model's predictions on the remaining samples are converted into the annotation format. After manual verification, the fully labeled dataset is used to train the final model, a PicoDet network with a larger backbone. Experiments comparing the proposed semi-automatic method with manual labeling show that the data labeling rate increases by 195% and the time spent on labeling is shortened by 86.47%, which greatly shortens the project development cycle. Document images processed by layout disassembly are then passed to the Baidu OCR interface, which quickly converts a document from image format to text format.
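The step of converting base-model predictions into an annotation format for manual verification can be illustrated with a minimal sketch. The abstract does not specify the annotation format, the class names, or the score threshold, so the JSON layout (rectangle shapes, loosely similar to what labeling tools such as LabelMe use), the labels `question` and `figure`, and the function name `predictions_to_annotation` are all assumptions made for illustration, not the authors' implementation.

```python
import json


def predictions_to_annotation(image_name, width, height, detections,
                              class_names, score_threshold=0.5):
    """Turn detector output into a reviewable annotation dict.

    `detections` is assumed to be an iterable of
    (class_id, score, x1, y1, x2, y2) tuples in pixel coordinates.
    The output layout is hypothetical, not taken from the paper.
    """
    shapes = []
    for class_id, score, x1, y1, x2, y2 in detections:
        # Drop low-confidence boxes so reviewers see fewer false positives.
        if score < score_threshold:
            continue
        # Clamp coordinates to the image and order the corners.
        xs = sorted(float(min(max(x, 0), width)) for x in (x1, x2))
        ys = sorted(float(min(max(y, 0), height)) for y in (y1, y2))
        shapes.append({
            "label": class_names[class_id],
            "points": [[xs[0], ys[0]], [xs[1], ys[1]]],
            "shape_type": "rectangle",
            "score": round(float(score), 4),
        })
    return {
        "imagePath": image_name,
        "imageWidth": width,
        "imageHeight": height,
        "shapes": shapes,
    }


if __name__ == "__main__":
    # Hypothetical predictions for one page image.
    preds = [(0, 0.91, 12, 30, 580, 210), (1, 0.32, 40, 250, 300, 400)]
    annotation = predictions_to_annotation(
        "page_001.jpg", 600, 800, preds, ["question", "figure"])
    print(json.dumps(annotation, indent=2))
```

A human annotator would then open the generated file in a labeling tool and only correct or delete boxes rather than draw every box from scratch, which is the source of the time savings the abstract reports.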
Authors
周家丰
杨蕾
ZHOU Jiafeng; YANG Lei (School of Electrical and Electronic Engineering, Wuhan Polytechnic University, Wuhan 430048, China)
Source
《应用科技》
CAS
2023, No. 1, pp. 26-32 (7 pages)
Applied Science and Technology
Funding
Science and Technology Fund Project of the Hubei Provincial Department of Education (Q20191602).