Abstract
To address the shortcomings of existing document error-correction methods, an error-checking and error-correction method for document components that combines statistics and rules is proposed. Different checking and correction techniques are applied to different kinds of component errors: structural errors in local document components are handled by combining Schema validity verification with statistics, while the numbering of components such as lists, headings, and formulas is handled with rules. Experiments show that the method achieves good error-correction results.
Authors
王娟
李宁
郝海利
WANG Juan; LI Ning; HAO Haili (Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science & Technology University, Beijing 100101, China)
Source
《北京信息科技大学学报(自然科学版)》
2020, No. 5, pp. 14-19 (6 pages)
Journal of Beijing Information Science and Technology University
Funding
Supported by the National Natural Science Foundation of China (61672105).
Keywords
document error correction
document component
document normalization