
Chinese Text Error Correction Method Based on Dynamic Text Window and Dynamic Weight Allocation (基于动态文本窗口和权重动态分配的中文文本纠错方法) Cited by: 10
Abstract This paper proposes a Chinese text error detection method based on a dynamic text window, which detects errors by sliding the window continuously over the text. When a suspected error is found, a clustered word set is used to smooth the data sparsity problem, and correction is then performed with a correction word set whose weights are allocated dynamically. If the corrected result still fails the error detection rules, window shrinking and window extension are applied to locate the specific error. The correction word set is built using minimum edit distance and dynamic weight allocation. Experimental results show that the dynamic-text-window detection method reaches an F-score of 77.9%. Combined with the dynamic-weight correction method, correction accuracy reaches 78.1%, which is 9.7% and 15.8% higher than the Heima (Black Horse) proofreading system and an average-weight correction strategy, respectively.
Authors HUANG Gaijuan (黄改娟); WANG Congcong (王匆匆); ZHANG Yangsen (张仰森) (Institute of Intelligent Information Processing, Beijing Information Science and Technology University, Beijing 100101, China; Beijing Key Laboratory of Internet Culture Digital Dissemination Research, Beijing 100101, China)
Source Journal of Zhengzhou University: Natural Science Edition (《郑州大学学报(理学版)》), indexed in CAS and the Peking University Core Journals list (北大核心), 2020, No. 3, pp. 9-14 (6 pages)
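Funding National Natural Science Foundation of China (61772081); National Key Research and Development Program of China (2018YFB1402901); Science and Technology Innovation Service Capacity Building, Research Base Construction, Beijing Laboratory: Beijing Laboratory of National Economic Security Early-Warning Engineering project (PXM2018_014224_000010).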
Keywords semantic collocation; data sparsity; dynamic text window; dynamic weight allocation
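The abstract states that the correction word set is built from minimum edit distance combined with weights. The following is a minimal sketch of that general idea only, not the authors' implementation: `edit_distance` is the standard Levenshtein distance, while `rank_candidates`, the toy lexicon, the `max_dist` threshold, and the scoring formula are all hypothetical. In particular, the weight `w_dist` is fixed here, whereas the paper allocates weights dynamically by a rule the abstract does not describe.

```python
from typing import Dict, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions cost 1 each."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def rank_candidates(word: str,
                    lexicon: Dict[str, int],
                    max_dist: int = 1,
                    w_dist: float = 0.5) -> List[Tuple[str, float]]:
    """Rank lexicon entries within max_dist edits of word.

    Assumption: the score is a fixed-weight mix of edit-distance closeness
    and relative frequency; the paper's dynamic allocation is not modelled.
    """
    total = sum(lexicon.values())
    scored = []
    for cand, freq in lexicon.items():
        d = edit_distance(word, cand)
        if cand == word or d > max_dist:
            continue  # skip the word itself and distant entries
        closeness = 1.0 / (1.0 + d)
        rel_freq = freq / total
        scored.append((cand, w_dist * closeness + (1.0 - w_dist) * rel_freq))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy lexicon: word -> corpus frequency.
toy_lexicon = {"权力": 50, "权益": 30, "动态": 20}
candidates = rank_candidates("权利", toy_lexicon)
```

With this toy lexicon, "权力" and "权益" are both one substitution away from "权利" and are kept, while "动态" is two edits away and is filtered out. The more frequent "权力" ranks first.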
