Abstract
This paper presents an algorithm that provides effective correction suggestions for error strings detected by an automatic proofreading system. Targeting errors that arise from characters with identical or similar pronunciation, similar shape, or adjacent input-code key positions, the algorithm constructs a character-driven bidirectional dictionary and a similar-character dictionary, uses fuzzy matching to generate correction suggestions for each error string, and then ranks all suggestions by context information and statistical frequency. Experiments with a system implemented under Windows show that the recall of correct suggestions reaches 91.8%, and that the accuracy of the top-5 suggestions is 76.4%.
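The pipeline summarized in the abstract (dictionary lookup driven by characters, fuzzy matching against similar characters, then ranking) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy lexicon and its frequencies, the confusion sets, and the names `build_char_index` and `suggest` are all hypothetical. A (position, character) index stands in for the character-driven bidirectional dictionary, and ranking uses word frequency only, leaving out the context information the paper also uses.

```python
from collections import defaultdict

# Hypothetical toy lexicon: word -> corpus frequency (illustrative numbers only).
LEXICON = {"北京": 120, "背景": 80, "北极": 15, "景色": 40}

# Hypothetical confusion sets: character -> characters that sound or look alike.
SIMILAR = {"背": {"北", "被"}, "京": {"景", "惊"}, "景": {"京"}, "北": {"背"}}


def build_char_index(lexicon):
    """Index every word by (position, character) so lookup can start from any character."""
    index = defaultdict(set)
    for word in lexicon:
        for pos, ch in enumerate(word):
            index[(pos, ch)].add(word)
    return index


def suggest(error, lexicon, similar, index, top_n=5):
    """Return up to top_n same-length words that differ from `error` only by similar characters."""
    # Collect words sharing at least one character with the error at the same position.
    candidates = set()
    for pos, ch in enumerate(error):
        candidates |= index.get((pos, ch), set())

    scored = []
    for word in candidates:
        if len(word) != len(error) or word == error:
            continue
        mismatches = [(e, w) for e, w in zip(error, word) if e != w]
        # Fuzzy match: every differing character must be in the confusion set of the original.
        if all(w in similar.get(e, ()) for e, w in mismatches):
            # Fewer substitutions first, then higher frequency.
            scored.append((len(mismatches), -lexicon[word], word))

    scored.sort()
    return [word for _, _, word in scored[:top_n]]


index = build_char_index(LEXICON)
print(suggest("背京", LEXICON, SIMILAR, index))  # prints ['北京', '背景']
```

In this sketch a candidate must share at least one correctly placed character with the error string, so a string in which every character is wrong yields no suggestions. The `top_n=5` default mirrors the top-5 evaluation mentioned in the abstract.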
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
Peking University Core Journals (北大核心)
2004, Issue 11, pp. 106-109 (4 pages)
Funding
Supported by the Shanxi Province Youth Science and Technology Research Foundation (20021015)
Keywords
Correction suggestion
Dictionary construction
Suggestion sorting algorithm