Abstract
Correctly segmenting organization names is an important problem in employment and recruitment information search systems. Based on an in-depth study of how organization names are composed, this paper proposes composition rules for organization names, builds a domain-specific dictionary for organization-name segmentation, and develops a customized unknown-word identification method based on a merging strategy. The system was compared experimentally with the Hylanda (海量) word segmentation system, and the results show that it achieves better segmentation performance in the specific domain of organization-name segmentation. In a closed test, unknown-word identification reached 97.26% precision and 96.77% recall.
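The abstract names a domain dictionary for segmentation and a merge-based method for identifying unknown words, and the keyword list names the 1-shortest-path algorithm. The Python sketch below shows one way these pieces could fit together. It is an illustration under stated assumptions, not the authors' implementation: the toy lexicon, the `max_len` cap, the merge rule (join any run of two or more single-character tokens into one candidate unknown word), the example name 华腾, and all function names are hypothetical, and the paper's actual composition rules for organization names are presumably richer. The precision/recall helper uses the common set-based definitions, which may differ in detail from the paper's evaluation protocol.

```python
from typing import List, Set, Tuple


def shortest_path_segment(text: str, lexicon: Set[str], max_len: int = 8) -> List[str]:
    """1-shortest-path segmentation: pick the split with the fewest words.

    Every single character is always a valid edge, so a path always exists;
    multi-character spans are edges only if they appear in `lexicon`.
    """
    n = len(text)
    dist = [n + 1] * (n + 1)  # dist[j] = fewest words covering text[:j]; n + 1 acts as infinity
    prev = [0] * (n + 1)      # prev[j] = start index of the last word ending at j
    dist[0] = 0
    for i in range(n):  # left-to-right order is a topological order of the word lattice
        for j in range(i + 1, min(n, i + max_len) + 1):
            if j - i == 1 or text[i:j] in lexicon:
                if dist[i] + 1 < dist[j]:
                    dist[j] = dist[i] + 1
                    prev[j] = i
    # Walk back from the end of the string to recover the chosen words.
    words, j = [], n
    while j > 0:
        words.append(text[prev[j]:j])
        j = prev[j]
    return words[::-1]


def merge_unknown(tokens: List[str], min_run: int = 2) -> Tuple[List[str], List[str]]:
    """Hypothetical merge rule: join each maximal run of at least `min_run`
    single-character tokens into one candidate unknown word."""
    merged: List[str] = []
    unknown: List[str] = []
    run: List[str] = []

    def flush() -> None:
        if len(run) >= min_run:
            word = "".join(run)
            merged.append(word)
            unknown.append(word)
        else:
            merged.extend(run)  # too short to merge: keep the characters as they are
        run.clear()

    for tok in tokens:
        if len(tok) == 1:
            run.append(tok)
        else:
            flush()
            merged.append(tok)
    flush()
    return merged, unknown


def precision_recall(predicted: List[str], gold: List[str]) -> Tuple[float, float]:
    """Set-based precision and recall of identified unknown words."""
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical toy lexicon and organization name.
    lexicon = {"北京", "科技", "有限公司"}
    tokens = shortest_path_segment("北京华腾科技有限公司", lexicon)
    print(tokens)  # ['北京', '华', '腾', '科技', '有限公司']
    merged, unknown = merge_unknown(tokens)
    print(merged, unknown)  # ['北京', '华腾', '科技', '有限公司'] ['华腾']
```

Because single characters are always available as fallback edges, the shortest path never fails on out-of-vocabulary text; instead, unknown material surfaces as runs of single characters, which is exactly what the merge step then targets.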
Source
Computer Technology and Development (《计算机技术与发展》)
2008, Issue 5, pp. 12-14, 18 (4 pages in total)
Keywords
Chinese word segmentation
organization name segmentation
1-shortest-path algorithm
unknown word identification