
Complex Chinese place name recognition based on conditional random field and rule improvement

Cited by: 10
Abstract: Chinese place names have strong word-formation ability and diverse features, so it is difficult to locate their positions and boundaries in text accurately. To recognize complex Chinese place names accurately and automatically, we analyze their characteristics and cast recognition as a sequence labeling problem. A conditional random field (CRF) model is trained to recognize place names, and rules are then applied to correct the CRF output and recall place names it missed. To further improve accuracy on complex place names, we design a recognition algorithm based on information entropy and pointwise mutual information. The algorithm generates a relevance dictionary from a place name database and uses it to compute the association between adjacent characters in a text, which determines the boundary between a complex place name and its context. The experimental results indicate that the proposed method can efficiently apply existing rule sets to place name recognition and, combined with the CRF model, improves recognition accuracy. On benchmark and generated test datasets, the proposed method achieves higher accuracy in complex place name recognition than current mainstream recognition algorithms, including deep learning methods.
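The boundary step described in the abstract can be illustrated with a minimal Python sketch. It assumes the "relevance dictionary" is nothing more than character, character-pair, and neighbour counts taken from a place-name list; it scores adjacent characters by pointwise mutual information (PMI) and grows a seed span (for example, a fragment a CRF might tag) outward while PMI stays above a threshold. `build_relevance_dict`, `pmi`, `branching_entropy`, `extend_span`, `to_bio`, the `threshold` parameter, the BIO tag set, and the toy place names are all hypothetical illustrations, not the authors' implementation or data.

```python
import math
from collections import Counter, defaultdict


def build_relevance_dict(place_names):
    """Count characters, adjacent character pairs, and left/right neighbours.

    Assumption: the relevance dictionary is modelled here as plain
    co-occurrence counts over a place-name list; the paper's format may differ.
    """
    uni, bi = Counter(), Counter()
    right = defaultdict(Counter)  # right[a][b]: how often b follows a
    left = defaultdict(Counter)   # left[b][a]: how often a precedes b
    for name in place_names:
        uni.update(name)
        for a, b in zip(name, name[1:]):
            bi[a + b] += 1
            right[a][b] += 1
            left[b][a] += 1
    return uni, bi, right, left


def pmi(a, b, uni, bi):
    """Pointwise mutual information log2(P(ab) / (P(a) P(b))); -inf if unseen."""
    n_uni, n_bi = sum(uni.values()), sum(bi.values())
    if n_uni == 0 or n_bi == 0 or bi[a + b] == 0:
        return float("-inf")
    p_ab = bi[a + b] / n_bi
    p_a, p_b = uni[a] / n_uni, uni[b] / n_uni
    return math.log2(p_ab / (p_a * p_b))


def branching_entropy(ch, neighbours):
    """Shannon entropy (bits) of the characters observed next to `ch`."""
    counts = neighbours.get(ch, Counter())  # .get avoids inserting empty keys
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def extend_span(text, start, end, uni, bi, threshold=0.0):
    """Grow the seed text[start:end] outward while adjacent-character PMI
    exceeds `threshold` (a hypothetical tuning parameter)."""
    while start > 0 and pmi(text[start - 1], text[start], uni, bi) > threshold:
        start -= 1
    while end < len(text) and pmi(text[end - 1], text[end], uni, bi) > threshold:
        end += 1
    return start, end


def to_bio(text, spans):
    """Character-level BIO labels for place-name spans, as CRF training targets."""
    tags = ["O"] * len(text)
    for start, end in spans:
        tags[start] = "B-LOC"
        tags[start + 1:end] = ["I-LOC"] * (end - start - 1)
    return tags


if __name__ == "__main__":
    # Toy dictionary for illustration only; not data from the paper.
    names = ["南京市", "南京路", "北京市", "北京路", "玄武区", "鼓楼区"]
    uni, bi, right, _ = build_relevance_dict(names)
    text = "我住在南京市玄武区附近"
    s, e = extend_span(text, 4, 5, uni, bi)  # seed: the single character 京
    print(text[s:e])                          # 南京市
    print(branching_entropy("京", right))     # 1.0
    print(to_bio(text, [(s, e)]))
```

With the toy dictionary, the span stops between 市 and 玄 because that pair never occurs in the dictionary, so 南京市 and 玄武区 come out as separate names. The right-branching entropy of 京 is 1.0 bit (it is followed by 市 and 路 equally often); a high value is commonly read as a sign that a boundary may follow. The abstract does not say how the authors weight entropy against PMI, so this sketch uses only PMI for the extension decision. `to_bio` shows the kind of character-level labels a CRF would be trained on once recognition is cast as sequence labeling.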
Authors: MAO Bo (毛波), TENG Wei (滕炜) (Collaborative Innovation Center for Modern Grain Circulation and Safety of Jiangsu Province, College of Information Engineering, Nanjing University of Finance and Economics, Nanjing 210003, China)
Source: Engineering Journal of Wuhan University (《武汉大学学报(工学版)》), 2020, No. 5, pp. 456-463 (8 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
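Funding: National Natural Science Foundation of China (No. 41671457); Natural Science Foundation of Jiangsu Province (No. BK20151551); Major Project of Natural Science Research of Jiangsu Higher Education Institutions (No. 16KJA170003).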
Keywords: complex place name recognition; conditional random field; information entropy