Abstract
This paper proposes a rule-based method for recognizing the full names and abbreviations of Chinese organizations. For a full name, the right boundary is first located with the help of an organization suffix lexicon; the left boundary is then determined by rule matching, with a Bayesian probability model used to make the final decision. Abbreviations are recognized by applying the corresponding abbreviation rules to each identified full name. In an open test, the method achieves an overall recall of 85.19%, a precision of 83.03%, and an F-measure of 84.10%; for abbreviations, the recall is 67.18% and the precision is 74.14%. The method has been applied in a Chinese relation extraction system.
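The reported F-measure can be checked against the reported precision and recall. The sketch below assumes the abstract uses the standard balanced F-measure (the harmonic mean of precision and recall); the function name `f_measure` and the `beta` parameter are illustrative and do not come from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (balanced F when beta == 1)."""
    if precision <= 0.0 and recall <= 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Figures taken from the abstract, expressed as fractions.
overall_f = f_measure(precision=0.8303, recall=0.8519)
# The abstract reports no F-measure for abbreviations; this value is derived here.
abbrev_f = f_measure(precision=0.7414, recall=0.6718)

print(f"overall F-measure:      {overall_f:.2%}")
print(f"abbreviation F-measure: {abbrev_f:.2%}")
```

Under that assumption the overall figures are mutually consistent (about 84.10%), and the abbreviation scores would correspond to an F-measure of roughly 70.49%, a value the abstract itself does not state.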
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
Peking University Core Journals (北大核心)
2007, Issue 6, pp. 17-21 (5 pages)
Funding
Shanghai Municipal Science and Technology Commission (045107035)
Sponsorship from the German partner
Keywords
computer application
Chinese information processing
recognition of Chinese organization names
recognition of Chinese organization abbreviations
rule matching
Bayesian probability model