Abstract
A two-level cascaded approach to Chinese organization name recognition, based on support vector machines (SVM) and conditional random fields (CRF), is proposed. The first level uses a CRF to recognize simple organization names and passes its results to the second level to support the next decision. The second level applies a drive-based method that combines SVM and CRF to recognize complicated organization names. Finally, the results of the two levels are merged, and a post-processing step corrects results with low confidence. An open test on a large-scale real corpus gives a precision of 94.83% and a recall of 95.02%, which demonstrates the effectiveness of the approach.
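The control flow described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Span` type, the `recognize` function, the overlap-based merge rule, and the `threshold` value are all hypothetical, and the two levels are passed in as plain callables standing in for the trained CRF and SVM+CRF models, whose features and training the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Span:
    """A candidate organization name (hypothetical representation)."""
    start: int         # inclusive character offset
    end: int           # exclusive character offset
    confidence: float  # score assigned by the recognizer


# Stand-ins for the trained models; the real ones are not given in the abstract.
Level1 = Callable[[str], List[Span]]              # CRF: simple names
Level2 = Callable[[str, List[Span]], List[Span]]  # SVM + CRF: complicated names
Fixer = Callable[[str, Span], Optional[Span]]     # post-processing rule


def overlaps(a: Span, b: Span) -> bool:
    return a.start < b.end and b.start < a.end


def recognize(text: str, level1: Level1, level2: Level2,
              fix_low_confidence: Fixer, threshold: float = 0.5) -> List[Span]:
    # Level 1: recognize simple organization names.
    simple = level1(text)
    # Level 2: recognize complicated names, using level-1 output as evidence.
    complicated = level2(text, simple)
    # Merge (assumed rule): a complicated name supersedes any simple name
    # it overlaps; all other simple names are kept.
    merged = list(complicated) + [
        s for s in simple if not any(overlaps(s, c) for c in complicated)
    ]
    # Post-processing: revisit only the low-confidence results.
    final: List[Span] = []
    for span in sorted(merged, key=lambda s: s.start):
        if span.confidence < threshold:
            fixed = fix_low_confidence(text, span)
            if fixed is None:  # the fixer may reject the candidate
                continue
            span = fixed
        final.append(span)
    return final
```

Passing the models in as callables keeps the sketch independent of any particular CRF or SVM library; only the ordering of the steps (level 1, level 2, merge, correction of low-confidence results) is taken from the abstract.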
Source
《大连理工大学学报》
EI
CAS
CSCD
PKU Core (Peking University Core Journals)
2010, No. 5, pp. 782-787 (6 pages)
Journal of Dalian University of Technology
Funding
Supported by the Fundamental Research Funds for the Central Universities (DUT10RW202)