
Chinese Named Entity Recognition and Disambiguation Based on Multi-stage Clustering (cited by 17)
Abstract: Named entity recognition and disambiguation is an important research topic in natural language understanding. For the named entity recognition and disambiguation task in the setting where an entity knowledge base is provided, this paper proposes a method based on multi-stage clustering. First, two rounds of clustering link named entities to entity definitions in the knowledge base. Next, hierarchical agglomerative clustering groups the entities that do not appear in the knowledge base. Finally, ordinary (non-entity) words are recognized, and the results are adjusted with K-Means clustering. Experiments on the CLP-2012 Chinese named entity recognition and disambiguation evaluation data show that the method performs well: its F-score on the test set reaches 86.68%, 6.46% higher than the best participating team in the evaluation.
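The first two stages of the pipeline described above (linking mentions to knowledge-base entries, then clustering the unlinked ones) can be illustrated with a minimal sketch. This is not the authors' method: it assumes whitespace-tokenized bag-of-words features, cosine similarity, a single linking round in place of the paper's two rounds of clustering, average-link agglomerative clustering, and hypothetical thresholds (`threshold=0.3`, `threshold=0.2`). All function names and the toy data are illustrative only; ordinary-word recognition and the K-Means adjustment step are omitted.

```python
from collections import Counter
from math import sqrt


def vectorize(text):
    """Term-frequency vector over whitespace tokens (an assumed feature set)."""
    return Counter(text.split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def link_to_kb(docs, kb, threshold=0.3):
    """Link each document to its most similar KB definition, or None (NIL) below threshold."""
    kb_vecs = {eid: vectorize(desc) for eid, desc in kb.items()}
    links = {}
    for doc_id, text in docs.items():
        v = vectorize(text)
        best_id, best_sim = None, 0.0
        for eid, kv in kb_vecs.items():
            s = cosine(v, kv)
            if s > best_sim:
                best_id, best_sim = eid, s
        links[doc_id] = best_id if best_sim >= threshold else None
    return links


def average_link_hac(docs, doc_ids, threshold=0.2):
    """Average-link agglomerative clustering; stop when the best merge falls below threshold."""
    vecs = {d: vectorize(docs[d]) for d in doc_ids}
    clusters = [[d] for d in doc_ids]

    def avg_sim(c1, c2):
        return sum(cosine(vecs[a], vecs[b]) for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = avg_sim(clusters[i], clusters[j])
                if s > best:
                    best, pair = s, (i, j)
        if best < threshold:
            break
        i, j = pair  # j > i, so deleting j leaves index i valid
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical toy knowledge base and mention contexts.
kb = {"E1": "singer pop album concert", "E2": "football striker club goal"}
docs = {
    "d1": "the singer released a pop album",
    "d2": "the striker scored a goal for the club",
    "d3": "professor published a chemistry paper",
    "d4": "chemistry professor wins award",
    "d5": "farmer grows rice in the village",
}

links = link_to_kb(docs, kb)
nil_ids = [d for d, e in links.items() if e is None]
clusters = average_link_hac(docs, nil_ids)
print(links)
print(clusters)
```

With these toy inputs, `d1` and `d2` link to `E1` and `E2`, while the three NIL documents split into one cluster for the two chemistry-professor contexts and a singleton. Average linkage is used here only because it is a common, simple choice for agglomerative clustering; the abstract does not specify the linkage criterion.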
Source: 《中文信息学报》 (Journal of Chinese Information Processing), indexed in CSCD and the Peking University Core Journals list, 2013, No. 5, pp. 29-34 and 42 (7 pages)
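Funding: Major Project of the National Social Science Foundation of China (12&ZD227); National High Technology Research and Development Program of China (863 Program) (2012AA0111101); National Natural Science Foundation of China (91024009)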
Keywords: named entity recognition; named entity disambiguation; clustering