Abstract
Named entity recognition and disambiguation is an important research topic in natural language understanding. For named entity recognition and disambiguation when an entity knowledge base is provided, this paper proposes a method based on multi-stage clustering. First, named entities are linked to entity definitions in the knowledge base through two rounds of clustering. Second, entities that do not appear in the knowledge base are grouped by hierarchical agglomerative clustering. Finally, ordinary (non-entity) words are recognized and the results are adjusted by K-Means clustering. Experiments on the CLP-2012 Chinese named entity recognition and disambiguation bakeoff data (the Chinese personal name disambiguation task) show that the method performs well: its F score on the test set reaches 86.68%, exceeding the best result among the participating teams by 6.46%.
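The second stage above, grouping mentions that have no knowledge-base entry, can be illustrated with a small agglomerative clustering sketch. This is not the authors' implementation: the bag-of-words context vectors, cosine similarity, average linkage, and the stopping threshold of 0.2 are all assumptions chosen for illustration, and `hac`, `cosine`, and `average_link` are hypothetical helper names.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def average_link(c1, c2, vecs):
    """Average pairwise similarity between two clusters (average linkage)."""
    total = sum(cosine(vecs[i], vecs[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def hac(docs, threshold=0.2):
    """Hypothetical HAC over mention contexts: start with singletons and
    repeatedly merge the most similar pair of clusters until no pair's
    average-linkage similarity reaches `threshold` (an assumed value)."""
    vecs = [Counter(d.split()) for d in docs]  # assumed bag-of-words features
    clusters = [[i] for i in range(len(docs))]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = average_link(clusters[x], clusters[y], vecs)
                if s > best:
                    best, pair = s, (x, y)
        if best < threshold:
            break  # remaining clusters are treated as distinct entities
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return clusters


if __name__ == "__main__":
    # Toy contexts for four mentions of the same (NIL) name.
    docs = [
        "professor university physics research",
        "physics professor lecture university",
        "football striker goal match",
        "match goal striker league",
    ]
    print(hac(docs))  # [[0, 1], [2, 3]]
```

Each resulting cluster would stand for one entity absent from the knowledge base; the threshold controls how eagerly mentions are merged, and a real system would tune it on development data.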
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
Peking University Core Journal (北大核心)
2013, No. 5, pp. 29-34, 42 (7 pages)
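Funding
Major Project of the National Social Science Foundation of China (12&ZD227)
National High Technology Research and Development Program of China (863 Program) (2012AA0111101)
National Natural Science Foundation of China (91024009)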
Keywords
named entity recognition
named entity disambiguation
clustering