
Disambiguation of Chinese Author Names with Multiple Features
Original title: 融合多特征的中文论文同名学者消歧研究. Cited by: 3
Abstract [Objective] This paper addresses the problem of scholars who share the same name in Chinese papers held in document resource management systems. [Methods] Starting from bibliographic data, we build scholar entities identified by "author name + institution name". We use the attributes of these entities to construct six similarity features covering three aspects. We then fuse the features in three ways: principal component analysis (PCA), direct weight assignment, and a combination of the two. Finally, we compare the disambiguation ability of each fusion method and the contribution of each feature. [Results] Two fusion methods effectively reduce time cost: PCA combined with weights assigned per individual feature, and weights assigned per aspect. Their F1 values are 70.74% and 70.42% on the LIS test set, and 81.90% and 80.93% on the economics test set. [Limitations] The features used are limited. All of them come from the metadata of the papers; no external information or text mining of the paper content is used. [Conclusions] The proposed feature fusion methods can effectively solve the weight-setting problem that arises when multiple features are fused.
Authors Lin Kerou (林克柔); Wang Hao (王昊); Gong Lijuan (龚丽娟); Zhang Baolong (张宝隆) (School of Information Management, Nanjing University, Nanjing 210023, China; Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China)
Source Data Analysis and Knowledge Discovery (《数据分析与知识发现》; indexed in CSSCI, CSCD, PKU Core), 2021, No. 4, pp. 90-102 (13 pages)
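Funding This work is one of the research outcomes of the Jiangsu Province "Six Talent Peaks" High-Level Talent Project (Grant No. JY-001), the Jiangsu Young Social Science Talents program, and the Nanjing University Zhongying Young Scholars program.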
Keywords Feature Fusion; Author Name Disambiguation; PCA; Chinese Papers
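
The abstract describes fusing six similarity features either by directly assigned weights or with the help of PCA. The sketch below illustrates that general idea only; it is not the authors' procedure. The feature values, the threshold of 0.5, and the choice to turn the absolute loadings of the first principal component into weights are all assumptions made for illustration, and the function names are hypothetical.

```python
from math import sqrt


def weighted_fusion(features, weights):
    """Fuse each pair's feature vector into one score using assigned weights."""
    total = sum(weights)
    return [sum(w * x for w, x in zip(weights, row)) / total for row in features]


def first_principal_component(features, iterations=200):
    """Approximate the leading eigenvector of the covariance matrix by power iteration."""
    n, d = len(features), len(features[0])
    means = [sum(row[j] for row in features) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in features]
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    vec = [1.0] * d
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    return vec


def pca_weights(features):
    """Assumption: use normalized absolute loadings of the first component as weights."""
    loadings = [abs(x) for x in first_principal_component(features)]
    total = sum(loadings)
    return [x / total for x in loadings]


# Made-up similarity features for five pairs of "author name + institution name"
# entities; each row holds six feature values in [0, 1].
pairs = [
    [0.9, 0.8, 0.7, 0.6, 0.9, 0.8],
    [0.1, 0.2, 0.0, 0.1, 0.3, 0.2],
    [0.8, 0.6, 0.9, 0.7, 0.8, 0.7],
    [0.2, 0.1, 0.1, 0.0, 0.2, 0.1],
    [0.5, 0.4, 0.6, 0.5, 0.5, 0.4],
]

weights = pca_weights(pairs)
scores = weighted_fusion(pairs, weights)

# Hypothetical decision rule: treat a pair as the same scholar above a threshold.
THRESHOLD = 0.5
same_scholar = [score >= THRESHOLD for score in scores]
```

Passing a hand-picked weight list to `weighted_fusion` instead of `pca_weights(pairs)` corresponds to direct weight assignment; deriving the weights from the data avoids tuning them by hand, which is the weight-setting problem the abstract refers to.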
