
The Impact of Data Size on Co-authorship Prediction (cited by: 3)
Abstract [Purpose/Significance] To find the dataset size best suited to co-authorship prediction and to compare co-authorship prediction indicators fairly, this paper compares and analyzes how overall accuracy and the optimal indicator change across datasets of different sizes. [Method/Process] Twelve representative indicators, namely common neighbors (CN) and its improved variants, are selected. Link prediction theory and methods are applied to co-authorship networks of different sizes to compute each indicator's prediction accuracy and to identify the optimal indicator at each data size, revealing how and why data size influences co-authorship prediction. [Result/Conclusion] In the field of Library and Information Science, co-authorship network datasets of different sizes are formed by filtering on author occurrence frequency. The experiments show that the larger the dataset, the higher the overall accuracy of co-authorship prediction, with a large gain in accuracy on the full co-authorship network; the complete, unfiltered co-authorship network is therefore the best dataset for co-authorship prediction. The optimal indicator also changes from one dataset to another, which confirms that indicators have a data-size preference and that a fair, sound comparison of indicators must be run on several datasets of different sizes. The reason is that as the data size grows, the co-authorship network comes closer to the real situation and the advantages of the improved indicators come fully into play. The method can be extended to other fields to validate these conclusions.
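Authors: Zhang Jinzhu, Han Tao
Source: Journal of Intelligence (《情报杂志》), CSSCI, PKU Core, 2016, No. 9, pp. 80-85 (6 pages)
Funding: A research output of the National Natural Science Foundation of China Youth Fund project "Dynamic Identification of Breakthrough Innovation Based on Mutations in Cited Scientific Knowledge and Its Formation Mechanism" (No. 71503125); the Ministry of Education Humanities and Social Sciences Youth Fund project "Dynamic Identification of Topic Mutations in Heterogeneous Knowledge Networks" (No. 14YJC870025); and the Fundamental Research Funds for the Central Universities project "Methods and Formation Mechanism for Dynamic Identification of Breakthrough Innovation Based on Mutations in Scientific Knowledge Cited by Patents" (No. 30915013101).
Keywords: data size; co-authorship prediction; Library and Information Science; precision; optimal indicator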
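
The procedure summarized in the abstract (filter authors by occurrence frequency, build a co-authorship network, score unconnected author pairs, measure accuracy) can be illustrated with a minimal Python sketch. This record does not list the twelve indicators or the exact evaluation protocol, so the sketch makes several assumptions: it implements only the plain common neighbors (CN) score, it uses a precision-at-L measure, and the function names, the `min_papers` threshold, and the toy data are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def coauthor_edges(papers, min_papers=1):
    """Build co-author pairs, keeping only authors with >= min_papers papers.

    Assumption: this stands in for the frequency-based filtering that the
    abstract describes; the paper's actual thresholds are not given here.
    """
    counts = Counter(a for authors in papers for a in set(authors))
    edges = set()
    for authors in papers:
        kept = sorted(a for a in set(authors) if counts[a] >= min_papers)
        edges.update(combinations(kept, 2))
    return edges


def common_neighbors_scores(train_edges):
    """Score each unconnected author pair by its number of shared co-authors."""
    neighbors = {}
    for a, b in train_edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    scores = {}
    for a, b in combinations(sorted(neighbors), 2):
        if b in neighbors[a]:
            continue  # already co-authors in the training network
        shared = len(neighbors[a] & neighbors[b])
        if shared:
            scores[(a, b)] = shared
    return scores


def precision_at_l(scores, test_edges, l):
    """Fraction of the l highest-scored pairs that are held-out links."""
    held_out = {tuple(sorted(edge)) for edge in test_edges}
    ranked = sorted(scores, key=lambda pair: (-scores[pair], pair))[:l]
    if not ranked:
        return 0.0
    return sum(pair in held_out for pair in ranked) / len(ranked)


# Hypothetical toy data: each inner list is the author list of one paper.
papers = [["A", "B"], ["A", "C"], ["B", "D"], ["C", "D"], ["A", "E"]]
test_edges = [("A", "D")]  # a later collaboration to be predicted

full_scores = common_neighbors_scores(coauthor_edges(papers, min_papers=1))
filtered_scores = common_neighbors_scores(coauthor_edges(papers, min_papers=2))
full_precision = precision_at_l(full_scores, test_edges, l=2)
filtered_precision = precision_at_l(filtered_scores, test_edges, l=2)
print(full_precision, filtered_precision)
```

Repeating the scoring step with other similarity indices and several `min_papers` values would give the kind of indicator-by-dataset comparison the abstract reports; the toy network is far too small to reproduce its findings.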


