Research on tags co-occurrence for tags clustering algorithm (标签共现的标签聚类算法研究)

Cited by: 3


Abstract: In social networks, tag clustering can address problems such as tag redundancy and semantic ambiguity. To improve clustering effectiveness, the paper derives a feature vector for each tag from tag co-occurrence information and computes tag similarity from these vectors. Where traditional clustering algorithms use geometric distance to measure how far an object lies from a cluster center, the proposed method uses the Pearson correlation coefficient instead. Combining this measure with K-means yields a tag co-occurrence clustering algorithm, whose complexity is also analyzed. Comparative experiments against other clustering algorithms show that the proposed algorithm clusters better, which supports its effectiveness and feasibility.
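The pipeline outlined in the abstract (co-occurrence feature vectors, Pearson correlation in place of geometric distance, K-means-style iteration) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function names, the choice to put each tag's own frequency on the diagonal of the co-occurrence matrix, the seed-based initialisation, and the `max_iter` cap are all assumptions that the record does not specify.

```python
from math import sqrt


def cooccurrence_vectors(posts):
    """Map each tag to its vector of co-occurrence counts with every tag.

    Assumption: the diagonal entry holds the tag's own frequency.
    """
    tags = sorted({t for post in posts for t in post})
    index = {t: i for i, t in enumerate(tags)}
    vectors = {t: [0.0] * len(tags) for t in tags}
    for post in posts:
        unique = set(post)
        for a in unique:
            for b in unique:
                vectors[a][index[b]] += 1
    return tags, vectors


def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def cluster_tags(posts, seeds, max_iter=20):
    """K-means-style loop: assign each tag to the most correlated center."""
    tags, vectors = cooccurrence_vectors(posts)
    # Assumption: initial centers are the vectors of caller-chosen seed tags.
    centers = [list(vectors[s]) for s in seeds]
    assignment = {}
    for _ in range(max_iter):
        new_assignment = {
            t: max(range(len(centers)),
                   key=lambda c: pearson(vectors[t], centers[c]))
            for t in tags
        }
        if new_assignment == assignment:
            break  # converged: no tag changed cluster
        assignment = new_assignment
        # Update step: each center becomes the mean of its members' vectors.
        for c in range(len(centers)):
            members = [vectors[t] for t in tags if assignment[t] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment
```

Calling `cluster_tags(posts, seeds)` with `posts` as a list of tag sets (one set per tagged resource) and `seeds` as one starting tag per cluster returns a dictionary from tag to cluster index. Because the assignment step maximises correlation rather than minimising distance, tags whose co-occurrence profiles have the same shape land together even when their raw counts differ in scale.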

Authors: WANG Yadan (王娅丹), LI Peng (李鹏), JIN Yu (金瑜), LIU Yu (刘宇)

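Affiliations: College of Computer Science and Technology, Wuhan University of Science and Technology; Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System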

Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list, 2015, No. 2, pp. 146-150, 208 (6 pages)

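Funding: National Natural Science Foundation of China (No. 61303117); Open Fund of the Hubei Province Key Laboratory (No. znss2013B012); Scientific Research Fund of the Hubei Provincial Department of Education (No. B2014085, No. B20101104); Wuhan University of Science and Technology Undergraduate Science and Technology Innovation Fund research project (No. 12ZRC061)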

Keywords: tag clustering; tag co-occurrence; K-means; Pearson correlation coefficient; feature vector

Classification code: TP301.6 [Automation and Computer Technology: Computer System Architecture]



