Tag clustering algorithm LMMSK: improved K-means algorithm based on latent semantic analysis (cited by: 7)

Abstract: With the wide adoption of Web 2.0 and social software, tag-related studies and applications are multiplying. Because users tag in arbitrary and personalized ways, tag research still runs into obstacles of data space and semantics. First, the traditional K-means clustering algorithm is improved into the MMSK-means clustering algorithm by using min-max similarity (MMS) to establish the initial centroids; the superiority of MMSK-means has been tested. Second, MMSK-means is combined with latent semantic analysis (LSA) to produce a new tag clustering algorithm, LMMSK. Finally, three tag clustering algorithms, MMSK-means, tag clustering based on LSA (the LSA-based algorithm), and LMMSK, are run in Matlab on a real tag-resource dataset obtained from the Delicious social bookmarking system, covering 2004 to 2009. LMMSK gives the most effective and most accurate clustering result of the three. This provides a better tag clustering algorithm for wider application of social tags in personalized search, topic identification, and knowledge community discovery. In addition, to make clustering results easier to compare, the clustering corresponding results matrix (CCR matrix) is proposed; it is expected to be an effective tool for capturing the evolution of a social tagging system. © 2017 Beijing Institute of Aerospace Information.
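The pipeline the abstract describes (LSA on the tag-resource data, then K-means started from similarity-based initial centroids) can be illustrated with a small Python sketch. It is not the authors' Matlab implementation. The abstract does not say how MMS chooses the centroids, so the sketch assumes a farthest-first rule: each new seed is the tag whose maximum cosine similarity to the seeds already chosen is smallest. All function names, the choice of row 0 as the first seed, and the toy matrix are hypothetical.

```python
import numpy as np


def lsa_embed(tag_resource, rank):
    """Project each tag (row) into a rank-dimensional latent space via truncated SVD."""
    u, s, _ = np.linalg.svd(tag_resource, full_matrices=False)
    return u[:, :rank] * s[:rank]


def unit_rows(x):
    """Scale every row to unit length; all-zero rows are left unchanged."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.where(norms == 0, 1.0, norms)


def min_max_seeds(x, k):
    """ASSUMED seeding rule (not taken from the paper): start from row 0, then
    repeatedly add the row whose maximum cosine similarity to the seeds chosen
    so far is smallest."""
    x = unit_rows(x)
    seeds = [0]
    while len(seeds) < k:
        max_sim = (x @ x[seeds].T).max(axis=1)
        max_sim[seeds] = np.inf  # never pick an existing seed again
        seeds.append(int(np.argmin(max_sim)))
    return seeds


def kmeans_cosine(x, seeds, iters=50):
    """K-means on unit vectors: assign each row to the most similar centroid."""
    x = unit_rows(x)
    centroids = x[seeds].copy()
    labels = np.zeros(len(x), dtype=int)
    for _ in range(iters):
        labels = np.argmax(x @ unit_rows(centroids).T, axis=1)
        new_centroids = centroids.copy()
        for j in range(len(centroids)):
            members = x[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels


def cluster_tags(tag_resource, k, rank):
    """LSA embedding followed by seeded K-means, in the spirit of LMMSK."""
    latent = lsa_embed(np.asarray(tag_resource, dtype=float), rank)
    return kmeans_cosine(latent, min_max_seeds(latent, k))


# Toy tag-resource counts (hypothetical): tags 0-1 share resources, as do tags 2-3.
toy = [[2, 1, 0, 0],
       [1, 2, 0, 0],
       [0, 0, 3, 1],
       [0, 0, 1, 3]]
labels = cluster_tags(toy, k=2, rank=2)
```

On the toy matrix, tags 0 and 1 land in one cluster and tags 2 and 3 in the other. Deterministic seeding of this kind removes the run-to-run variation that random initial centroids introduce in plain K-means.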
Source: Journal of Systems Engineering and Electronics (系统工程与电子技术(英文版)), indexed in SCIE, EI, CSCD; 2017, No. 2, pp. 374-384 (11 pages)
Funding: Supported by the National Natural Science Foundation of China (71271018, 71531001)
Keywords: Application programs; Data mining; MATLAB; Semantics; Social networking (online); Websites