Abstract
In social networks, tag clustering can address problems such as tag redundancy and semantic ambiguity. To improve clustering effectiveness, this paper derives a feature vector for each tag from tag co-occurrence information and computes tag similarity from these vectors. Where traditional clustering algorithms use geometric distance to measure how far an object is from a cluster center, the proposed method uses the Pearson correlation coefficient instead. Combining this measure with the K-means algorithm yields a tag co-occurrence clustering algorithm, whose complexity is also analyzed. Finally, comparative experiments with other clustering algorithms show that the proposed algorithm produces better clustering results, which supports its effectiveness and feasibility.
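The pipeline summarized in the abstract (co-occurrence feature vectors, Pearson correlation in place of geometric distance, K-means iteration) can be sketched roughly as follows. This is only an illustrative reading of the abstract, not the authors' implementation: the function names, the toy data, the choice to count a tag's own frequency on the diagonal of the co-occurrence matrix, and the use of explicit seed tags as initial centers are all assumptions made for the example.

```python
from math import sqrt


def cooccurrence_vectors(resources, tags):
    """One feature vector per tag: entry j counts resources carrying both tags.

    Assumption: the diagonal entry holds the tag's own frequency.
    """
    index = {t: i for i, t in enumerate(tags)}
    vecs = {t: [0.0] * len(tags) for t in tags}
    for tagset in resources:
        present = [t for t in set(tagset) if t in index]
        for a in present:
            for b in present:
                vecs[a][index[b]] += 1.0
    return vecs


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def pearson_kmeans(vecs, seeds, max_iter=100):
    """K-means where each tag joins the center it correlates with most strongly.

    `seeds` (hypothetical parameter) lists the tags used as initial centers.
    """
    tags = list(vecs)
    centers = [list(vecs[s]) for s in seeds]
    assign = {}
    for _ in range(max_iter):
        new_assign = {
            t: max(range(len(centers)), key=lambda c: pearson(vecs[t], centers[c]))
            for t in tags
        }
        if new_assign == assign:  # assignments are stable: converged
            break
        assign = new_assign
        for c in range(len(centers)):
            members = [vecs[t] for t in tags if assign[t] == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


# Toy data (invented for illustration): each inner list is one tagged resource.
resources = [
    ["python", "code", "programming"],
    ["python", "programming"],
    ["code", "programming"],
    ["travel", "beach", "holiday"],
    ["travel", "holiday"],
    ["beach", "holiday"],
]
tags = ["python", "code", "programming", "travel", "beach", "holiday"]
vectors = cooccurrence_vectors(resources, tags)
clusters = pearson_kmeans(vectors, seeds=["python", "travel"])
```

Because Pearson correlation is a similarity rather than a distance, the assignment step picks the center with the largest coefficient instead of the smallest distance; the center update is the ordinary K-means mean.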
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
PKU Core Journal (北大核心)
2015, Issue 2, pp. 146-150, 208 (6 pages in total)
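Funding
National Natural Science Foundation of China (No. 61303117)
Hubei Province Key Laboratory Open Fund Project (No. znss2013B012)
Scientific Research Fund of the Hubei Provincial Department of Education (No. B2014085, No. B20101104)
Wuhan University of Science and Technology Undergraduate Science and Technology Innovation Fund Research Project (No. 12ZRC061)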
Keywords
tag clustering
tag co-occurrence
K-means
Pearson correlation coefficient
feature vector