摘要
文章提出了一种热门微博分类的新思路,通过对热门微博的转发用户进行聚类分析,并根据不同的用户聚集状态来区分不同种类的热门微博。在用户聚类中采用了基于K-means聚类算法的改进算法X-means,并根据微博用户数据特点对X-means算法进行了进一步改进,将属性差异和用户节点差异考虑在聚类过程当中。其中,在对X-means算法改进过程中,对于用户属性的加权采用了基于对数函数的加权方式,确保聚类结果更加科学、准确;在对用户自身权重的加权中,通过建立重点人员信息库的方式,实现了对特殊用户节点的加权,并利用HITS算法对重点人员信息库实现动态更新。在完成用户聚类之后,将得到的重要用户的信息分领域录入重点人员信息库,实现聚类过程与信息库的反馈机制。另外,实验将相同数据分别代入改进前后的K-means算法与X-means算法中,并通过轮廓系数评价聚类结果,证明了改进后的X-means算法在微博用户聚类中更有优势。
This paper proposed a new idea for popular microblogging classifi cation, by analyzing the users who forwarded the popular microblogging to obtain the clustering result, and distinguishing the different kinds of popular microblogging depending on the aggregation state of user. The user clustering algorithm is called X-means algorithm which improved on the basis of K-means clustering algorithm, and improved further according to the characteristics of the microblogging user. Taking into account the difference of the user themselves and their attributes, this paper used a weighted approach based on the logarithmic function in the process of improving X-means algorithm,which can ensure that the clustering results more scientifi c and accurate. Simultaneously, this paper achieved a weighted approach for the special nodes by the way of establishing a Key-Personnel- Database, then this paper achieved the dynamic updates of the database with the HITS algorithm. After completing the user clustering, the experiment put the important user information into the Key-Personnel- Database in different fi elds, by which can achieve the feedback mechanism between the clustering processes and the database. In addition, clustered the microblogging user with the X-means algorithm and the k-means algorithm as well as their improved algorithm, and ultimately proved the improved X-means algorithm has more advantages in the microblogging user clustering.
出处
《信息网络安全》
2016年第1期81-87,共7页
Netinfo Security
基金
公安部重点研究计划[2011ZDYJGADX016]
关键词
微博分类
用户聚类
轮廓系数
microblogging classifi cation
user clustering
outline coeffi cient