Abstract
Smartphones and micro-blog clients have strengthened the media character of micro-blogs, so detecting micro-blog topics in real time has practical value. This paper presents a hot-topic detection method for Chinese micro-blogs based on keyword classification. Micro-blog messages are filtered and grouped into categories by keywords. Topic words are extracted with a weighting function built from each word's frequency and growth rate within a time window. The same-text conditional probability of two words (the probability that one word appears in a message given that the other does) serves as the similarity measure. Topic words are then clustered with an improved single-pass clustering algorithm. Analysis of the system's output shows that the method can detect and cluster micro-blog hot topics effectively in real time.
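The four steps in the abstract (keyword classification, weighted topic-word extraction, same-text conditional probability, single-pass clustering) can be illustrated with a minimal Python sketch. This is not the paper's implementation, and it rests on several hypothetical choices:

- Messages are assumed to be already word-segmented token lists.
- The weight formula `alpha * frequency + (1 - alpha) * growth` is an assumption.
- Cluster similarity is taken as the averaged two-way conditional probability.
- The threshold 0.5 and all function names (`classify`, `topic_words`, `cond_prob`, `single_pass`, `detect`) are assumptions.

The abstract does not describe the paper's improvement to single-pass clustering, so a plain single-pass variant is used here.

```python
from collections import Counter
from itertools import combinations


def classify(messages, categories):
    """Filter messages by keyword and group them by category.

    Assumption: each message is an already word-segmented token list, and a
    message joins every category whose keyword set it shares a word with.
    Messages matching no category are dropped.
    """
    groups = {name: [] for name in categories}
    for tokens in messages:
        words = set(tokens)
        for name, keywords in categories.items():
            if words & keywords:
                groups[name].append(words)
    return groups


def doc_freq(docs):
    """Count in how many texts each word appears."""
    df = Counter()
    for words in docs:
        df.update(set(words))
    return df


def topic_words(cur_docs, prev_docs, k=4, alpha=0.5):
    """Rank words by a hypothetical weight mixing window frequency and growth.

    weight = alpha * (share of current texts containing the word)
           + (1 - alpha) * (growth relative to the previous window)
    """
    cur, prev = doc_freq(cur_docs), doc_freq(prev_docs)
    n = max(len(cur_docs), 1)
    weights = {}
    for w, f in cur.items():
        growth = (f - prev[w]) / (prev[w] + 1)  # +1 avoids division by zero
        weights[w] = alpha * f / n + (1 - alpha) * growth
    # Highest weight first; ties broken alphabetically for determinism.
    return sorted(weights, key=lambda w: (-weights[w], w))[:k]


def cond_prob(docs, words):
    """Return p(b, a) = P(b | a): share of texts containing a that also contain b."""
    df = doc_freq(docs)
    vocab = set(words)
    co = Counter()
    for d in docs:
        for a, b in combinations(sorted(set(d) & vocab), 2):
            co[(a, b)] += 1
            co[(b, a)] += 1

    def p(b, a):
        return co[(a, b)] / df[a] if df[a] else 0.0

    return p


def single_pass(words, p, threshold=0.5):
    """Plain single-pass clustering (not the paper's improved variant).

    Each word joins the most similar existing cluster if the similarity reaches
    the threshold, otherwise it starts a new cluster. Similarity to a cluster is
    the mean over its members m of (P(w|m) + P(m|w)) / 2 (an assumption).
    """
    clusters = []
    for w in words:
        best, best_sim = None, 0.0
        for c in clusters:
            sim = sum((p(w, m) + p(m, w)) / 2 for m in c) / len(c)
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            best.append(w)
        else:
            clusters.append([w])
    return clusters


def detect(cur_msgs, prev_msgs, categories, k=4, threshold=0.5):
    """Classify, extract topic words, then cluster them per category."""
    cur_groups = classify(cur_msgs, categories)
    prev_groups = classify(prev_msgs, categories)
    result = {}
    for name, docs in cur_groups.items():
        words = topic_words(docs, prev_groups[name], k)
        result[name] = single_pass(words, cond_prob(docs, words), threshold)
    return result


if __name__ == "__main__":
    cur = [["match", "final", "goal"], ["match", "final", "goal"],
           ["team", "transfer", "player"], ["team", "transfer", "player"],
           ["typhoon", "rain"], ["lunch"]]
    prev = [["match", "team"], ["team", "coach"]]
    cats = {"sports": {"match", "team"}, "weather": {"typhoon"}}
    print(detect(cur, prev, cats))
    # {'sports': [['final', 'goal'], ['player', 'transfer']], 'weather': [['rain', 'typhoon']]}
```

The sketch feeds topic words to the single pass in descending weight order. Single-pass results depend on input order, so this choice lets the strongest topic words seed the clusters.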
Source
《信息网络安全》
2014, Issue 9, pp. 127-131 (5 pages)
Netinfo Security
Keywords
classification
micro-blog
topic detection
clustering