Micro-blog hot topics detection method based on user role orientation

Cited by: 9
Abstract: To address the low efficiency of extracting hot topics from massive micro-blog data, a new hot topic detection method built on the classification of user roles is proposed. First, user roles are determined according to user attention, and the noisy data of some users are filtered out. Second, feature weights are calculated with a Term Frequency-Inverse Document Frequency (TF-IDF) function combined with semantic similarity, which reduces the error caused by differing forms of semantic expression. Then, an improved Single-Pass clustering algorithm clusters the posts to extract micro-blog topics. Finally, topic heat is evaluated and ranked according to the number of reposts, the number of comments, and similar indicators, so that hot topics are found. Experiments show that the proposed method lowers the miss rate and the false detection rate by 12.09% and 2.37% on average, respectively, which effectively improves topic detection accuracy and confirms the feasibility of the method.
Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2013, No. 11, pp. 3076-3079 (4 pages)
Keywords: micro-blog; topic detection; user role; semantic similarity; Single-Pass clustering
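
The clustering and ranking steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the user-role filter, the semantic-similarity term, or the improvements to Single-Pass, so this sketch assumes plain TF-IDF, the basic Single-Pass procedure with cosine similarity, and a hypothetical heat score equal to a weighted sum of reposts and comments. The function names, the `threshold` value, and the weights `w_repost` and `w_comment` are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Plain smoothed TF-IDF over pre-tokenized posts (no semantic-similarity term)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {}
        for term, count in tf.items():
            idf = math.log((1 + n) / (1 + df[term])) + 1  # always positive
            vec[term] = (count / len(doc)) * idf
        vectors.append(vec)
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def single_pass(vectors, threshold=0.3):
    """Basic Single-Pass: join the most similar cluster or open a new one."""
    clusters = []
    for idx, vec in enumerate(vectors):
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(idx)
            # Summed centroid; cosine similarity ignores its overall scale.
            for term, weight in vec.items():
                best["centroid"][term] = best["centroid"].get(term, 0.0) + weight
        else:
            clusters.append({"members": [idx], "centroid": dict(vec)})
    return clusters


def rank_by_heat(clusters, posts, w_repost=1.0, w_comment=1.0):
    """Hypothetical heat score: weighted sum of reposts and comments per topic."""
    scored = []
    for cluster in clusters:
        heat = sum(
            w_repost * posts[i]["reposts"] + w_comment * posts[i]["comments"]
            for i in cluster["members"]
        )
        scored.append((heat, cluster["members"]))
    return sorted(scored, key=lambda item: item[0], reverse=True)


# Made-up example posts, already tokenized.
posts = [
    {"tokens": ["earthquake", "sichuan", "rescue"], "reposts": 120, "comments": 40},
    {"tokens": ["earthquake", "rescue", "team"], "reposts": 80, "comments": 10},
    {"tokens": ["phone", "launch", "price"], "reposts": 5, "comments": 2},
]
clusters = single_pass(tfidf_vectors([p["tokens"] for p in posts]))
ranked = rank_by_heat(clusters, posts)
```

Because Single-Pass reads each post once and never revisits earlier assignments, the result depends on input order and on the similarity threshold. That sensitivity is presumably what the paper's improved variant targets, though the abstract does not say how.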
