
Topic Detection Based on Microblogging Text and Metadata
Abstract: Microblog posts are short, contain few words, and are highly time-sensitive, so traditional topic detection methods are no longer suitable for discovering hot topics on microblogs. To address these characteristics, this paper proposes a topic discovery method based on microblog text and metadata. First, metadata such as posting time, user information, and reposts and comments are used to construct a composite weight that describes the energy of each microblog term, from which the theme words of topics are extracted. Next, theme-word clusters are built from the contextual relations between these words. Finally, a secondary clustering of the microblog texts yields the latent topics in the microblogs together with their related posts. Experiments on real microblog data show that the method effectively discovers hot topics and improves the accuracy and recall of topic detection.
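The abstract names the three stages of the method but not their formulas, so the following Python sketch is only a hypothetical illustration under stated assumptions, not the authors' implementation. Assumed details: the composite post weight (exponential time decay with rate `decay=0.1`, times log-scaled follower count, times log-scaled repost and comment count), term energy as the sum of the weights of the posts containing the term, pairwise co-occurrence counts with a `min_cooc` threshold as a stand-in for the paper's "contextual relations", union-find grouping of theme words, and an overlap-based second pass that attaches each post to a word cluster. Posts are assumed to be already word-segmented. All function names are hypothetical.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Post:
    tokens: list      # words of the post, assumed already segmented
    hours_ago: float  # posting time, as age in hours relative to "now"
    followers: int    # user metadata: author's follower count
    reposts: int      # forwarding metadata
    comments: int     # comment metadata


def post_weight(p, decay=0.1):
    """Hypothetical composite weight: recency x author influence x engagement."""
    recency = math.exp(-decay * p.hours_ago)
    influence = 1.0 + math.log1p(p.followers)
    engagement = 1.0 + math.log1p(p.reposts + p.comments)
    return recency * influence * engagement


def term_energy(posts):
    """Energy of a term = sum of the weights of the posts that contain it."""
    energy = defaultdict(float)
    for p in posts:
        w = post_weight(p)
        for t in set(p.tokens):
            energy[t] += w
    return energy


def theme_words(posts, k):
    """Top-k terms by energy (ties broken alphabetically)."""
    energy = term_energy(posts)
    return sorted(energy, key=lambda t: (-energy[t], t))[:k]


def theme_clusters(posts, words, min_cooc=2):
    """Merge theme words that co-occur in at least min_cooc posts (union-find)."""
    vocab = set(words)
    parent = {w: w for w in words}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    cooc = defaultdict(int)
    for p in posts:
        present = sorted(set(p.tokens) & vocab)
        for a, b in combinations(present, 2):
            cooc[(a, b)] += 1
    for (a, b), n in cooc.items():
        if n >= min_cooc:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for w in words:
        groups[find(w)].add(w)
    return list(groups.values())


def assign_posts(posts, clusters):
    """Second pass: index of the cluster sharing most words with each post, else None."""
    labels = []
    for p in posts:
        toks = set(p.tokens)
        scores = [len(toks & c) for c in clusters]
        best = max(range(len(clusters)), key=scores.__getitem__, default=None)
        labels.append(best if best is not None and scores[best] > 0 else None)
    return labels


# Toy input invented for illustration only; not data from the paper.
sample = [
    Post(["earthquake", "rescue", "sichuan"], 1, 1000, 50, 20),
    Post(["earthquake", "rescue", "donation"], 2, 500, 10, 5),
    Post(["earthquake", "sichuan", "rescue"], 3, 200, 5, 1),
    Post(["movie", "box_office", "premiere"], 1, 5000, 100, 40),
    Post(["movie", "box_office", "review"], 4, 300, 3, 2),
    Post(["weather", "sunny"], 10, 10, 0, 0),
]

if __name__ == "__main__":
    words = theme_words(sample, k=6)
    clusters = theme_clusters(sample, words)
    print(clusters)
    print(assign_posts(sample, clusters))
```

On this toy input the earthquake posts and the movie posts end up in separate word clusters, "premiere" (co-occurring only once) stays a singleton cluster, and the low-weight weather post is left unassigned. A faithful reproduction would replace each assumed formula with the one defined in the full paper.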
Source: Computer Applications and Software (《计算机应用与软件》), CSCD-indexed, 2016, No. 3, pp. 67-70 and 86 (5 pages)
Funding: National Natural Science Foundation of China (61103046)
Keywords: Microblog; Metadata; Cluster; Topic detection