Topic Detection of Single-Pass-SOM Combination Model Based on Multi Feature (基于多特征融合Single-Pass-SOM组合模型的话题检测) Cited by: 2
Abstract Online public opinion spreads quickly and has great influence, and topic detection plays an irreplaceable role in monitoring it. To address the incomplete feature extraction and excessive feature dimensionality of traditional methods, this study proposes an LDA&&Word2Vec text representation model based on a time decay factor. The model fuses the latent topic features from the LDA model with the semantic features from the Word2Vec model by weighting and introduces a time decay factor, which both reduces dimensionality and makes the text features more complete. The study also proposes a Single-Pass-SOM combined clustering model, which solves the problem that the SOM model requires its initial neurons to be set in advance and improves the accuracy of topic clustering. Experimental results show that the proposed text representation model and text clustering method detect topics better than traditional methods.
Authors LI Feng-Nan (李丰男); MENG Xiang-Ru (孟祥茹); JIAO Yan-Fei (焦艳菲); ZHANG Lin-Lin (张琳琳); LIU Nian (刘念) (School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing 100049, China; Shenyang Institute of Computing Technology, Chinese Academy of Sciences, Shenyang 110168, China; Shenyang Golding NC Technology Co., Ltd., Shenyang 110168, China)
Source Computer Systems & Applications (《计算机系统应用》), 2020, No. 7, pp. 245-250 (6 pages)
Keywords topic detection; text representation; SOM clustering; Single-Pass clustering; Single-Pass-SOM
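The abstract gives no formulas, so the following is only a minimal sketch of the pipeline it describes, under stated assumptions: the fusion is taken to be a weighted concatenation of normalized vectors, the time decay factor is taken to be an exponential applied to the similarity score, and the SOM step is reduced to a winner-only update seeded with the Single-Pass centroids. The names `fuse_features`, `time_decay`, `single_pass`, `som_refine`, and the parameters `alpha`, `rate`, and `threshold` are hypothetical and do not come from the paper.

```python
import numpy as np


def fuse_features(topic_vec, semantic_vec, alpha=0.5):
    """Weighted concatenation of a topic vector and a semantic vector.

    Assumption: the paper's weighted fusion may differ; alpha is hypothetical.
    """
    t = np.asarray(topic_vec, dtype=float)
    s = np.asarray(semantic_vec, dtype=float)
    t = t / (np.linalg.norm(t) or 1.0)
    s = s / (np.linalg.norm(s) or 1.0)
    return np.concatenate([alpha * t, (1.0 - alpha) * s])


def time_decay(gap_days, rate=0.01):
    """Assumed exponential decay; the paper's actual factor is not given here."""
    return float(np.exp(-rate * gap_days))


def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


def single_pass(vectors, times, threshold=0.8, rate=0.01):
    """One pass over the documents: join the most similar cluster or open a new one."""
    centroids, counts, last_seen, labels = [], [], [], []
    for v, t in zip(vectors, times):
        # Similarity is damped by the time gap to the cluster's latest document.
        sims = [cosine(v, c) * time_decay(abs(t - lt), rate)
                for c, lt in zip(centroids, last_seen)]
        best = int(np.argmax(sims)) if sims else -1
        if best >= 0 and sims[best] >= threshold:
            counts[best] += 1
            centroids[best] += (v - centroids[best]) / counts[best]  # running mean
            last_seen[best] = max(last_seen[best], t)
            labels.append(best)
        else:
            centroids.append(np.array(v, dtype=float))
            counts.append(1)
            last_seen.append(t)
            labels.append(len(centroids) - 1)
    return np.array(centroids), labels


def som_refine(vectors, init_weights, epochs=10, lr=0.5):
    """Winner-only SOM-style update (neighbourhood radius zero, a simplification).

    The neurons start from the Single-Pass centroids instead of a preset grid.
    """
    w = np.array(init_weights, dtype=float)
    for epoch in range(epochs):
        step = lr * (1.0 - epoch / epochs)  # linearly decaying learning rate
        for v in vectors:
            bmu = int(np.argmin(np.linalg.norm(w - v, axis=1)))
            w[bmu] += step * (v - w[bmu])
    labels = [int(np.argmin(np.linalg.norm(w - v, axis=1))) for v in vectors]
    return w, labels


# Toy input: in practice the vectors would come from trained LDA and Word2Vec models.
topics = [[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]]
semantics = [[1, 0, 0], [0.9, 0.1, 0], [0, 0, 1], [0, 0.1, 0.9]]
docs = [fuse_features(t, s) for t, s in zip(topics, semantics)]
seeds, sp_labels = single_pass(docs, times=[0, 1, 1, 2])
weights, som_labels = som_refine(docs, seeds)
```

In this sketch the number of SOM neurons is whatever Single-Pass produces, which is one way to read the abstract's claim that the combined model avoids setting the initial neurons by hand.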
