
A Survey of Short Text Classification Techniques (短文本分类技术研究综述) Cited by: 11

A Summary of the Research on Short Text Classification
Abstract: Short texts are difficult to classify efficiently because their features are sparse and they are often ambiguous. This paper first reviews the current state of research on short text classification in light of these characteristics. It then describes the techniques and related theory involved in short text classification, with a focused analysis of text preprocessing and of text representation methods such as Word2vec and the LDA model. Finally, it summarizes future trends in the development of short text classification.
Authors: 邓丁朋 (DENG Ding-peng); 周亚建 (ZHOU Ya-jian); 池俊辉 (CHI Jun-hui); 李佳乐 (LI Jia-le). School of Cyber Science and Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China
Source: 《软件》 (Software), 2020, No. 2, pp. 141-144 (4 pages)
Keywords: short text classification; topic modeling; classifier; text representation
