Abstract
Short texts are difficult to classify efficiently because their features are sparse and they are often ambiguous. This paper first reviews the current state of research on short text classification in light of these characteristics. It then describes the techniques and theory involved in short text classification, with a focused analysis of text preprocessing and of text representation methods such as Word2vec and the LDA model. Finally, it summarizes future development trends in short text classification.
Authors
邓丁朋
周亚建
池俊辉
李佳乐
DENG Ding-peng; ZHOU Ya-jian; CHI Jun-hui; LI Jia-le (School of Cyber Science and Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China)
Source
《软件》
2020, No. 2, pp. 141-144 (4 pages)
Software
Keywords
Short text classification
Topic modeling
Classifier
Text representation