
Improved Text Representation Method Based on GloVe and LDA for Text Classification

Cited by: 3
Abstract: To address the sparsity, curse of dimensionality, and loss of semantics that affect text representation in text classification, an improved text representation method is proposed that combines the global vectors for word representation (GloVe) model with the latent Dirichlet allocation (LDA) topic model. The GloVe model, which combines local context information with global word co-occurrence statistics, is trained to obtain dense word vectors for the text. The LDA topic model generates the latent topics of the text and their probability distributions. From these, a text vector and probability-based topic vectors are constructed, and the similarity between the two is used as the input to the classifier. Experimental results show that, compared with several other text representation methods, the improved method achieves higher precision, recall, and F1 score, indicating that the GloVe- and LDA-based text representation can effectively improve the performance of text classifiers.
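The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact formulas, so the sketch assumes the text vector is the mean of its word vectors, each topic vector is a probability-weighted average of the topic's word vectors, and the similarity is cosine similarity. The names `EMBEDDINGS`, `TOPICS`, `mean_vector`, `topic_vector`, and `similarity_features` are hypothetical, and the toy numbers stand in for trained GloVe vectors and LDA output.

```python
import math

def mean_vector(tokens, embeddings):
    """Average the embeddings of in-vocabulary tokens (assumed text vector)."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        raise ValueError("no token has an embedding")
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def topic_vector(word_probs, embeddings):
    """Probability-weighted average of a topic's word embeddings (assumed)."""
    known = {w: p for w, p in word_probs.items() if w in embeddings}
    total = sum(known.values())
    if total == 0:
        raise ValueError("no topic word has an embedding")
    dim = len(next(iter(embeddings.values())))
    vec = [0.0] * dim
    for word, prob in known.items():
        weight = prob / total  # renormalize over the words we can embed
        for i, x in enumerate(embeddings[word]):
            vec[i] += weight * x
    return vec

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def similarity_features(tokens, topics, embeddings):
    """One similarity per topic; this list would be the classifier input."""
    text_vec = mean_vector(tokens, embeddings)
    return [cosine(text_vec, topic_vector(t, embeddings)) for t in topics]

# Toy stand-ins (assumptions): 2-d "GloVe" vectors and two "LDA" topics,
# each given as a word -> probability mapping over its top words.
EMBEDDINGS = {
    "goal": [0.9, 0.1], "match": [0.8, 0.2], "team": [0.7, 0.3],
    "stock": [0.1, 0.9], "market": [0.2, 0.8], "bank": [0.3, 0.7],
}
TOPICS = [
    {"goal": 0.5, "match": 0.3, "team": 0.2},
    {"stock": 0.5, "market": 0.3, "bank": 0.2},
]

features = similarity_features(["team", "goal", "match"], TOPICS, EMBEDDINGS)
print(features)
```

In this toy setup the sports-like document scores higher against the first topic than the second, so the resulting feature list has one dimension per topic rather than one per vocabulary word, which is how a representation of this kind would avoid the sparsity the abstract mentions.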
Authors: CHEN Ke-jia (陈可嘉); LIU Hui (刘惠) (School of Economics and Management, Fuzhou University, Fuzhou 350116, China)
Source: Science Technology and Engineering (《科学技术与工程》), PKU Core Journal, 2021, No. 29, pp. 12631-12637 (7 pages)
Funding: National Natural Science Foundation of China (71701019).
Keywords: text representation; GloVe model; latent Dirichlet allocation (LDA) topic model; text classification; word embedding
