Text feature extraction method based on convolution neural network
Abstract: As the number of Internet users grows, people contribute a wide variety of documents online. These documents form a massive body of text data with great latent value. Classifying and organizing them is a highly challenging task, and extracting document feature information has become an important research direction. Traditional text feature extraction methods suffer from high feature dimensionality and low processing efficiency. To address these problems, this paper designs a text feature extraction method based on a convolutional neural network: a convolutional neural network model is built and its parameters are selected. The experimental input is text from a Chinese corpus. The text is converted to vectors with the Word2vec toolkit, features are extracted with the convolutional neural network, and the extracted features are checked with the K-means clustering algorithm, which verifies the effectiveness of the proposed method.
Authors: Liu Gang; Li Zongchen; Guo Jianwei (College of Computer Science and Engineering, Changchun University of Technology, Changchun 130012, China; Modern Education Center, Changchun Finance College, Changchun 130012, China)
Source: Jiangsu Science and Technology Information, 2018, No. 14, pp. 21-23, 28 (4 pages)
Funding: Major Science and Technology Tender Special Project of the Jilin Provincial Department of Science and Technology (Project No. 20160203010GX); Industrial Innovation Special Fund Project of the Jilin Provincial Development and Reform Commission (Project No. 20170505MA2)
Keywords: Word2vec; text analysis; K-means; convolutional neural network
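
The pipeline outlined in the abstract (word vectors, convolutional feature extraction, K-means validation) can be illustrated with a minimal sketch. The abstract does not give the network architecture or parameters, so everything below is an assumption made for illustration: hand-written 2-dimensional vectors stand in for Word2vec embeddings, two hand-set width-2 filters stand in for learned convolution kernels, and the helper names (`conv_max_pool`, `kmeans`) are hypothetical, not taken from the paper.

```python
import random

def conv_max_pool(doc, filters, bias=0.0):
    """Slide each filter over the word-vector sequence, apply ReLU,
    and keep the maximum activation (max-over-time pooling).
    Returns one feature per filter, so the output size is fixed
    regardless of document length."""
    features = []
    for filt in filters:
        width = len(filt)
        best = 0.0  # ReLU floor: negative activations are clipped to 0
        for start in range(len(doc) - width + 1):
            act = bias
            for offset in range(width):
                word, weights = doc[start + offset], filt[offset]
                act += sum(w * x for w, x in zip(weights, word))
            best = max(best, act)
        features.append(best)
    return features

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd-style K-means; returns a cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center for every point.
        labels = [min(range(k), key=lambda c: squared_distance(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Assumed toy embeddings: A-words and B-words represent two topics.
A, B = [1.0, 0.0], [0.0, 1.0]
docs = [
    [A, A, A, A],  # topic A
    [A, A, B, A],  # mostly topic A
    [B, B, B, B],  # topic B
    [B, A, B, B],  # mostly topic B
]
# Assumed filters: one responds to two adjacent A-words, one to two B-words.
filters = [[A, A], [B, B]]

features = [conv_max_pool(doc, filters) for doc in docs]
labels = kmeans(features, k=2)
print(features)
print(labels)
```

In this toy setup each document is reduced to two features, one per filter, which is the dimensionality reduction the abstract motivates. K-means then groups the two topic-A documents together and the two topic-B documents together, mirroring how clustering is used to check that the extracted features separate document classes. A real experiment would use trained Word2vec vectors and learned filters.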