Abstract
As the number of Internet users grows, people contribute a wide variety of documents online, producing massive volumes of text data with great latent value. Classifying and organizing these documents is a challenging task, and extracting feature information from documents has become an important research direction. To address the high feature dimensionality and low processing efficiency of traditional text feature extraction methods, this paper designs a text feature extraction method based on a convolutional neural network (CNN), builds the CNN model, and selects its parameters. The experimental input is text from a Chinese corpus. The text is converted to vectors with the Word2vec toolkit, features are extracted with the CNN, and the extracted features are evaluated with the K-means clustering algorithm, which verifies the effectiveness of the proposed method.
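The pipeline outlined in the abstract (word vectors, convolutional feature extraction, K-means evaluation) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes random toy vectors in place of trained Word2vec embeddings, untrained random filters in place of the paper's CNN, and a plain Lloyd-style K-means. The names `conv_max_features` and `kmeans`, the toy vocabulary, and all sizes are hypothetical.

```python
import random

def conv_max_features(doc_vectors, filters):
    """1D convolution over word positions, ReLU, then max-over-time pooling.

    doc_vectors: list of word vectors (each a list of floats).
    filters: list of filters; each filter is a list of `width` weight vectors.
    Returns one pooled feature per filter, so the output size is fixed
    regardless of document length.
    """
    features = []
    for filt in filters:
        width = len(filt)
        best = 0.0  # ReLU floor: negative activations are clipped to zero
        for start in range(len(doc_vectors) - width + 1):
            act = sum(
                w * x
                for k in range(width)
                for w, x in zip(filt[k], doc_vectors[start + k])
            )
            best = max(best, act)
        features.append(best)
    return features

def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd-style K-means; returns a cluster label for each point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy data (assumption): random vectors stand in for Word2vec embeddings.
rng = random.Random(42)
dim = 4
vocab = ["stock", "market", "bank", "match", "team", "goal"]
embeddings = {w: [rng.uniform(-1, 1) for _ in range(dim)] for w in vocab}
docs = [
    ["stock", "market", "bank"],
    ["bank", "market", "stock"],
    ["match", "team", "goal"],
    ["team", "goal", "match"],
]
# Three untrained filters, each spanning two consecutive words (assumption).
filters = [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(2)]
           for _ in range(3)]

features = [conv_max_features([embeddings[w] for w in doc], filters)
            for doc in docs]
labels = kmeans(features, k=2)
```

Each document is reduced to a vector with one value per filter, which is far smaller than a vocabulary-sized bag-of-words vector. In a real experiment the embeddings and filter weights would be learned rather than random, so the cluster labels produced here carry no meaning.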
Authors
Liu Gang (刘钢)
Li Zongchen (李宗晨)
Guo Jianwei (郭建伟)
Affiliations: College of Computer Science and Engineering, Changchun University of Technology, Changchun 130012, China; Changchun Finance College, Modern Education Center, Changchun 130012, China
Source
Jiangsu Science and Technology Information (《江苏科技信息》)
2018, Issue 14, pp. 21-23, 28 (4 pages)
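Funding
Major Science and Technology Tender Project of the Jilin Provincial Department of Science and Technology
Project No. 20160203010GX
Industrial Innovation Special Fund Project of the Jilin Provincial Development and Reform Commission
Project No. 20170505MA2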