Abstract
Traditional text classification methods do not take the semantic information of words into account. To address this, a text classification method combining associative semantics with a convolutional neural network (CNN) is proposed. First, the text is preprocessed to extract word stems. Next, each word is combined with its associated context words to construct word vectors that carry semantic information. The word-vector matrix of the text is then fed into the CNN: a convolution layer and a max-pooling layer extract the most salient features, and an output layer produces the class probabilities. Finally, the CNN is trained by minimizing a cost function, which yields the final text classifier. Experimental results on two Chinese datasets show that the method classifies text accurately and is feasible and effective.
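The pipeline described in the abstract can be sketched in plain Python as below. This is a minimal, hedged illustration under stated assumptions, not the authors' implementation. The abstract does not say how a word is combined with its associated context words, so the hypothetical `associate_context` simply averages each word's embedding with those of its neighbours within a hypothetical `window`. The ReLU activation, a single convolution layer with max-over-time pooling, a softmax output layer and a cross-entropy cost are likewise assumptions. Stemming is assumed to have been done already, and all function names, filter sizes and the toy vocabulary are hypothetical. Training (minimizing the cost, e.g. by gradient descent) is omitted; the sketch only computes the forward pass and the cost.

```python
import math
import random


def associate_context(tokens, emb, window=1):
    # ASSUMPTION: "associative semantics" is modelled as the mean of a word's
    # embedding and the embeddings of words within `window` positions on
    # either side (clipped at the text boundaries). Assumes non-empty tokens.
    dim = len(emb[tokens[0]])
    vectors = []
    for i in range(len(tokens)):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        ctx = [emb[t] for t in tokens[lo:hi]]
        vectors.append([sum(v[d] for v in ctx) / len(ctx) for d in range(dim)])
    return vectors


def conv_max_pool(matrix, filters, biases):
    # One feature per filter: slide the filter (h rows x dim columns) down the
    # word-vector matrix, apply ReLU, then keep the maximum response
    # (max-over-time pooling). Texts shorter than the filter yield 0.0.
    features = []
    for f, b in zip(filters, biases):
        h = len(f)
        responses = []
        for start in range(len(matrix) - h + 1):
            s = b
            for k in range(h):
                s += sum(w * x for w, x in zip(f[k], matrix[start + k]))
            responses.append(max(0.0, s))
        features.append(max(responses, default=0.0))
    return features


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def output_layer(features, weights, biases):
    # Fully connected layer followed by softmax -> class probabilities.
    logits = [b + sum(w * x for w, x in zip(row, features))
              for row, b in zip(weights, biases)]
    return softmax(logits)


def cross_entropy(probs, label):
    # ASSUMED cost function; training would minimize its mean over a corpus.
    return -math.log(probs[label])


def forward(tokens, emb, filters, f_biases, weights, o_biases, window=1):
    # Word vectors -> convolution + max pooling -> output probabilities.
    matrix = associate_context(tokens, emb, window)
    features = conv_max_pool(matrix, filters, f_biases)
    return output_layer(features, weights, o_biases)


if __name__ == "__main__":
    rng = random.Random(0)
    dim, n_classes = 4, 2
    vocab = ["stock", "market", "rise", "team", "win", "match"]  # toy data
    emb = {w: [rng.uniform(-1, 1) for _ in range(dim)] for w in vocab}
    # Three randomly initialised filters of heights 2, 2 and 3 (illustrative).
    filters = [[[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(h)]
               for h in (2, 2, 3)]
    f_biases = [0.0] * len(filters)
    weights = [[rng.uniform(-1, 1) for _ in filters] for _ in range(n_classes)]
    o_biases = [0.0] * n_classes
    probs = forward(["stock", "market", "rise"], emb, filters, f_biases,
                    weights, o_biases)
    print(probs, cross_entropy(probs, label=0))
```

Averaging keeps the vector dimension unchanged, so the context-enriched matrix can be passed straight to the convolution; concatenating neighbour vectors would be an equally plausible reading of the abstract and would only change the filter width.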
Source
Control Engineering of China (《控制工程》)
CSCD
Peking University Core Journal (北大核心)
2018, Issue 2, pp. 367-370 (4 pages)
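Funding
Science and Technology Research Project of the Henan Provincial Department of Science and Technology (No. 162102310606)
Project funded by the Henan Provincial Department of Education (No. 16A520067)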
Keywords
Text classification
Associative semantics
Convolutional neural network
Max pooling