Abstract
Sentiment orientation analysis of Chinese text is one of the key technologies for mining and analyzing online public opinion. This paper proposes a Chinese text sentiment classification method based on a particle swarm optimization-Gaussian process (PSO-GP) algorithm, which uses particle swarm optimization (PSO) to search for the optimal hyperparameters of the Gaussian process (GP). This addresses shortcomings of the conjugate gradient method used in traditional Gaussian processes: the number of iterations is difficult to determine, the result depends strongly on the initial value, and the search easily falls into local minima. First, a multi-threaded web crawler collects text data to form a corpus, and a domain-specific sentiment lexicon is constructed. Next, the most effective features are selected by sentiment word matching, which reduces the data dimensionality, and the TF-IDF algorithm computes the weights of the feature words to generate feature vectors. Finally, the test samples are fed into the PSO-GP classification model. Experimental results show that, compared with the traditional GP method, the proposed improved Gaussian process classification model raises classification accuracy by nearly 15%.
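The hyperparameter search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes an RBF kernel, treats the ±1 sentiment labels as regression targets so that the GP marginal likelihood has a closed form (a simplification of true GP classification), and uses made-up toy data in place of TF-IDF feature vectors. The names `rbf_kernel`, `neg_log_marginal_likelihood`, and `pso_minimize`, as well as the PSO coefficients and search bounds, are hypothetical choices for illustration.

```python
import numpy as np


def rbf_kernel(X1, X2, length_scale, signal_var):
    """RBF (squared-exponential) kernel matrix between the rows of X1 and X2."""
    d2 = (np.sum(X1**2, axis=1)[:, None]
          + np.sum(X2**2, axis=1)[None, :]
          - 2.0 * X1 @ X2.T)
    return signal_var * np.exp(-0.5 * np.maximum(d2, 0.0) / length_scale**2)


def neg_log_marginal_likelihood(log_theta, X, y):
    """Negative log marginal likelihood of a GP regression model.

    log_theta holds log(length_scale), log(signal_var), log(noise_var).
    Assumption: labels are treated as real-valued targets.
    """
    length_scale, signal_var, noise_var = np.exp(log_theta)
    K = rbf_kernel(X, X, length_scale, signal_var)
    K = K + (noise_var + 1e-8) * np.eye(len(X))
    try:
        L = np.linalg.cholesky(K)
    except np.linalg.LinAlgError:
        return np.inf  # reject hyperparameters that give a non-PD matrix
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (0.5 * y @ alpha
            + np.sum(np.log(np.diag(L)))
            + 0.5 * len(X) * np.log(2.0 * np.pi))


def pso_minimize(objective, bounds, n_particles=20, n_iters=50,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best PSO; bounds is an array of (low, high) rows."""
    rng = np.random.default_rng(seed)
    low, high = bounds[:, 0], bounds[:, 1]
    pos = rng.uniform(low, high, size=(n_particles, len(low)))
    vel = np.zeros_like(pos)
    pbest_pos = pos.copy()
    pbest_val = np.array([objective(p) for p in pos])
    g = np.argmin(pbest_val)
    gbest_pos, gbest_val = pbest_pos[g].copy(), pbest_val[g]
    for _ in range(n_iters):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        # Inertia term plus pulls toward personal and global bests.
        vel = (w * vel
               + c1 * r1 * (pbest_pos - pos)
               + c2 * r2 * (gbest_pos - pos))
        pos = np.clip(pos + vel, low, high)
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest_pos[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        g = np.argmin(pbest_val)
        if pbest_val[g] < gbest_val:
            gbest_pos, gbest_val = pbest_pos[g].copy(), pbest_val[g]
    return gbest_pos, gbest_val


# Toy stand-in for TF-IDF feature vectors with +1/-1 labels (made-up data).
data_rng = np.random.default_rng(1)
X_toy = np.vstack([data_rng.normal(-1.0, 0.5, (15, 2)),
                   data_rng.normal(1.0, 0.5, (15, 2))])
y_toy = np.concatenate([-np.ones(15), np.ones(15)])
bounds = np.array([[-3.0, 3.0]] * 3)  # search box in log-hyperparameter space

best_log_theta, best_nlml = pso_minimize(
    lambda t: neg_log_marginal_likelihood(t, X_toy, y_toy), bounds)
print("hyperparameters:", np.exp(best_log_theta), "NLML:", best_nlml)
```

Because the swarm only evaluates the objective and never differentiates it, the search needs no gradient, and it starts from many initial points rather than one, which is the property the abstract contrasts with conjugate gradient optimization.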
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journals (北大核心)
2017, Issue S1, pp. 446-450 (5 pages)
Keywords
Chinese text sentiment classification
Web crawler
Sentiment lexicon
Particle swarm optimization
Gaussian process