Abstract
Sentiment classification is a challenging and active research topic in natural language processing. This paper studies semi-supervised text sentiment classification. Traditional semi-supervised methods based on co-training require texts to provide a large set of useful attributes, have a training process with linear time complexity, and are not suitable for imbalanced corpora. This paper proposes a semi-supervised sentiment classification method based on a voting ensemble of multiple classifiers. A set of diverse sub-classifiers is built by choosing different training sets, feature parameters, and classification methods. In each round, simple voting selects the samples with the highest confidence so that the training set doubles in size, and the model is then updated. The method lets the sub-classifiers share useful attribute sets, has logarithmic time complexity, and can be applied to imbalanced corpora. Experiments show that the method performs well on sentiment classification across corpora in different languages, domains, and sizes, for both balanced and imbalanced data.
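The voting-and-doubling loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `train_nb`, `build_ensemble`, and `self_train` are hypothetical, documents are assumed to be token lists with binary labels 0/1, and diversity comes only from bootstrap resampling and random feature subsets over a single naive Bayes learner, whereas the paper also varies the classification method.

```python
import math
import random
from collections import Counter


def train_nb(docs, labels, vocab):
    """Train a multinomial naive Bayes model restricted to `vocab` (add-one smoothing)."""
    counts = {0: Counter(), 1: Counter()}
    prior = Counter(labels)
    for doc, y in zip(docs, labels):
        counts[y].update(w for w in doc if w in vocab)

    def predict(doc):
        scores = {}
        for y in (0, 1):
            total = sum(counts[y].values()) + len(vocab)
            score = math.log((prior[y] + 1) / (len(labels) + 2))
            for w in doc:
                if w in vocab:
                    score += math.log((counts[y][w] + 1) / total)
            scores[y] = score
        return max(scores, key=scores.get)

    return predict


def build_ensemble(docs, labels, n_models, rng):
    """Build diverse sub-classifiers from bootstrap samples and random feature subsets."""
    vocab = sorted({w for doc in docs for w in doc})
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(docs)) for _ in docs]  # bootstrap training set
        size = max(1, int(0.8 * len(vocab)))            # assumed feature ratio
        sub_vocab = set(rng.sample(vocab, size))
        models.append(train_nb([docs[i] for i in idx],
                               [labels[i] for i in idx], sub_vocab))
    return models


def self_train(docs, labels, unlabeled, n_models=5, seed=0):
    """Grow the training set by voting; each round adds as many samples as are labeled."""
    rng = random.Random(seed)
    docs, labels, pool = list(docs), list(labels), list(unlabeled)
    rounds = 0
    while pool:
        models = build_ensemble(docs, labels, n_models, rng)
        scored = []
        for doc in pool:
            votes = Counter(m(doc) for m in models)
            label, n_votes = votes.most_common(1)[0]
            scored.append((n_votes / n_models, label, doc))  # vote share as confidence
        scored.sort(key=lambda item: item[0], reverse=True)
        k = min(len(docs), len(pool))  # doubling step
        for _, label, doc in scored[:k]:
            docs.append(doc)
            labels.append(label)
        pool = [doc for _, _, doc in scored[k:]]
        rounds += 1
    final = build_ensemble(docs, labels, n_models, rng)

    def predict(doc):
        return Counter(m(doc) for m in final).most_common(1)[0][0]

    return predict, rounds
```

Because the labeled set doubles each round, the number of retraining rounds grows with the logarithm of the ratio between the final and initial training set sizes. In this sketch, starting from 2 labeled and 6 unlabeled documents, the loop finishes after 2 rounds.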
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
Peking University Core Journals (北大核心)
2016, Issue 2, pp. 41-49, 106 (10 pages in total)
Keywords
sentiment classification
ensemble learning
semi-supervised learning