Abstract
Sentiment classification judges the sentiment polarity of data and is widely applied to product reviews, microblog (Weibo) topics, and similar data. Because labeled data is expensive to obtain, traditional sentiment classification methods struggle to classify data from different domains effectively, and cross-domain sentiment classification has therefore attracted wide attention. Most existing cross-domain methods, however, extract lexical and syntactic features on the basis of co-occurrence and ignore the semantic relationships among words. Motivated by this, the paper proposes WEEF (cross-domain classification based on word embedding extension feature), a cross-domain sentiment classification method based on word2vec. WEEF first selects high-quality features that co-occur across domains as a bridge and uses them as seeds. It then expands domain-specific features into these seeds according to word-embedding similarity, forming feature clusters that reduce the divergence between domains. Experimental results on the SRAA and Amazon product review datasets show the effectiveness of the method, especially when the amount of data is large.
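The seed-expansion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the seeds are assumed to be already selected, the 2-dimensional vectors are toy values standing in for word2vec embeddings, and the function names and the similarity threshold of 0.8 are hypothetical choices made only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_feature_clusters(seeds, specific_words, vectors, threshold=0.8):
    """Attach each domain-specific word to its most similar seed.

    A word joins a cluster only if its similarity to the best seed
    reaches `threshold` (a hypothetical cut-off, not from the paper).
    """
    clusters = {seed: [seed] for seed in seeds}
    for word in specific_words:
        best_seed = max(seeds, key=lambda s: cosine(vectors[word], vectors[s]))
        if cosine(vectors[word], vectors[best_seed]) >= threshold:
            clusters[best_seed].append(word)
    return clusters


def to_cluster_features(tokens, clusters):
    """Replace every clustered word with its seed; leave other tokens as-is."""
    word_to_seed = {w: seed for seed, members in clusters.items() for w in members}
    return [word_to_seed.get(t, t) for t in tokens]


# Toy vectors (assumed values; real ones would come from a trained word2vec model).
vectors = {
    "good": [1.0, 0.1],
    "bad": [-1.0, 0.1],
    "durable": [0.9, 0.3],    # specific to one domain, e.g. electronics
    "gripping": [0.8, 0.2],   # specific to another domain, e.g. books
    "flimsy": [-0.9, 0.2],
    "paperback": [0.0, 1.0],  # not close to any sentiment seed
}
seeds = ["good", "bad"]  # assumed to be pre-selected co-occurring features
specific = ["durable", "gripping", "flimsy", "paperback"]

clusters = build_feature_clusters(seeds, specific, vectors)
features = to_cluster_features(["durable", "paperback", "flimsy"], clusters)
```

In this toy run, "durable" and "gripping" come from different domains but fall into the same cluster around "good", so documents from either domain end up sharing a feature. That shared representation is the sense in which feature clusters reduce the gap between domains.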
Authors
王勤勤
张玉红
李培培
胡学钢
Wang Qinqin; Zhang Yuhong; Li Peipei; Hu Xuegang (School of Computer Science & Information Engineering, Hefei University of Technology, Hefei 230009, China)
Source
《计算机应用研究》
CSCD
Peking University Core Journal
2018, No. 10, pp. 2924-2927 (4 pages)
Application Research of Computers
Funding
National Key Research and Development Program of China (2016YFC0801406)
National Natural Science Foundation of China (61673152, 61503112)
Keywords
semantic features
co-occurrence features
word embedding
cross-domain sentiment classification