Abstract
This paper analyzes how negative sentences and turning (adversative) sentences affect the polarity of sentiment words, and proposes a sentence-structure-based algorithm for constructing domain-oriented sentiment lexicons. The method considers the relevance between words and also the relevance between words and documents. It uses an improved Laplace smoothing method to compute the semantic relevance between candidate words and benchmark words, and combines this with word-document relevance. It also handles turning sentences and negative sentences. Finally, it clusters the candidate words with a modified information bottleneck (IB) algorithm. The method was used to build domain-oriented sentiment lexicons from hotel, computer, and book corpora, and the highest accuracy obtained was 85.2%.
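The core idea, scoring a candidate word against benchmark words while letting sentence structure flip polarity, can be illustrated with a minimal sketch. This is not the authors' algorithm. It makes several assumptions. It uses plain add-one (Laplace) smoothing instead of the paper's improved variant. It works on whitespace-tokenized English rather than Chinese. The cue-word lists `NEGATION_WORDS` and `TURNING_WORDS` are hypothetical. It omits word-document relevance. It replaces the modified IB clustering with a simple sign threshold. The structural rules are also assumptions: in a turning sentence, tokens before the turning word are treated as opposite in polarity to tokens after it, and a negation word flips the token that immediately follows it.

```python
import math
from collections import defaultdict

# Hypothetical cue lists (assumption; the paper works on Chinese text).
NEGATION_WORDS = {"not", "no", "never"}
TURNING_WORDS = {"but", "however"}

def signed_tokens(sentence):
    """Give each content token a sign (+1/-1) derived from sentence structure.

    Assumed rules: tokens before a turning word get -1 and tokens after it
    get +1; a negation word flips the sign of the token right after it.
    """
    tokens = sentence.lower().split()
    turn = next((i for i, t in enumerate(tokens) if t in TURNING_WORDS), None)
    signed, flip_next = [], False
    for i, tok in enumerate(tokens):
        if tok in TURNING_WORDS:
            continue
        if tok in NEGATION_WORDS:
            flip_next = True
            continue
        sign = -1 if turn is not None and i < turn else 1
        if flip_next:
            sign, flip_next = -sign, False
        signed.append((tok, sign))
    return signed

def cooccurrence(sentences, seeds):
    """Count same-sign and opposite-sign co-occurrences of each
    non-seed token with each seed (benchmark) word in a sentence."""
    same, opp = defaultdict(int), defaultdict(int)
    for sentence in sentences:
        signed = signed_tokens(sentence)
        for w, sw in signed:
            if w in seeds:
                continue
            for s, ss in signed:
                if s in seeds:
                    if sw * ss > 0:
                        same[(w, s)] += 1
                    else:
                        opp[(w, s)] += 1
    return same, opp

def orientation(word, same, opp, pos_seeds, neg_seeds, alpha=1.0):
    """Laplace-smoothed log-odds score: > 0 leans positive, < 0 negative.
    alpha=1.0 is ordinary add-one smoothing, not the paper's improved one."""
    score = 0.0
    for s in pos_seeds:
        score += math.log((same[(word, s)] + alpha) / (opp[(word, s)] + alpha))
    for s in neg_seeds:
        score -= math.log((same[(word, s)] + alpha) / (opp[(word, s)] + alpha))
    return score

# Toy corpus and benchmark words (illustrative only, not the paper's data).
POS_SEEDS, NEG_SEEDS = {"good"}, {"bad"}
CORPUS = [
    "the room is clean and good",
    "the room is small but good",      # turning sentence: small opposes good
    "the hotel is noisy and bad",
    "the room is not noisy and good",  # negation: noisy opposes good
]

if __name__ == "__main__":
    same, opp = cooccurrence(CORPUS, POS_SEEDS | NEG_SEEDS)
    for w in ["clean", "small", "noisy"]:
        s = orientation(w, same, opp, POS_SEEDS, NEG_SEEDS)
        print(w, "positive" if s > 0 else "negative", round(s, 3))
    # clean positive 0.693 / small negative -0.693 / noisy negative -1.386
```

Without the turning-sentence rule, "small" would co-occur with "good" with the same sign and lean positive. The structural handling is what pushes it negative, which is the effect the abstract describes.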
Source
Journal of Fuzhou University (Natural Science Edition)
CAS
CSCD
PKU Core Journal
2011, No. 4, pp. 517-521 (5 pages)
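Funding
Natural Science Foundation of Fujian Province (2010J05133)
Fujian Province Science and Technology Innovation Platform Program (2009J1007)
Fuzhou University Science and Technology Development Fund (2010-XQ-22)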
Keywords
sentiment orientation analysis
domain-oriented sentiment lexicon
information bottleneck algorithm
sentence structure