Abstract
As social media has grown more popular and flexible, a growing number of new words have emerged on microblogs to express sentiment and attitude, and detecting new words and determining their sentiment orientation has become an active topic in microblog research. This paper describes the methods and techniques used for Task 3 of the COAE 2014 evaluation. First, a string extraction method based on a generalized suffix tree is proposed, and metrics such as left-right flexibility are used to identify candidate new words. Next, using contextual information, the sentiment orientation of each candidate is determined by combining multiple lexicons, pattern-based rules, and co-occurrence statistics with known sentiment words. Finally, a search engine is used to further refine the sentiment orientation results from a semantic perspective. Experimental results indicate that the approach is effective for both new word detection and sentiment orientation judgment.
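The candidate-extraction step can be illustrated with a minimal sketch. The abstract does not define left-right flexibility, so the code below assumes it is the entropy of the characters adjacent to a string on each side, which is one common choice. It also swaps the generalized suffix tree for plain n-gram enumeration to keep the example short. The function names, thresholds, and sample posts are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def neighbor_entropy(counter):
    """Shannon entropy (in bits) of a neighbor-character distribution."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counter.values())


def candidate_strings(texts, max_len=4, min_freq=2, min_flex=1.0):
    """Return {string: (freq, left_entropy, right_entropy)} for candidates.

    Assumption: "flexibility" is modeled as neighbor entropy. Plain n-gram
    enumeration stands in for the generalized suffix tree, and the
    thresholds are illustrative, not values from the paper.
    """
    freq = Counter()
    left = defaultdict(Counter)   # characters seen immediately before a string
    right = defaultdict(Counter)  # characters seen immediately after a string
    for text in texts:
        n = len(text)
        for i in range(n):
            # Enumerate substrings of length 2..max_len starting at i.
            for j in range(i + 2, min(i + max_len, n) + 1):
                s = text[i:j]
                freq[s] += 1
                if i > 0:
                    left[s][text[i - 1]] += 1
                if j < n:
                    right[s][text[j]] += 1
    result = {}
    for s, f in freq.items():
        if f < min_freq:
            continue
        le, re_ = neighbor_entropy(left[s]), neighbor_entropy(right[s])
        # Keep strings that occur in varied contexts on both sides.
        if min(le, re_) >= min_flex:
            result[s] = (f, le, re_)
    return result


# Hypothetical sample posts, each containing the slang word "给力".
posts = ["我给力啊", "他给力吗", "很给力的", "太给力了"]
candidates = candidate_strings(posts)
```

On these four sample posts, only "给力" passes both the frequency and flexibility thresholds, because it is the only string that recurs with varied neighbors on both sides. A suffix tree would yield the same counts without enumerating every substring explicitly, which matters on a corpus of realistic size.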
Source
Journal of Shandong University (Natural Science)
CAS
CSCD
PKU Core
2015, Issue 1, pp. 20-25 (6 pages)
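Funding
Programme of Introducing Talents of Discipline to Universities (111 Project) (B08004)
National Science and Technology Major Project on New-Generation Broadband Wireless Mobile Communication Networks (2011ZX03002-005-01)
National Natural Science Foundation of China (61273217)
Doctoral Program Fund (20130005110004)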
Keywords
generalized suffix tree
new word detection
sentiment orientation analysis
microblog