Abstract
Research on text emotion analysis has advanced rapidly in recent years, but emotion-annotated Chinese corpora, especially for micro-blog text, remain underdeveloped. To support analysis of how emotion is expressed in Chinese micro-blog text and evaluation of emotion analysis algorithms, this paper designs a complete emotion annotation specification based on close observation and analysis of emotion expression in micro-blog text. Following this specification, annotation is first performed at the micro-blog level: each micro-blog is marked for whether it contains emotion and, if it does, receives multi-label annotation of the emotion categories it contains. Each sentence in a micro-blog is then annotated for whether it expresses emotion and for its emotion categories, together with the intensity of each category. The corpus currently comprises 14,000 micro-blogs totaling 45,431 sentences. It served as the standard resource for the NLP&CC2013 Chinese micro-blog emotion analysis evaluation, facilitating research on micro-blog emotion analysis.
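The two annotation levels described in the abstract can be pictured as a small data structure. The following is a minimal sketch only, not the corpus's actual file format: the class names, field names, and the `corpus_stats` helper are hypothetical, emotion categories are kept as free-form strings because the abstract does not list them, and intensity is assumed to be an integer.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SentenceAnnotation:
    """Sentence-level annotation (hypothetical layout)."""
    text: str
    # Maps emotion category -> intensity. An empty dict means the
    # sentence carries no emotion. The integer scale is an assumption.
    emotions: Dict[str, int] = field(default_factory=dict)

    @property
    def has_emotion(self) -> bool:
        return bool(self.emotions)


@dataclass
class MicroblogAnnotation:
    """Micro-blog-level annotation (hypothetical layout)."""
    microblog_id: str
    # Multi-label: a micro-blog may carry several emotion categories.
    labels: List[str] = field(default_factory=list)
    sentences: List[SentenceAnnotation] = field(default_factory=list)

    @property
    def has_emotion(self) -> bool:
        return bool(self.labels)


def corpus_stats(corpus: List[MicroblogAnnotation]) -> Dict[str, int]:
    """Count micro-blogs and sentences, overall and with emotion."""
    all_sentences = [s for m in corpus for s in m.sentences]
    return {
        "microblogs": len(corpus),
        "sentences": len(all_sentences),
        "emotional_microblogs": sum(m.has_emotion for m in corpus),
        "emotional_sentences": sum(s.has_emotion for s in all_sentences),
    }
```

Keeping the micro-blog labels separate from the sentence labels mirrors the two annotation passes in the abstract, so a micro-blog-level label does not have to be derived from its sentences.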
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
Peking University Core Journals (北大核心)
2014, No. 5, pp. 83-91 (9 pages)
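Funding
National Natural Science Foundation of China (61203378, 61300112, 61370165)
Specialized Research Fund for the Doctoral Program of Higher Education (20122302120070)
Natural Science Foundation of Guangdong Province (S2012040007390, S2013010014475)
Open Project Fund of the National Laboratory of Pattern Recognition
Shenzhen Basic Research Program (JCYJ20120613152557576, JC201005260118A)
Shenzhen International Cooperation Program (GJHZ2012061311061217)
Baidu University Cooperation Project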
Keywords
emotion corpus
corpus construction
emotion annotation
micro-blog text