Abstract
Six tourism themes (food, entertainment, shopping, scenery, transportation, and accommodation) are selected for a comparative study of machine learning methods for thematic sentiment analysis of tourist Weibo posts. A total of 53,140 Weibo posts published by Chinese tourists in Japan were collected and manually labeled as the dataset. Two machine learning models, a Maximum Entropy (MaxEnt) model and a Support Vector Machine (SVM), were used in the modeling experiments, and the influence of different features on model performance was analyzed. Both models performed well and are suitable for thematic sentiment analysis of tourist Weibo posts, with the MaxEnt model slightly outperforming the SVM. The experiments also show that extending word features with emoticons and theme words improves model performance.
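A minimal sketch of the two ideas in the abstract, assuming Python with only the standard library: binary word features extended with emoticon and theme-word features, and a Maximum Entropy classifier implemented as multinomial logistic regression trained by stochastic gradient ascent. The bracketed emoticon format, the `THEME_WORDS` lexicon, the pre-segmented tokens, the `pos`/`neg` labels, and the toy training posts are all hypothetical illustrations, not the authors' data, feature set, or implementation.

```python
import math
import re
from collections import defaultdict

# Assumption: Weibo emoticons appear as bracketed codes such as "[哈哈]"
# and survive word segmentation as single tokens.
EMOTICON_RE = re.compile(r"\[[^\[\]]+\]")

# Hypothetical theme lexicon for illustration; the paper's theme words
# for its six themes are not listed in the abstract.
THEME_WORDS = {
    "food": {"拉面", "寿司"},
    "shopping": {"药妆", "免税"},
}


def extract_features(tokens):
    """Binary features: words, extended with emoticons and theme indicators."""
    feats = set()
    for t in tokens:
        if EMOTICON_RE.fullmatch(t):
            feats.add("emo=" + t)
        else:
            feats.add("w=" + t)
    for theme, words in THEME_WORDS.items():
        if words.intersection(tokens):
            feats.add("theme=" + theme)
    return feats


class MaxEnt:
    """Multinomial logistic regression (maximum entropy) over binary features."""

    def __init__(self, labels, lr=0.5, l2=1e-3, epochs=50):
        self.labels = list(labels)
        self.lr, self.l2, self.epochs = lr, l2, epochs
        self.w = defaultdict(float)  # weight per (feature, label) pair

    def predict_proba(self, feats):
        # Softmax over per-label scores; subtract the max for numerical stability.
        scores = [sum(self.w[(f, y)] for f in feats) for y in self.labels]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        z = sum(exps)
        return [e / z for e in exps]

    def fit(self, X, Y):
        # Stochastic gradient ascent on the log-likelihood; L2 shrinkage is
        # applied only to weights of features active in the current example.
        for _ in range(self.epochs):
            for feats, y in zip(X, Y):
                probs = self.predict_proba(feats)
                for k, label in enumerate(self.labels):
                    grad = (1.0 if label == y else 0.0) - probs[k]
                    for f in feats:
                        key = (f, label)
                        self.w[key] += self.lr * (grad - self.l2 * self.w[key])
        return self

    def predict(self, feats):
        probs = self.predict_proba(feats)
        return self.labels[max(range(len(probs)), key=probs.__getitem__)]


# Toy, made-up training posts: (pre-segmented tokens, sentiment label).
train = [
    (["拉面", "好吃", "[哈哈]"], "pos"),
    (["药妆", "排队", "[怒]"], "neg"),
    (["寿司", "新鲜", "[赞]"], "pos"),
    (["免税", "太贵", "[泪]"], "neg"),
]
X = [extract_features(tokens) for tokens, _ in train]
Y = [label for _, label in train]
model = MaxEnt(labels=["pos", "neg"]).fit(X, Y)
print(model.predict(extract_features(["拉面", "[哈哈]"])))  # prints "pos"
```

Training once on `w=` features alone and once with the added `emo=` and `theme=` features mirrors the feature comparison described in the abstract. For the SVM side, the same feature sets could be converted to dicts, vectorized with scikit-learn's `DictVectorizer`, and passed to `LinearSVC`; that pairing is a suggestion, not the authors' stated setup.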
Authors
LIU Siye (刘思叶); TIAN Yuan (田原); FENG Yuning (冯雨宁); ZHUANG Yulong (庄育龙)
(Institute of Remote Sensing and Geographical Information System, Peking University, Beijing 100871)
Source
Journal of Peking University (Natural Science Edition) (《北京大学学报(自然科学版)》)
EI
CAS
CSCD
Peking University Core Journal (北大核心)
2018, Issue 4, pp. 687-692 (6 pages)
Acta Scientiarum Naturalium Universitatis Pekinensis
Funding
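National Key Research and Development Program of China (2018YFB0505500, 2018YFB0505504)
Open Research Fund of the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing ((16)重02)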
Keywords
thematic sentiment analysis
Weibo of tourists
Maximum Entropy
Support Vector Machine (SVM)