Abstract
Massive microblog data record how hot events unfold and how people react to them. It is increasingly important to extract meaningful information from such data, to understand the full course of a hot event, and to discover its turning-point events. Traditional methods that rely solely on word frequency lack an abstract description of sub-topics and are therefore limited. This paper proposes an interactive visual analysis method that combines topic extraction with word frequency statistics to display the evolution of a hot event's sub-topics at different granularities. By comparing the word distributions of sub-topics in adjacent time intervals, turning-point events related to particular sub-topics can be discovered, and the corresponding content can then be located in the original microblog posts with the aid of word co-occurrence graphs. During interaction, users can adjust the method's parameters to find the optimal configuration, which supports more effective analysis of turning-point events and a better understanding of the hot event as a whole. Experiments on real Sina Weibo datasets compare the method with a traditional word-frequency-based method and a method based on topic trends, and the results show that the proposed method is more effective than either.
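The two analysis steps named in the abstract, comparing sub-topic word distributions across adjacent time intervals and building a word co-occurrence graph, can be illustrated with a minimal Python sketch. The abstract does not say which distance measure or threshold the method uses; Jensen-Shannon divergence, the fixed `threshold`, and all function names below are assumptions made for illustration only, and input posts are assumed to be already tokenized.

```python
import math
from collections import Counter
from itertools import combinations


def word_distribution(tokens, vocab):
    """Normalized word frequencies of one time interval over a shared vocabulary."""
    counts = Counter(tokens)
    total = sum(counts[w] for w in vocab)
    if total == 0:
        # Empty interval: fall back to a uniform distribution (illustrative choice).
        return [1.0 / len(vocab)] * len(vocab) if vocab else []
    return [counts[w] / total for w in vocab]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions.

    Assumption: the source does not name its distance measure; this is a stand-in.
    """
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def turning_points(intervals, threshold):
    """Indices i where interval i differs sharply from interval i - 1.

    `intervals` is a list of token lists, one per time interval of a sub-topic.
    `threshold` is a hypothetical user-tunable parameter.
    """
    vocab = sorted({w for tokens in intervals for w in tokens})
    dists = [word_distribution(tokens, vocab) for tokens in intervals]
    return [
        i
        for i in range(1, len(dists))
        if js_divergence(dists[i - 1], dists[i]) > threshold
    ]


def cooccurrence_edges(posts, min_count=2):
    """Weighted edges of a word co-occurrence graph.

    Two words are linked when they appear in the same post; edges seen in
    fewer than `min_count` posts are dropped.
    """
    pairs = Counter()
    for post in posts:
        for a, b in combinations(sorted(set(post)), 2):
            pairs[(a, b)] += 1
    return {pair: n for pair, n in pairs.items() if n >= min_count}
```

In this sketch, a user would call `turning_points` with different `threshold` values to see which intervals are flagged, then pass the posts from a flagged interval to `cooccurrence_edges` to find the word pairs that point back to specific microblog content.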
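Funding
National Basic Research Program of China (973 Program), Grant No. 2013CB329305
National Natural Science Foundation of China, Young Scientists Fund, Grant No. 61402452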
Keywords
Sub-topic detection
Microblog
Visual analytics
Topic model
Visualization