Abstract
Microblogging platforms have hundreds of millions of users who publish a massive volume of messages every day, which poses a severe challenge for hot topic detection. In this paper, we model the latent topics in microblog messages with the LDA (Latent Dirichlet Allocation) model, connect the topics into an undirected weighted graph using the words they share, and rank the topics in the graph with the PageRank algorithm. Experimental results show that the topics returned to users after ranking are clearly more accurate than the unranked results.
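The graph construction and ranking steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the LDA step is skipped and each topic is assumed to be already available as a list of its top words, the edge weight is assumed to be the number of shared words, and the damping factor 0.85 is a conventional default. The function names `build_topic_graph` and `pagerank` and the sample topics are hypothetical.

```python
from itertools import combinations


def build_topic_graph(topic_words):
    """Build a symmetric weight matrix for an undirected topic graph.

    Assumption: weights[i][j] is the number of top words that
    topics i and j have in common (0 means no edge).
    """
    n = len(topic_words)
    word_sets = [set(words) for words in topic_words]
    weights = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        shared = float(len(word_sets[i] & word_sets[j]))
        weights[i][j] = weights[j][i] = shared
    return weights


def pagerank(weights, damping=0.85, tol=1e-10, max_iter=200):
    """Weighted PageRank by power iteration on a weight matrix."""
    n = len(weights)
    strength = [sum(row) for row in weights]  # total edge weight per topic
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        # Topics with no edges spread their score uniformly over all topics.
        dangling = sum(rank[i] for i in range(n) if strength[i] == 0)
        new_rank = []
        for j in range(n):
            incoming = sum(
                rank[i] * weights[i][j] / strength[i]
                for i in range(n)
                if strength[i] > 0
            )
            new_rank.append((1 - damping) / n + damping * (incoming + dangling / n))
        delta = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical top-word lists standing in for LDA output.
topics = [
    ["earthquake", "rescue", "sichuan", "relief"],
    ["earthquake", "donation", "relief", "charity"],
    ["rescue", "earthquake", "army", "sichuan"],
    ["movie", "actor", "premiere", "ticket"],
]
scores = pagerank(build_topic_graph(topics))
ranking = sorted(range(len(topics)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

In this toy input, the topic that shares the most words with other topics ranks first, and the topic with no shared words ranks last. Other weighting schemes (for example, weighting shared words by their topic probabilities) would fit the same structure.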
Source
Computer Applications and Software (《计算机应用与软件》)
CSCD
Peking University Core Journals (北大核心)
2014, Issue 10, pp. 24-26, 66 (4 pages)
Funding
Natural Science Foundation of Jiangsu Province (3202uj221)
Keywords
Microblogging
Topic
Graph
Ranking
Latent Dirichlet Allocation (LDA) model