Abstract
To address the problem of missing interest attributes for microblog users, this paper proposes an interest mining method based on analysis of the content users post. Using a phrase-based topic model (phrase-LDA) and an automatically constructed user interest knowledge base, the method extracts high-quality interest phrases from posts and labels their categories, thereby mining the interests of microblog users. Experimental results on the SMP CUP 2016 dataset show that phrase-LDA outperforms the traditional topic model in both perplexity and phrase quality, and that the accuracy and recall of user interest mining reach up to 78% and 82%, respectively.
Authors
熊才伟
曹亚男
Xiong Caiwei; Cao Yanan (National Key Engineering Laboratory, Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China; School of Computer & Control Engineering, University of Chinese Academy of Sciences, Beijing 100093, China)
Source
《计算机应用研究》
CSCD
Peking University Core Journal
2018, No. 6, pp. 1619-1623 (5 pages)
Application Research of Computers
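Funding
National Natural Science Foundation of China, Young Scientists Fund (61403369)
Major Special Project of the Ministry of Science and Technology of China (2016YFB0801300)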
Keywords
microblog
microblog posts
interest mining
topical phrase model (phrase-LDA)
knowledge base