Abstract
[Objective] Discovering micro-blog users' interests is important for personalized recommendation in micro-blog social networks and for improving user satisfaction. [Methods] In addition to mining a user's own micro-blog posts to identify interests, this paper mines the posts of the users he or she follows and the social ties between them. It then uncovers further interests by computing two quantities: the similarity between the user's posts and each followed user's interests, and the intimacy between the user and each followed user. Finally, the interests found from these two sources are merged to obtain the user's interest set. [Results] In experiments on a dataset crawled from Sina Weibo, both precision and recall improve by more than 15% compared with the traditional method. [Limitations] The stop-word list used in data preprocessing is incomplete because it is not learned automatically, and the user interest sets needed to compute precision and recall must be annotated manually. [Conclusions] The experimental results show that the method clearly outperforms the traditional method and discovers user interests more effectively and accurately.
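The abstract describes the method only in outline, so the following Python sketch shows one way the steps could fit together. It is not the authors' implementation. Every concrete choice is an assumption made for illustration: posts arrive pre-tokenized with stop words removed, an interest set is the top-k most frequent terms, similarity is the cosine of term-frequency vectors, intimacy is the share of the user's interactions (a hypothetical `interaction_count`) directed at a followee, the two are multiplied, and a fixed `threshold` decides which followees contribute. The function names and sample data are hypothetical.

```python
from collections import Counter
from math import sqrt

# Hypothetical sketch: posts are pre-tokenized word lists with stop words
# already removed (the paper's preprocessing is not reproduced here).

def term_vector(posts):
    """Term-frequency vector over all of one user's posts."""
    return Counter(tok for post in posts for tok in post)

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_terms(vec, k):
    """Assumption: treat the k most frequent terms as the interest set."""
    return {term for term, _ in vec.most_common(k)}

def discover_interests(user_posts, followees, k=2, threshold=0.1):
    """followees maps a name to (posts, interaction_count); the count is a
    hypothetical tally of comments, reposts and @-mentions with that followee."""
    user_vec = term_vector(user_posts)
    own = top_terms(user_vec, k)  # interests from the user's own posts
    total = sum(n for _, n in followees.values())
    inferred = set()
    for posts, n in followees.values():
        f_vec = term_vector(posts)
        intimacy = n / total if total else 0.0      # assumed intimacy measure
        score = cosine(user_vec, f_vec) * intimacy  # assumed combination rule
        if score >= threshold:
            inferred |= top_terms(f_vec, k)
    return own | inferred  # merge interests from both sources

def precision_recall(predicted, gold):
    """Precision and recall against a manually annotated interest set."""
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall

user_posts = [["football", "match", "goal"], ["football", "beer"], ["match", "football"]]
followees = {
    "a": ([["league", "football"], ["league", "transfer", "transfer"], ["league"]], 8),
    "b": ([["cooking", "recipe"], ["recipe", "cooking", "recipe"]], 2),
}
interests = discover_interests(user_posts, followees)
print(sorted(interests))  # ['football', 'league', 'match', 'transfer']
print(precision_recall(interests, {"football", "league", "beer"}))
```

Followee "a" shares the term "football" with the user and accounts for most of the user's interactions, so its interests are added. Followee "b" has no topical overlap and is ignored. Multiplying similarity by intimacy requires a followee to be close both topically and socially. A weighted sum would be an equally plausible reading of the abstract.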
Source
New Technology of Library and Information Service (《现代图书情报技术》)
CSSCI
2015, Issue 1, pp. 52-58 (7 pages)
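Funding
National Natural Science Foundation of China project "Research on Chinese Micro-blog Information Mining Methods Based on Semantic Analysis" (Grant No. 61370139)
Beijing Key Laboratory of Internet Culture and Digital Dissemination Research project "Micro-blog-Oriented Social Network Research and Public Opinion Analysis" (Grant No. ICDD201309)
Beijing Municipal Universities Innovation Team Building and Teacher Career Development Program project "Theoretical Foundations and Intelligent Processing Technologies for Big Data Content Understanding" (Grant No. IDHT20130519)
This paper is one of the research outcomes of the above projects.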
Keywords
Micro-blog
Interest discovery
Followed users