Abstract
Consumer-facing companies want to understand what their users need, and the large volume of data that users generate reflects those interests and needs to a large extent. This paper proposes a user interest mining method for social networking sites (SNS) that draws on User Generated Content (UGC) and users' following information. First, a PLSA model with heuristic initialization is trained to obtain a topic model that stays close to the interest categories. Reliable topics are then manually selected from the training result and used to build a classifier, which classifies the content each user shares; the user's interest categories are finally identified from those classification results. To initialize the PLSA model, a keyword extraction algorithm extracts the keywords of each category, and these keywords are given higher initial PLSA weights so that they guide the training. The experimental results show that the method can effectively construct user interest categories and performs well at mining user interests.
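The abstract gives no formulas or code, so the following is only a minimal sketch of the idea it describes, written under stated assumptions: standard PLSA fitted by EM, with P(w|z) initialized so that each topic's seed keywords start with extra weight. The function names, the `boost` parameter, and the arg-max labeling step (a simple stand-in for the classifier built from manually selected topics) are hypothetical and do not come from the paper.

```python
import random


def _normalize(row):
    """Scale a list of non-negative numbers so that it sums to 1."""
    total = sum(row)
    return [x / total for x in row]


def train_seeded_plsa(docs, seed_words, n_iter=30, boost=10.0, rng_seed=0):
    """Fit PLSA by EM, starting from a keyword-biased P(w|z).

    docs: list of token lists. seed_words: one keyword list per topic.
    boost is a hypothetical knob: extra initial weight for seed keywords.
    Returns (p_w_z, p_z_d, vocab).
    """
    rng = random.Random(rng_seed)
    vocab = sorted({w for d in docs for w in d})
    idx = {w: i for i, w in enumerate(vocab)}
    K, V, D = len(seed_words), len(vocab), len(docs)

    # Sparse term counts n(d, w) for each document.
    counts = []
    for d in docs:
        c = {}
        for w in d:
            c[idx[w]] = c.get(idx[w], 0) + 1
        counts.append(c)

    # Heuristic initialization: near-uniform weights, plus a boost on
    # the seed keywords of each topic.
    p_w_z = []
    for z in range(K):
        row = [1.0 + 0.1 * rng.random() for _ in range(V)]
        for w in seed_words[z]:
            if w in idx:
                row[idx[w]] += boost
        p_w_z.append(_normalize(row))
    p_z_d = [[1.0 / K] * K for _ in range(D)]

    for _ in range(n_iter):
        # Tiny constant keeps every probability strictly positive.
        new_w_z = [[1e-12] * V for _ in range(K)]
        new_z_d = [[1e-12] * K for _ in range(D)]
        for d, c in enumerate(counts):
            for w, n in c.items():
                # E-step: P(z|d,w) is proportional to P(w|z) * P(z|d).
                post = [p_w_z[z][w] * p_z_d[d][z] for z in range(K)]
                tot = sum(post)
                for z in range(K):
                    r = n * post[z] / tot
                    new_w_z[z][w] += r
                    new_z_d[d][z] += r
        # M-step: renormalize the expected counts.
        p_w_z = [_normalize(row) for row in new_w_z]
        p_z_d = [_normalize(row) for row in new_z_d]
    return p_w_z, p_z_d, vocab


def user_interests(p_z_d, post_owner, labels):
    """Label each post with its most probable topic and count labels per user.

    Arg-max labeling is an assumption standing in for the paper's classifier.
    """
    profile = {}
    for d, user in enumerate(post_owner):
        z = max(range(len(labels)), key=lambda k: p_z_d[d][k])
        per_user = profile.setdefault(user, {})
        per_user[labels[z]] = per_user.get(labels[z], 0) + 1
    return profile
```

In this sketch, `train_seeded_plsa(docs, seeds)` would be called with tokenized posts and one seed list per interest category, and `user_interests` then aggregates the per-post labels into a per-user count of categories. Because EM only finds a local optimum, the seeded start is what keeps topic `z` aligned with category `z`; with a purely random start the topic order would be arbitrary.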
Source
《小型微型计算机系统》
CSCD
PKU Core Journal (北大核心)
2014, Issue 11, pp. 2385-2389 (5 pages)
Journal of Chinese Computer Systems
Funding
Supported by the National Natural Science Foundation of China (Grant Nos. 61070083, 61303115)