
User Interests Clustering Based on PLSA (times cited: 5)

Abstract: During personalized search, a system needs to mine users' latent interests accurately and cluster them. To do this, the paper exploits the Zipf-distribution property of the latent semantic space and combines it with PLSA (probabilistic latent semantic analysis) to capture the semantics of whole documents. First, the principle of the Zipf distribution is used to find the latent semantic space of the documents. Next, user interests are clustered in this space according to the underlying latent factors. Finally, a user profile is built in the form of a user interest hierarchy tree. Experiments show that the proposed clustering algorithm clearly outperforms clustering based on the conventional VSM (vector space model). Personalized recommendation experiments on the well-known CTI data set also show that the user profile built in the latent semantic space agrees with users' actual interests.
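The abstract describes the pipeline only at a high level. The sketch below is a minimal illustration of its PLSA step. It assumes a user-by-item (or user-by-term) count matrix as input. It then fits P(z|u) and P(w|z) with EM and hard-clusters each user by their dominant latent factor.

It is not the authors' implementation. The Zipf-based selection of the latent space is not reproduced: the number of factors `k` is simply passed in. The user interest hierarchy tree is not built. The function names `plsa`, `log_likelihood` and `cluster_users` are hypothetical.

```python
import math
import random


def _normalize(row):
    """Scale a list of non-negative numbers so that it sums to 1."""
    total = sum(row)
    return [x / total for x in row] if total > 0 else row[:]


def plsa(counts, k, iters=50, seed=0):
    """Fit P(w|u) = sum_z P(z|u) P(w|z) to a user-by-item count matrix with EM.

    counts[u][w] is how often user u accessed item/term w (hypothetical input).
    Returns (p_z_u, p_w_z) with p_z_u[u][z] = P(z|u) and p_w_z[z][w] = P(w|z).
    """
    rng = random.Random(seed)
    n_users, n_items = len(counts), len(counts[0])
    # Strictly positive random start so every factor can receive probability mass.
    p_z_u = [_normalize([rng.random() + 0.1 for _ in range(k)]) for _ in range(n_users)]
    p_w_z = [_normalize([rng.random() + 0.1 for _ in range(n_items)]) for _ in range(k)]

    for _ in range(iters):
        acc_w_z = [[0.0] * n_items for _ in range(k)]
        acc_z_u = [[0.0] * k for _ in range(n_users)]
        for u in range(n_users):
            for w in range(n_items):
                n = counts[u][w]
                if n == 0:
                    continue
                # E-step: posterior P(z|u,w) is proportional to P(z|u) * P(w|z).
                joint = [p_z_u[u][z] * p_w_z[z][w] for z in range(k)]
                total = sum(joint)
                if total == 0:
                    continue
                for z in range(k):
                    resp = n * joint[z] / total  # expected count n(u,w) * P(z|u,w)
                    acc_w_z[z][w] += resp
                    acc_z_u[u][z] += resp
        # M-step: renormalize the expected counts into probabilities.
        p_w_z = [_normalize(row) for row in acc_w_z]
        p_z_u = [_normalize(row) for row in acc_z_u]
    return p_z_u, p_w_z


def log_likelihood(counts, p_z_u, p_w_z):
    """Return sum over (u, w) of n(u,w) * log P(w|u) under the fitted model."""
    ll = 0.0
    for u, row in enumerate(counts):
        for w, n in enumerate(row):
            if n:
                p = sum(p_z_u[u][z] * p_w_z[z][w] for z in range(len(p_w_z)))
                ll += n * math.log(p)
    return ll


def cluster_users(p_z_u):
    """Hard-assign each user to the latent factor with the largest P(z|u)."""
    clusters = {}
    for u, dist in enumerate(p_z_u):
        best = max(range(len(dist)), key=lambda z: dist[z])
        clusters.setdefault(best, []).append(u)
    return clusters


if __name__ == "__main__":
    # Toy data: users 0-1 access items 0-1, users 2-3 access items 2-3.
    counts = [
        [5, 4, 0, 0],
        [4, 5, 0, 0],
        [0, 0, 5, 4],
        [0, 0, 4, 5],
    ]
    p_z_u, p_w_z = plsa(counts, k=2)
    print(cluster_users(p_z_u))
    print(round(log_likelihood(counts, p_z_u, p_w_z), 3))
```

EM only converges to a local optimum of the likelihood, so the grouping for a given matrix can depend on `seed`. A common practice is to run several restarts and keep the fit with the highest `log_likelihood`.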
Source: Journal of Northeastern University (Natural Science), 2008, No. 1, pp. 53-56 (4 pages). Indexed in: EI, CAS, CSCD, PKU Core Journals.
Funding: National Natural Science Foundation of China (grants 60573090 and 60673139).
Keywords: user profile; PLSA (probabilistic latent semantic analysis); latent semantic space; Zipf distribution; user interest hierarchy tree