User Interests Clustering Based on PLSA

Cited by: 5

Abstract: To mine users' latent interests accurately during personalized search and cluster them accordingly, the paper proposes combining the Zipf-distribution property of the latent semantic space with PLSA (probabilistic latent semantic analysis) to capture the semantics of the full text. The Zipf distribution principle is first used to find the latent semantic space of the documents. User interests are then clustered in that space, and a user profile is built in the form of a user interest hierarchy tree. Experimental results show that the proposed clustering algorithm performs clearly better than clustering based on the conventional VSM (vector space model). In addition, personalized recommendation experiments on the well-known CTI data set indicate that user profiles built on the latent semantic space agree with users' actual interests.
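
The PLSA step named in the abstract can be illustrated with a minimal sketch. It assumes the standard EM updates of Hofmann's PLSA on a document-term count matrix and assigns each document to its most probable latent topic. The function names `plsa` and `cluster_by_topic`, the toy counts, and the small smoothing constant are illustrative assumptions; the paper's Zipf-based selection of the latent space and its hierarchy-tree construction are not specified in the abstract and are not reproduced here.

```python
import random


def plsa(counts, n_topics, n_iter=50, seed=0, eps=1e-12):
    """Fit PLSA by EM on a document-term count matrix (list of lists).

    Returns (p_z_d, p_w_z): P(z|d) for each document and P(w|z) for each topic.
    eps is a tiny smoothing term (an assumption of this sketch) that keeps
    every probability strictly positive.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        total = sum(row) + eps * len(row)
        return [(x + eps) / total for x in row]

    # Random positive initialisation, normalised into distributions.
    p_z_d = [normalize([rng.random() for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() for _ in range(n_words)])
             for _ in range(n_topics)]

    for _ in range(n_iter):
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
                post = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                total = sum(post)
                for z in range(n_topics):
                    resp = n * post[z] / total
                    new_z_d[d][z] += resp  # expected count of topic z in doc d
                    new_w_z[z][w] += resp  # expected count of word w in topic z
        # M-step: renormalise the expected counts.
        p_z_d = [normalize(row) for row in new_z_d]
        p_w_z = [normalize(row) for row in new_w_z]
    return p_z_d, p_w_z


def cluster_by_topic(p_z_d):
    """Assign each document to the latent topic with the highest P(z|d)."""
    return [max(range(len(row)), key=lambda z: row[z]) for row in p_z_d]


if __name__ == "__main__":
    # Toy data: rows are documents (e.g. pages a user visited), columns are terms.
    toy_counts = [
        [4, 3, 2, 0, 0, 0],
        [3, 4, 1, 0, 0, 0],
        [0, 0, 0, 5, 2, 3],
        [0, 0, 0, 2, 4, 3],
    ]
    doc_topics, topic_words = plsa(toy_counts, n_topics=2)
    print(cluster_by_topic(doc_topics))
```

Clustering on P(z|d) rather than on raw term vectors is what distinguishes this kind of approach from a VSM baseline: documents that share no terms can still land in the same cluster if they load on the same latent topic. EM can converge to local optima, so results depend on the random seed.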

Authors: Chen Dongling (陈冬玲), Wang Daling (王大玲), Yu Ge (于戈), Yu Fang (于芳)

Affiliation: School of Information Science and Engineering, Northeastern University (东北大学)

Source: Journal of Northeastern University (Natural Science) (《东北大学学报（自然科学版）》), 2008, No. 1, pp. 53-56 (4 pages). Indexed in EI, CAS, CSCD, and the PKU Core Journals list (北大核心).

Funding: National Natural Science Foundation of China (grants 60573090 and 60673139)

Keywords: user profile; PLSA (probabilistic latent semantic analysis); latent semantic space; Zipf distribution; user interest hierarchy tree

Classification code: TP311 [Automation and Computer Technology: Computer Software and Theory]
