Online belief propagation algorithm for probabilistic latent semantic analysis (cited by: 2)

Abstract: Probabilistic latent semantic analysis (PLSA) is a topic model for text documents that has been widely used in text mining, computer vision, computational biology, and other fields. For batch PLSA inference algorithms, the required memory grows linearly with the data size, which makes handling massive data streams very difficult. To process big data streams, we propose an online belief propagation (OBP) algorithm based on an improved factor graph representation for PLSA. The factor graph of PLSA facilitates the classic belief propagation (BP) algorithm. Furthermore, OBP splits the data stream into a set of small segments and uses the estimated parameters of previous segments to calculate the gradient descent of the current segment. Because OBP removes each segment from memory after processing, it is memory-efficient for big data streams. We examine the performance of OBP on four document data sets and demonstrate that OBP is competitive in both speed and accuracy with online expectation maximization (OEM) in PLSA, and can also give a more accurate topic evolution. Experiments on massive data streams from Baidu further confirm the effectiveness of the OBP algorithm.
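The segment-wise processing described in the abstract can be illustrated with a minimal sketch. It is not the paper's OBP message-passing scheme: it swaps in a simplified EM-style update for PLSA, and every name (`process_stream`, `fit_segment`, `normalize_rows`, `inner_iters`, `smooth`) is hypothetical. It only shows the general pattern of fitting each segment against statistics accumulated from earlier segments and then discarding the segment.

```python
import random


def normalize_rows(rows):
    """Turn each row of non-negative counts into a probability distribution."""
    out = []
    for row in rows:
        total = sum(row)
        out.append([v / total for v in row])
    return out


def fit_segment(segment, prior_counts, num_topics, inner_iters):
    """Fit one segment, treating counts from earlier segments as fixed.

    segment: list of documents, each a dict {word_id: count}.
    prior_counts: topic-by-word expected counts from previous segments.
    Returns the updated topic-by-word counts (prior plus this segment).
    """
    vocab_size = len(prior_counts[0])
    doc_topic = [[1.0 / num_topics] * num_topics for _ in segment]
    seg_counts = [[0.0] * vocab_size for _ in range(num_topics)]
    for _ in range(inner_iters):
        totals = [sum(p) + sum(s) for p, s in zip(prior_counts, seg_counts)]
        new_counts = [[0.0] * vocab_size for _ in range(num_topics)]
        for d, doc in enumerate(segment):
            new_theta = [0.0] * num_topics
            for w, c in doc.items():
                # Responsibility of each topic for word w in document d.
                resp = [
                    doc_topic[d][k]
                    * (prior_counts[k][w] + seg_counts[k][w]) / totals[k]
                    for k in range(num_topics)
                ]
                z = sum(resp)
                for k in range(num_topics):
                    r = c * resp[k] / z
                    new_theta[k] += r
                    new_counts[k][w] += r
            n = sum(new_theta)
            if n > 0:  # skip empty documents
                doc_topic[d] = [t / n for t in new_theta]
        seg_counts = new_counts
    return [[p + s for p, s in zip(pr, sr)]
            for pr, sr in zip(prior_counts, seg_counts)]


def process_stream(segments, num_topics, vocab_size,
                   inner_iters=20, seed=0, smooth=0.01):
    """Consume segments one at a time; return topic-word probabilities."""
    rng = random.Random(seed)
    # Small random positive counts break the symmetry between topics.
    topic_word = [[smooth + rng.random() * smooth for _ in range(vocab_size)]
                  for _ in range(num_topics)]
    for segment in segments:
        # Only the accumulated counts survive; the segment itself is
        # no longer referenced once the loop moves on.
        topic_word = fit_segment(segment, topic_word, num_topics, inner_iters)
    return normalize_rows(topic_word)
```

In this sketch `segments` can be any iterable, including a generator that reads from disk or a network source, so that only one segment is held in memory at a time. The retained state is a single topic-by-word count matrix whose size does not depend on the length of the stream.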

Authors: Yun YE, Shengrong GONG, Chunping LIU, Jia ZENG, Ning JIA, Yi ZHANG

Affiliation: School of Computer Science & Technology Feng Chao Revenue

Source: Frontiers of Computer Science (SCIE, EI, CSCD), 2013, Issue 4, pp. 526-535 (10 pages). Chinese title: 中国计算机科学前沿（英文版）

Keywords: probabilistic latent semantic analysis, topic models, expectation maximization, belief propagation

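Classification codes: TP311.12 [Automation and Computer Technology: Computer Software and Theory]; TP18 [Automation and Computer Technology: Computer Science and Technology]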


