
Online belief propagation algorithm for probabilistic latent semantic analysis (cited by: 2)

Abstract: Probabilistic latent semantic analysis (PLSA) is a topic model for text documents that has been widely used in text mining, computer vision, computational biology, and other fields. For batch PLSA inference algorithms, the required memory grows linearly with the data size, which makes massive data streams very difficult to handle. To process big data streams, we propose an online belief propagation (OBP) algorithm based on an improved factor graph representation for PLSA. The factor graph of PLSA facilitates the classic belief propagation (BP) algorithm. Furthermore, OBP splits the data stream into a set of small segments and uses the parameters estimated from previous segments to compute the gradient descent step for the current segment. Because OBP removes each segment from memory after processing it, it is memory-efficient for big data streams. We examine the performance of OBP on four document data sets and demonstrate that OBP is competitive with online expectation maximization (OEM) for PLSA in both speed and accuracy, and that it can also give a more accurate topic evolution. Experiments on massive data streams from Baidu further confirm the effectiveness of the OBP algorithm.
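The segment-by-segment processing described in the abstract can be illustrated with a minimal sketch. This is not the authors' OBP algorithm: it swaps in plain EM-style responsibilities for the belief propagation messages and simple accumulation of expected counts for the gradient-style update. The names `process_segment`, `topic_word`, and `stream`, the smoothing value 0.01, and the toy data are all assumptions made for illustration.

```python
import random


def normalize(values):
    """Scale non-negative values to sum to 1 (uniform if they sum to 0)."""
    total = sum(values)
    if total == 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def process_segment(segment, topic_word, n_iters=20, seed=0):
    """Fit one segment and fold its expected counts into topic_word.

    segment: list of documents, each a dict {word_id: count}.
    topic_word: accumulated topic-by-word expected counts from earlier
    segments (hypothetical layout; updated in place).
    Returns the document-topic proportions for this segment only.
    """
    n_topics, vocab = len(topic_word), len(topic_word[0])
    rng = random.Random(seed)
    # Document-topic proportions are local to the segment.
    theta = [normalize([rng.random() + 0.1 for _ in range(n_topics)])
             for _ in segment]
    # Start from the topic-word estimates of previous segments.
    phi = [normalize(row) for row in topic_word]
    local = [[0.0] * vocab for _ in range(n_topics)]
    for _ in range(n_iters):
        local = [[0.0] * vocab for _ in range(n_topics)]
        for d, doc in enumerate(segment):
            doc_topic = [0.0] * n_topics
            for w, count in doc.items():
                # Posterior over topics for word w in document d.
                resp = normalize([theta[d][k] * phi[k][w]
                                  for k in range(n_topics)])
                for k in range(n_topics):
                    doc_topic[k] += count * resp[k]
                    local[k][w] += count * resp[k]
            theta[d] = normalize(doc_topic)
        # Combine earlier statistics with the current segment's counts.
        phi = [normalize([topic_word[k][w] + local[k][w]
                          for w in range(vocab)])
               for k in range(n_topics)]
    # Keep only the accumulated statistics; the segment can be discarded.
    for k in range(n_topics):
        for w in range(vocab):
            topic_word[k][w] += local[k][w]
    return theta


# Toy stream: two segments, two documents each, vocabulary of four words.
stream = [
    [{0: 3, 1: 2}, {2: 4, 3: 1}],
    [{0: 1, 1: 5}, {2: 2, 3: 3}],
]
topic_word = [[0.01] * 4 for _ in range(2)]  # small smoothing, 2 topics
for segment in stream:
    theta = process_segment(segment, topic_word)
```

Only `topic_word` persists between segments in this sketch, so memory use depends on the number of topics and the vocabulary size rather than on the length of the stream, which is the property the abstract attributes to OBP.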
Source: Frontiers of Computer Science (SCIE, EI, CSCD), 2013, Issue 4, pp. 526-535 (10 pages). Chinese title: 中国计算机科学前沿(英文版)
Keywords: probabilistic latent semantic analysis, topic models, expectation maximization, belief propagation

