
A tempered transition based learning algorithm for undirected topic model

Cited by: 1
Abstract

The Replicated Softmax model is an undirected probabilistic topic model for text data mining that provides a powerful framework for describing the topic distribution of a corpus. However, because it is an undirected probabilistic graphical model, the presence of the normalizing constant makes learning its parameters very difficult. To address this problem, the parameters are learned with a tempered transition Markov chain Monte Carlo sampling method, following the idea of approximate maximum likelihood learning. The tempered transition sampler efficiently explores probability distributions with many isolated modes and approximates the distribution more closely, which improves the efficiency and accuracy of parameter learning. The experimental results demonstrate the advantages of the algorithm in three respects: training time, generalization ability, and document retrieval.

The Replicated Softmax model, an undirected topic model for text data mining, provides a powerful framework for extracting semantic topics from document collections. Compared to directed topic models, it has a better way of dealing with documents of different lengths, and computing the posterior distribution over the latent topic values is easy. However, due to the presence of the global normalizing constant, maximum likelihood learning for this model is intractable. The Contrastive Divergence (CD) algorithm is one of the dominant learning schemes for restricted Boltzmann machines (RBMs) based on Markov chain Monte Carlo (MCMC) methods. It approximates the negative phase contribution to the gradient with samples drawn from a short alternating Gibbs Markov chain that starts from the observed training sample. However, these short chains yield a low-variance but biased estimate of the gradient, which makes the learning procedure rather slow. The main problem is the inability of the Markov chain to efficiently explore distributions with many isolated modes. In this paper, a new class of stochastic approximation algorithms is considered for learning the Replicated Softmax model. To efficiently explore highly multimodal distributions, we use an MCMC sampling scheme based on tempered transitions to generate sample states of a thermodynamic system. The tempered transitions move systematically from the desired distribution to an easily sampled distribution and back to the desired distribution. This allows the Markov chain to produce less correlated samples between successive parameter updates, and hence considerably improves the parameter estimates. The experiments are conducted on three popular text datasets, and the results demonstrate that we can successfully learn a good generative model of real text data that performs well on topic modelling and document retrieval.
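The move "from the desired distribution to an easily sampled distribution and back" described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a toy one-dimensional bimodal target instead of the Replicated Softmax model, random-walk Metropolis steps instead of Gibbs sampling, and a hand-picked ladder of inverse temperatures. All names (`log_target`, `metropolis_step`, `tempered_transition`, `run_chain`) are hypothetical.

```python
import math
import random

def log_target(x):
    # Assumed toy target (not from the paper): unnormalized log density of
    # two unit-variance Gaussian modes centred at -4 and +4.
    a = -0.5 * (x - 4.0) ** 2
    b = -0.5 * (x + 4.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def metropolis_step(x, beta, rng):
    # One random-walk Metropolis step that leaves p(x)^beta invariant.
    # The proposal widens as beta shrinks; it is symmetric for a fixed beta,
    # so the step is reversible.
    prop = x + rng.gauss(0.0, 1.0 / math.sqrt(beta))
    log_ratio = beta * (log_target(prop) - log_target(x))
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return prop
    return x

def tempered_transition(x0, betas, rng):
    # betas = [1.0, b1, ..., bn], decreasing; betas[0] = 1.0 is the target.
    n = len(betas) - 1
    x = x0
    log_accept = 0.0
    # Upward pass: heat the state towards the easily sampled distribution.
    for i in range(n):
        log_accept += (betas[i + 1] - betas[i]) * log_target(x)
        x = metropolis_step(x, betas[i + 1], rng)
    # Downward pass: cool the state back towards the target distribution.
    for i in range(n - 1, -1, -1):
        x = metropolis_step(x, betas[i + 1], rng)
        log_accept += (betas[i] - betas[i + 1]) * log_target(x)
    # Accept or reject the whole excursion as a single proposal.
    if rng.random() < math.exp(min(0.0, log_accept)):
        return x, True
    return x0, False

def run_chain(n_steps, seed=0):
    rng = random.Random(seed)
    betas = [1.0, 0.8, 0.6, 0.4, 0.2, 0.1]  # assumed temperature ladder
    x = 4.0
    samples = []
    for _ in range(n_steps):
        x = metropolis_step(x, 1.0, rng)           # ordinary local move
        x, _ = tempered_transition(x, betas, rng)  # occasional long-range move
        samples.append(x)
    return samples
```

Calling `run_chain(1000)` returns a list of sampled states. In a learning setting such as the one the abstract describes, states produced this way would stand in for the negative-phase samples between parameter updates; that wiring is not shown here.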
Source: Journal of Nanjing University (Natural Science), 2016, No. 2, pp. 335-342 (8 pages). Indexed in: CAS, CSCD, Peking University Core Journals (北大核心).
Funding: National Natural Science Foundation of China (61472423, 61432008, 61532006, U1135005)
Keywords: probabilistic topic model; probabilistic reasoning; Markov chain Monte Carlo; tempered transitions

