
Label propagation algorithm based on LDA model (cited by: 5)
Abstract: Label Propagation (LP) is a semi-supervised classification method. It performs poorly in text classification for two reasons: it requires the classification results to satisfy the manifold assumption, and computing similarities over high-dimensional data is computationally expensive. After analyzing the principles and complexity of the Latent Dirichlet Allocation (LDA) topic model and of the LP algorithm, the authors combine the two and propose LPLDA, a label propagation algorithm based on the LDA topic model. LPLDA represents each document by its latent LDA topics. On one hand, this representation helps the classification results satisfy the manifold assumption; on the other hand, it reduces the dimension of the data and therefore the time spent on similarity computation in LP. Experiments show that when the labeled data are fewer than the samples to be classified, LPLDA outperforms traditional supervised text classification methods.
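The idea in the abstract can be illustrated with a minimal sketch. It assumes each document has already been reduced to a K-dimensional topic-proportion vector by some LDA implementation (that step is not shown), so pairwise similarity costs on the order of n² × K operations rather than n² × vocabulary size. The Gaussian kernel, the clamping of labeled documents, and all names here (`rbf_similarity`, `label_propagation`, `sigma`, `n_iter`) are illustrative assumptions, not the formulation used in the paper.

```python
import math


def rbf_similarity(a, b, sigma=0.3):
    """Gaussian similarity between two topic-proportion vectors (assumed kernel)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2 * sigma ** 2))


def one_hot(label, n_classes):
    """Probability vector that puts all mass on one class."""
    return [1.0 if c == label else 0.0 for c in range(n_classes)]


def label_propagation(topics, labels, n_classes, sigma=0.3, n_iter=100):
    """Propagate labels over a similarity graph built on topic vectors.

    topics: list of K-dimensional vectors, assumed to come from a fitted LDA model.
    labels: class index for labeled documents, None for unlabeled ones.
    Returns the predicted class index for every document.
    """
    n = len(topics)
    # Pairwise similarities in topic space, then row-normalised transition matrix.
    weights = [[rbf_similarity(topics[i], topics[j], sigma) for j in range(n)]
               for i in range(n)]
    trans = [[w / sum(row) for w in row] for row in weights]

    # Labeled documents start as one-hot, unlabeled ones as uniform.
    dist = [one_hot(y, n_classes) if y is not None
            else [1.0 / n_classes] * n_classes
            for y in labels]

    for _ in range(n_iter):
        # Each document takes the weighted average of its neighbours' distributions.
        new_dist = [[sum(trans[i][j] * dist[j][c] for j in range(n))
                     for c in range(n_classes)]
                    for i in range(n)]
        # Clamp labeled documents back to their known labels.
        for i, y in enumerate(labels):
            if y is not None:
                new_dist[i] = one_hot(y, n_classes)
        dist = new_dist

    return [max(range(n_classes), key=lambda c: d[c]) for d in dist]


# Toy data: six documents over two topics, only two of them labeled.
toy_topics = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15],
              [0.1, 0.9], [0.2, 0.8], [0.15, 0.85]]
toy_labels = [0, None, None, 1, None, None]
print(label_propagation(toy_topics, toy_labels, n_classes=2))
```

In this toy setting the labeled documents are outnumbered by the unlabeled ones, which is the regime where the abstract reports an advantage over supervised classifiers; the sketch only demonstrates the mechanism and makes no claim about accuracy.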
Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2012, No. 2, pp. 403-406, 410 (5 pages)
Keywords: Latent Dirichlet Allocation (LDA) model; Label Propagation (LP) algorithm; semi-supervised learning; dimensionality reduction; manifold assumption


