Label Propagation Algorithm Based on the LDA Topic Model (基于LDA主题模型的标签传递算法)

Cited by: 5

Abstract: Label propagation (LP) is a semi-supervised classification method. It performs poorly on text classification for two reasons: it requires the classification result to satisfy the manifold assumption, and computing similarities is expensive when the data are high-dimensional. To address these problems, the authors analyze the principles and complexity of the Latent Dirichlet Allocation (LDA) topic model and of the LP algorithm, and combine the two into an LDA-based label propagation algorithm, LPLDA. The algorithm represents each document by its latent LDA topics. On one hand, the topic representation helps the classification result satisfy the manifold assumption; on the other hand, it reduces the dimension of the data and therefore the time LP spends computing similarities. Experiments show that, when there are fewer labeled examples than unlabeled test samples, LPLDA classifies better than traditional supervised text classification methods.
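The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's LPLDA implementation: the abstract does not give the similarity function or update rule, so the Gaussian kernel, the `sigma` value, the row-normalised propagation variant, and the names `rbf_similarity` and `propagate_labels` are all assumptions made for illustration. The topic vectors are hypothetical stand-ins for the per-document topic proportions that an LDA implementation would produce.

```python
import math


def rbf_similarity(a, b, sigma=0.3):
    """Gaussian similarity between two topic-proportion vectors (assumed kernel)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def propagate_labels(topic_vectors, labels, n_classes, sigma=0.3, n_iter=200):
    """Row-normalised label propagation with clamping of labeled documents.

    topic_vectors: one K-dimensional topic-proportion vector per document
    labels: class index for labeled documents, None for unlabeled ones
    Returns the predicted class index of every document.
    """
    n = len(topic_vectors)

    # Each pairwise similarity costs O(K) for K topics, instead of O(V)
    # for a vocabulary of size V in a bag-of-words representation.
    transition = []
    for i in range(n):
        row = [rbf_similarity(topic_vectors[i], topic_vectors[j], sigma)
               for j in range(n)]
        total = sum(row)  # never zero: self-similarity is 1
        transition.append([w / total for w in row])

    def one_hot(c):
        return [1.0 if k == c else 0.0 for k in range(n_classes)]

    # Labeled documents start with their known class, unlabeled ones uniform.
    scores = [one_hot(l) if l is not None else [1.0 / n_classes] * n_classes
              for l in labels]

    for _ in range(n_iter):
        updated = [
            [sum(transition[i][j] * scores[j][c] for j in range(n))
             for c in range(n_classes)]
            for i in range(n)
        ]
        # Clamp: labeled documents keep their original label distribution.
        for i, l in enumerate(labels):
            if l is not None:
                updated[i] = one_hot(l)
        scores = updated

    return [max(range(n_classes), key=lambda c: row[c]) for row in scores]


# Hypothetical topic proportions (K = 3) for five documents; two are labeled.
docs = [
    [0.90, 0.05, 0.05],  # labeled, class 0
    [0.05, 0.90, 0.05],  # labeled, class 1
    [0.80, 0.10, 0.10],  # unlabeled
    [0.10, 0.85, 0.05],  # unlabeled
    [0.70, 0.20, 0.10],  # unlabeled
]
labels = [0, 1, None, None, None]
predicted = propagate_labels(docs, labels, n_classes=2)
print(predicted)
```

In this toy setting each unlabeled document takes the class of the labeled document it sits closest to in topic space. With n documents, building the similarity matrix takes on the order of n²·K operations, which is where the reduction from vocabulary size to topic count pays off.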

Authors: Liu Peiqi (刘培奇), Sun Jiehan (孙捷焓)

Affiliation: School of Information and Control Engineering, Xi'an University of Architecture and Technology

Source: Journal of Computer Applications (《计算机应用》), 2012, No. 2, pp. 403-406, 410 (5 pages). Indexed in CSCD and the Peking University Core Journals list.

Keywords: Latent Dirichlet Allocation (LDA) model; Label Propagation (LP) algorithm; semi-supervised learning; dimensionality reduction; manifold assumption

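Classification codes: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]; TP391.4 [Automation and Computer Technology: Control Science and Engineering]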


