Patent Classification LDA: Topic Model for Patent Analysis (times cited: 10)

Abstract: Topic models are a popular text-mining technique and are increasingly applied to patent analysis. The patent abstracts usually used as the corpus are short. They also contain many technical terms, often multi-word, and numerous synonyms. As a result, topics extracted by traditional topic models such as LDA are obscure and do not clearly point to specific technologies, which limits deeper use of these models. To address this, this paper proposes a new topic model, Patent Classification LDA. The model uses the patent classification taxonomy, together with the class codes assigned to each patent, to guide topic extraction. This makes the extracted topics more readable and also yields each patent's probability distribution over the classification taxonomy. The paper then presents a Gibbs sampling method for estimating the model's parameters. Finally, experiments on patents in the hard disk drive head field demonstrate the feasibility and effectiveness of Patent Classification LDA.
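The abstract does not spell out the sampler. What follows is therefore only a minimal sketch of a related, well-known technique, not the author's Patent Classification LDA.

The sketch runs collapsed Gibbs sampling for LDA with a Labeled-LDA-style constraint: each patent may only draw topics from its own class codes. It makes several assumptions:
- Class codes are mapped one-to-one to integer topic ids.
- The hierarchy of the classification taxonomy is ignored.
- The function names `labeled_lda_gibbs` and `perplexity` and the hyperparameters `alpha`, `beta` and `iters` are hypothetical, chosen for illustration.

It also computes perplexity, one of the paper's keywords, in its standard form on the training corpus.

```python
import math
import random


def labeled_lda_gibbs(docs, doc_labels, num_topics, vocab_size,
                      alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA where document d may only use the
    topic ids in doc_labels[d] (Labeled-LDA-style constraint; an assumed
    stand-in for class codes, NOT the paper's exact model).

    docs: list of documents, each a list of word ids in [0, vocab_size).
    doc_labels: list of non-empty lists of allowed topic ids per document.
    Returns (phi, theta): topic-word and document-topic distributions.
    """
    rng = random.Random(seed)
    n_dk = [[0] * num_topics for _ in docs]               # topic counts per document
    n_kw = [[0] * vocab_size for _ in range(num_topics)]  # word counts per topic
    n_k = [0] * num_topics                                # token totals per topic
    z = []                                                # topic of every token
    # Random initialisation, restricted to each document's allowed topics.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.choice(doc_labels[d])
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            allowed = doc_labels[d]
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token from all counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z_i = t | z_-i, w), over allowed topics only.
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                           / (n_k[t] + vocab_size * beta) for t in allowed]
                k = rng.choices(allowed, weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    phi = [[(n_kw[k][w] + beta) / (n_k[k] + vocab_size * beta)
            for w in range(vocab_size)] for k in range(num_topics)]
    theta = []
    for d, doc in enumerate(docs):
        allowed = doc_labels[d]
        denom = len(doc) + len(allowed) * alpha
        row = [0.0] * num_topics          # disallowed topics keep probability 0
        for t in allowed:
            row[t] = (n_dk[d][t] + alpha) / denom
        theta.append(row)
    return phi, theta


def perplexity(docs, phi, theta):
    """exp(-(sum of log p(w)) / N), where p(w) = sum_k theta[d][k] * phi[k][w]."""
    log_lik, n_tokens = 0.0, 0
    for d, doc in enumerate(docs):
        for w in doc:
            p = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
            log_lik += math.log(p)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)


if __name__ == "__main__":
    # Toy corpus: word ids 0..4; topics 0 and 1 stand in for two class codes.
    docs = [[0, 1, 0, 2], [3, 4, 3], [0, 1, 4]]
    labels = [[0], [1], [0, 1]]
    phi, theta = labeled_lda_gibbs(docs, labels, num_topics=2, vocab_size=5, iters=50)
    print(theta[0])  # → [1.0, 0.0]
    print(perplexity(docs, phi, theta))
```

In this sketch, a patent carrying a single class code places all of its topic mass on that code. A patent with several codes splits its mass among them according to its words. This is one simple way to read "the probability distribution of a patent over the classification taxonomy", but the paper's own formulation may differ.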

Author: Chen Liang (陈亮)

Affiliation: Institute of Scientific and Technical Information of China

Source: Journal of the China Society for Scientific and Technical Information (《情报学报》; indexed in CSSCI and the Peking University Core Journals list), 2016, No. 8, pp. 864-874 (11 pages)

Keywords: topic model; patent analysis; Gibbs sampling; perplexity; hard disk drive

CLC classification number: G350 [Culture and Science / Information Science]
