Text similarity detection method based on the HybridDL model (cited by: 3)
Abstract: To improve the accuracy of text similarity detection, this paper proposes a method that combines latent Dirichlet allocation (LDA) with the Doc2Vec model, and names the resulting model HybridDL. The method first trains Doc2Vec on the documents to obtain document vectors, then uses an LDA model to obtain each document's topics and the probabilities of the feature words under each topic. It computes a probability-weighted sum over the topics and feature words of each document and maps the result onto the Doc2Vec document vector. Experimental results show that the new model is more sensitive to similar texts than the conventional Doc2Vec model and achieves higher accuracy in text similarity detection.
Authors: Xiao Han; Mao Xuesong; Zhu Zede (School of Information Science and Engineering, Wuhan University of Science and Technology, Wuhan 430081, China; Institute of Technology Innovation, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China)
Source: Application of Electronic Technique (《电子技术应用》), 2020, No. 6, pp. 28-31, 35 (5 pages)
Funding: National Natural Science Foundation of China (61806187).
Keywords: Doc2Vec; latent Dirichlet allocation; text representation; text similarity
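
The abstract describes the combination only in words, so the following is a minimal sketch of one possible reading, not the authors' implementation. It assumes the probability-weighted sum is taken over word vectors of each topic's feature words, and that "mapping" onto the Doc2Vec vector means element-wise addition scaled by a hypothetical factor `alpha`; the function names and the dictionary-based inputs are likewise illustrative assumptions. In practice the inputs would come from trained Doc2Vec and LDA models (for example, via gensim).

```python
import math


def topic_weighted_vector(doc_topics, topic_words, word_vectors, dim):
    """Probability-weighted sum of feature-word vectors over a document's topics.

    doc_topics:   {topic_id: P(topic | document)}, from LDA
    topic_words:  {topic_id: {word: P(word | topic)}}, top feature words per topic
    word_vectors: {word: list of floats of length dim}
    """
    acc = [0.0] * dim
    for topic, p_topic in doc_topics.items():
        for word, p_word in topic_words.get(topic, {}).items():
            vec = word_vectors.get(word)
            if vec is None:  # skip feature words that have no vector
                continue
            weight = p_topic * p_word
            for i in range(dim):
                acc[i] += weight * vec[i]
    return acc


def hybrid_vector(doc_vector, topic_vector, alpha=1.0):
    """Assumed mapping: add the scaled topic vector to the Doc2Vec vector."""
    return [d + alpha * t for d, t in zip(doc_vector, topic_vector)]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Two documents would then be compared by building a hybrid vector for each and passing the pair to `cosine_similarity`. Addition keeps the dimensionality of the Doc2Vec vector unchanged; concatenating the two vectors would be another plausible reading of the abstract.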
