Abstract
This paper outlines the computational principles and methods of the latent Dirichlet allocation (LDA) probabilistic topic model, and the workflow by which the lda package for the open-source R language analyzes a corpus using a fast collapsed Gibbs sampling algorithm. It designs the functional framework of a novelty-retrieval aided analysis system based on the LDA model, and describes the system's functions, implementation approach, and workflow. Finally, using a real novelty-retrieval case, it details how the LDA model mines latent topics from the keywords of related literature, how those topics are compared with the research content of the project under review, and how an objective evaluation of the project is reached. The results show that the topic-model-based system can mine the topics of related literature quickly and effectively, reduce the difficulty novelty-retrieval consultants face in analyzing that literature, and improve the objectivity of project evaluation. The overall aided-analysis effect is good.
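The keyword-based topic mining described in the abstract can be illustrated with a minimal sketch of collapsed Gibbs sampling for LDA. This is not the R lda package's implementation and not the authors' system. It is a plain-Python stand-in written for illustration. The toy keyword lists, the function names `lda_collapsed_gibbs` and `top_words`, and the hyperparameter values (`alpha`, `eta`, the number of topics and iterations) are all assumptions and do not come from the paper.

```python
import random


def lda_collapsed_gibbs(docs, num_topics, alpha=0.1, eta=0.01,
                        iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative sketch).

    docs: list of token lists (here, one keyword list per document).
    Returns (vocab, topic_word_counts, doc_topic_counts).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    topic_word = [[0] * V for _ in range(num_topics)]  # n(k, v)
    topic_total = [0] * num_topics                     # n(k)
    doc_topic = [[0] * num_topics for _ in docs]       # n(d, k)
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
            doc_topic[d][k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k_old = assignments[d][i]
                # Remove the current token from all counts.
                topic_word[k_old][v] -= 1
                topic_total[k_old] -= 1
                doc_topic[d][k_old] -= 1
                # Conditional p(z = k | all other assignments), unnormalized.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + eta)
                    / (topic_total[k] + V * eta)
                    for k in range(num_topics)
                ]
                k_new = rng.choices(range(num_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                assignments[d][i] = k_new
                topic_word[k_new][v] += 1
                topic_total[k_new] += 1
                doc_topic[d][k_new] += 1

    return vocab, topic_word, doc_topic


def top_words(vocab, topic_word, n=3):
    """Return the n highest-count words for each topic."""
    result = []
    for row in topic_word:
        order = sorted(range(len(vocab)), key=lambda v: -row[v])
        result.append([vocab[v] for v in order[:n]])
    return result


# Hypothetical keyword lists standing in for retrieved literature records.
docs = [
    ["topic model", "gibbs sampling", "text mining", "corpus"],
    ["topic model", "text mining", "keyword", "corpus"],
    ["network coding", "multicast", "reliability", "throughput"],
    ["multicast", "network coding", "throughput", "wireless"],
]

vocab, topic_word, doc_topic = lda_collapsed_gibbs(docs, num_topics=2)
for k, words in enumerate(top_words(vocab, topic_word)):
    print(f"topic {k}: {', '.join(words)}")
```

In a workflow like the one the abstract describes, the top words of each topic would then be read alongside the project's own research content to judge overlap. That comparison step is left to the analyst and is not modeled here.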
Source
《现代情报》 (Journal of Modern Information)
CSSCI
2018, No. 2, pp. 111-115 (5 pages)
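Funding
Key Project of Humanities and Social Sciences Research in Anhui Higher Education Institutions, "Research on Intelligent Library Space Services Based on Ubiquitous Learning Needs" (Project No. SK2017A0606)
Key Project of Natural Science Research in Anhui Higher Education Institutions, "Research on Reliable Multicast Technology Using Network Coding Based on Context Correlation" (Project No. KJ2016A609)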
Keywords
topic model (latent Dirichlet allocation)
R language
novelty retrieval
subject evaluation