Abstract
This paper explores the feasibility and advantages of the LDA topic model for the mixed classification and organization of multiple types of documents. Using library holdings of different types (books, journals, and web pages) as experimental material, the authors model the documents with the LDA topic model and with the VSM model, and use the SVM algorithm to perform mixed automatic text classification. The simulation experiments show that the LDA topic model has some advantage over the VSM model, with a maximum gap of 19.9% in mixed automatic classification accuracy; mixed classification works better between books and academic journals, and between web pages and non-academic journals, where classification accuracy can exceed 72%. The experiments indicate that the LDA topic model is feasible and applicable for organizing multiple types of documents in a unified way.
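The contrast between the two document representations named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus, the function names, and the parameters (k, alpha, beta, iters) are all assumptions made for illustration, LDA is fitted here with a simple collapsed Gibbs sampler, and the SVM classification step is left out. The point is only that VSM yields one weight per vocabulary term, while LDA yields one proportion per topic.

```python
import math
import random
from collections import Counter

# Toy corpus (hypothetical; stands in for books, journals, and web pages).
docs = [
    "library catalog book shelf book".split(),
    "journal article citation journal review".split(),
    "web page link browser web".split(),
    "book journal library citation".split(),
]
vocab = sorted({w for d in docs for w in d})
word_id = {w: i for i, w in enumerate(vocab)}


def tfidf_vectors(docs):
    """VSM: one weight per vocabulary term (tf * idf)."""
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))
    vectors = []
    for d in docs:
        tf = Counter(d)
        vectors.append([tf[w] / len(d) * math.log(n / df[w]) for w in vocab])
    return vectors


def lda_doc_topics(docs, k=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """LDA via collapsed Gibbs sampling; returns per-document topic mixtures."""
    rng = random.Random(seed)
    v = len(vocab)
    n_dk = [[0] * k for _ in docs]      # topic counts per document
    n_kw = [[0] * v for _ in range(k)]  # word counts per topic
    n_k = [0] * k                       # total tokens per topic
    z = []                              # topic assignment of every token
    for di, d in enumerate(docs):
        z.append([])
        for w in d:
            t = rng.randrange(k)
            z[di].append(t)
            n_dk[di][t] += 1
            n_kw[t][word_id[w]] += 1
            n_k[t] += 1
    for _ in range(iters):
        for di, d in enumerate(docs):
            for wi, w in enumerate(d):
                t, wid = z[di][wi], word_id[w]
                # Remove the token's current assignment from the counts.
                n_dk[di][t] -= 1
                n_kw[t][wid] -= 1
                n_k[t] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (n_dk[di][j] + alpha) * (n_kw[j][wid] + beta) / (n_k[j] + v * beta)
                    for j in range(k)
                ]
                t = rng.choices(range(k), weights=weights)[0]
                z[di][wi] = t
                n_dk[di][t] += 1
                n_kw[t][wid] += 1
                n_k[t] += 1
    # Smoothed topic proportions for each document.
    return [
        [(n_dk[di][j] + alpha) / (len(d) + k * alpha) for j in range(k)]
        for di, d in enumerate(docs)
    ]


vsm = tfidf_vectors(docs)        # feature length = vocabulary size
lda = lda_doc_topics(docs, k=2)  # feature length = number of topics
print(len(vsm[0]), len(lda[0]))
```

Either feature matrix could then be passed to an SVM classifier. On a real collection the VSM vectors grow with the vocabulary, whereas the LDA vectors stay at the chosen number of topics.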
Source
《图书馆论坛》
CSSCI
PKU Core Journals (北大核心)
2015, No. 1, pp. 74-80 (7 pages)
Library Tribune
Keywords
LDA model
mixed categorization
multiple types of documents
digital library