Abstract
Existing work on automatic classification of cross-type digital resources does not fully exploit the latent semantic associations among the features of different resource types, and it suffers from the curse of dimensionality and feature sparsity. To address these problems, this paper proposes a cross-type digital resource classification method based on topic correlation mining. Digital resources are first modeled semantically with the TG-LDA model. The resulting shared topic space is then extended with semantic concepts drawn from the open knowledge base Wikipedia. Finally, cross-type classification is carried out with several algorithms, including MaxEnt and SVM. Experiments show that the method effectively increases the affinity between different types of digital resources and improves classification performance across types.
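The three stages named in the abstract (topic modeling, concept extension, classification) can be illustrated with a minimal sketch. It is not the authors' implementation: the topic distributions are hand-written stand-ins for TG-LDA output, the `TOPIC_TO_CONCEPTS` mapping is a hypothetical placeholder for the Wikipedia-based extension, and MaxEnt is reduced to its two-class form (logistic regression) trained by plain gradient descent. All function and variable names are assumptions made for illustration.

```python
import math

# Hypothetical mapping from shared topics to knowledge-base concepts,
# standing in for the Wikipedia-based concept extension.
TOPIC_TO_CONCEPTS = {0: [0], 1: [0, 1], 2: [1]}
N_CONCEPTS = 2


def extend_with_concepts(topic_vec, topic_to_concepts, n_concepts):
    """Append concept weights to a topic vector.

    Each concept receives the summed weight of the topics linked to it.
    """
    concept_vec = [0.0] * n_concepts
    for topic_id, weight in enumerate(topic_vec):
        for concept_id in topic_to_concepts.get(topic_id, []):
            concept_vec[concept_id] += weight
    return list(topic_vec) + concept_vec


def score(w, b, x):
    """Linear score of feature vector x under weights w and bias b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_maxent(features, labels, epochs=500, lr=0.5):
    """Two-class MaxEnt (logistic regression) via batch gradient descent."""
    n = len(features[0])
    m = len(features)
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for x, y in zip(features, labels):
            p = 1.0 / (1.0 + math.exp(-score(w, b, x)))
            err = p - y
            for j in range(n):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * g / m for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict(w, b, x):
    """Return class 1 if the score is positive, otherwise class 0."""
    return 1 if score(w, b, x) > 0 else 0


# Toy topic distributions over a shared 3-topic space (assumed, not real data).
# Resources of different types are represented in the same space.
raw_topics = [
    [0.8, 0.1, 0.1],  # e.g. a text document, class 1
    [0.7, 0.2, 0.1],  # e.g. an image annotation, class 1
    [0.1, 0.1, 0.8],  # e.g. a text document, class 0
    [0.1, 0.2, 0.7],  # e.g. a video description, class 0
]
train_y = [1, 1, 0, 0]
train_x = [extend_with_concepts(t, TOPIC_TO_CONCEPTS, N_CONCEPTS) for t in raw_topics]

w, b = train_maxent(train_x, train_y)

new_resource = extend_with_concepts([0.6, 0.3, 0.1], TOPIC_TO_CONCEPTS, N_CONCEPTS)
print(predict(w, b, new_resource))
```

Because every resource is projected into the same low-dimensional topic-plus-concept space before classification, the classifier never sees the sparse, type-specific raw features, which is the intuition behind the dimensionality and sparsity claims in the abstract.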
Source
Information Studies: Theory & Application (《情报理论与实践》)
CSSCI
Peking University Core Journal
2015, No. 11, pp. 108-114 (7 pages)
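Funding
An outcome of the National Natural Science Foundation of China project "Research on Collaboration Models and Recommendation Techniques Based on Social Networks"
Project No.: 71102111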
Keywords
digital resources
topic mining
cross-type classification
classification methods