Fast Latent Semantic Indexing on Large-scale Dataset (Times cited: 10)
Abstract: Latent Semantic Indexing (LSI) has been applied in many areas of modern information retrieval, but the high computational complexity of the matrix Singular Value Decomposition (SVD) hinders its application to large-scale datasets. This paper proposes a fast LSI method for large-scale data. It presents a unified framework for dimension reduction problems, within which LSI, a feature extraction algorithm, can be transformed into a feature selection problem. This strategy significantly simplifies the computation of LSI while preserving its dimension-reduction effect as far as possible, making LSI applicable to large-scale data.
Authors: Wei Wei (卫威), Wang Jianmin (王建民)
Source: Computer Engineering (《计算机工程》), 2009, No. 15, pp. 35-37, 40 (4 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
Keywords: Latent Semantic Indexing (LSI); dimension reduction; feature selection; feature extraction
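The abstract contrasts LSI, a feature-extraction method whose cost is dominated by the SVD, with a feature-selection formulation that avoids that decomposition. The sketch below is only an illustration under stated assumptions, not the authors' algorithm: `lsi_extract` computes classic LSI with a full SVD of an m x n term-document matrix, while `select_terms` is a hypothetical stand-in that simply keeps the k terms with the largest L2 norm. The function names, the norm-based scoring rule, and the toy Poisson-count matrix are all assumptions made for illustration; this record does not state the paper's actual selection criterion.

```python
import numpy as np


def lsi_extract(A: np.ndarray, k: int) -> tuple[np.ndarray, float]:
    """Feature extraction (classic LSI): rank-k truncated SVD of the
    m x n term-document matrix A. Returns the k x n document
    coordinates U_k^T A = Sigma_k V_k^T and the fraction of the squared
    Frobenius norm ("energy") that they retain."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # the costly step
    docs_k = s[:k, None] * Vt[:k, :]
    energy = float(np.sum(s[:k] ** 2) / np.sum(s ** 2))
    return docs_k, energy


def select_terms(A: np.ndarray, k: int) -> tuple[np.ndarray, float]:
    """Feature selection stand-in. ASSUMPTION: the scoring rule (row L2
    norm) is hypothetical and is NOT the paper's criterion. Keeps the k
    original terms (rows) with the largest norm using roughly O(m * n)
    work and no SVD. Returns the kept row indices and their energy."""
    scores = np.linalg.norm(A, axis=1)
    keep = np.sort(np.argsort(-scores, kind="stable")[:k])
    energy = float(np.sum(A[keep] ** 2) / np.sum(A ** 2))
    return keep, energy


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy term-document matrix: 50 terms x 20 documents of word counts.
    A = rng.poisson(0.5, size=(50, 20)).astype(float)
    k = 5
    docs_k, e_svd = lsi_extract(A, k)
    keep, e_sel = select_terms(A, k)
    print("LSI document matrix shape:", docs_k.shape)  # (5, 20)
    print(f"energy kept, SVD: {e_svd:.3f}  term selection: {e_sel:.3f}")
```

Keeping k original terms is itself a rank-k orthogonal projection, so the truncated SVD always retains at least as much energy (Eckart-Young); a feature-selection formulation trades some of that optimality for a much cheaper computation and for dimensions that remain interpretable as original terms.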