Abstract
Latent Semantic Indexing (LSI) has been applied in many areas of modern information retrieval, but the high computational complexity of the Singular Value Decomposition (SVD) prevents its use on large-scale data. This paper proposes a fast LSI method for large-scale data. It presents a unified framework for dimension reduction problems, within which LSI, a feature extraction algorithm, can be recast as a feature selection problem. This technique preserves the dimension reduction quality of LSI as far as possible while simplifying its computation, so that LSI can be applied to large-scale data.
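To make the contrast between feature extraction and feature selection concrete, the following is a minimal sketch in Python with NumPy. The function `lsi_extract` is the standard SVD-based LSI projection, whose cost is the bottleneck the abstract refers to. The function `select_terms` is only a hypothetical stand-in: it ranks terms by squared row norm, which is an assumption made for illustration and not the selection criterion of the paper, which the abstract does not specify. The toy matrix and all names are likewise illustrative.

```python
import numpy as np


def lsi_extract(A, k):
    """Standard LSI (feature extraction): project the term-document
    matrix A (terms x docs) onto its top-k left singular vectors."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k]          # top-k "concept" directions in term space
    return U_k.T @ A        # k x n_docs: each document as k new features


def select_terms(A, m):
    """Hypothetical feature selection (assumed criterion, not the paper's):
    keep the m terms (rows) with the largest squared row norm.
    No SVD is needed, so the cost is linear in the size of A."""
    scores = np.sum(A ** 2, axis=1)
    keep = np.sort(np.argsort(scores)[::-1][:m])
    return keep, A[keep, :]  # m x n_docs: each document as m original terms


# Toy term-document matrix: 5 terms (rows) x 4 documents (columns).
A = np.array([
    [2, 1, 0, 0],
    [1, 3, 0, 0],
    [0, 0, 3, 2],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
], dtype=float)

Z = lsi_extract(A, k=2)            # extracted features (linear combinations of terms)
keep, A_sel = select_terms(A, m=2)  # selected features (a subset of the original terms)
```

Extraction builds new features as linear combinations of all terms and needs the SVD. Selection keeps a subset of the original rows and can be scored in a single pass over the matrix, which is why recasting LSI as a selection problem can lower the cost on large data.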
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
PKU Core Journal (北大核心)
2009, No. 15, pp. 35-37, 40 (4 pages)
Keywords
Latent Semantic Indexing (LSI)
dimension reduction
feature selection
feature extraction