Using SVM-based Local Latent Semantic Indexing (LLSI) for Text Classification
Cited by: 4
Abstract: Latent Semantic Indexing (LSI) applies Singular Value Decomposition (SVD) to the original term-document matrix to obtain its latent semantic structure, which addresses the problems of polysemy and synonymy to some extent. However, existing methods of applying LSI to text classification perform unsatisfactorily because they do not take full account of classification information. To address this problem, an improved Local LSI (LLSI) method is proposed that uses a Support Vector Machine (SVM) to produce the local region. Experimental results suggest that the proposed method is effective.
Authors: Zhang Qiuyu (张秋余), Liu Yang (刘洋)
Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2007, No. 6, pp. 1382-1384 (3 pages)
Funding: Supported by the Gansu Province Key Science and Technology Program (2GS047-A52-002-03)
Keywords: text classification; Latent Semantic Indexing (LSI); Support Vector Machine (SVM); local region
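
The idea summarized in the abstract (run SVD on a local region of documents rather than on the whole collection) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy matrix, the function names `lsi_fit`, `lsi_project`, and `local_region`, the region size `n`, and the rank `k` are all hypothetical, and the `scores` vector is a hand-written stand-in for the decision values that an SVM trained on one category would produce.

```python
import numpy as np

def lsi_fit(term_doc, k):
    """Truncated SVD of a term-by-document matrix; returns (U_k, s_k)."""
    U, s, _ = np.linalg.svd(term_doc, full_matrices=False)
    return U[:, :k], s[:k]

def lsi_project(term_doc, U_k):
    """Map documents (columns) into the k-dimensional latent space (unscaled fold-in)."""
    return U_k.T @ term_doc

def local_region(scores, n):
    """Indices of the n documents with the highest relevance scores."""
    return np.argsort(scores)[::-1][:n]

# Toy term-document matrix: rows are terms, columns are documents (assumed data).
X = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 2.0],
    [0.0, 0.0, 2.0, 1.0],
])

# Hypothetical per-document scores for one category; in an SVM-based setup
# these would be the classifier's decision values, which are not computed here.
scores = np.array([1.3, 0.9, -0.2, -1.1])

region = local_region(scores, n=3)      # documents forming the local region
U_k, s_k = lsi_fit(X[:, region], k=2)   # SVD on the local region only
Z = lsi_project(X, U_k)                 # all documents in the local latent space
```

Computing the decomposition on the selected columns only is what distinguishes the local variant from global LSI, where `lsi_fit` would be applied to all of `X`. How the region is weighted or sized in the paper is not stated in this record, so the hard top-`n` cut here is only one possible choice.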

