Method for Feature Extraction in Chinese Text by Fusing Global and Local Features
Abstract: In text categorization, feature quality directly affects classification accuracy. Focusing on the feature extraction stage, this paper implements an improved feature extraction method based on the Gini index, named Gini, and proposes a feature extraction method for Chinese text that fuses global and local feature extraction. When six feature extraction methods (MI, IG, CE, WET, Gini and χ2) were used in SVM classification experiments, Gini showed strong global feature extraction capability, while χ2 was well suited to local feature extraction. When Gini and χ2 were fused for feature extraction, the combined method showed strong feature extraction capability and clearly outperformed both global-only and local-only extraction.
Author: Wang Rongrong (王荣荣)
Source: Journal of Hebei North University: Natural Science Edition (《河北北方学院学报(自然科学版)》), 2013, No. 3, pp. 35-38 (4 pages)
Keywords: Gini index; feature extraction; text categorization
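
The abstract names the two scoring methods but gives neither their formulas nor the fusion rule, so the following is only a minimal sketch under stated assumptions, not the paper's method. It assumes a commonly cited text variant of the Gini index, Gini(t) = Σ_c P(t|c)² P(c|t)², as the global score and the standard 2×2 contingency-table χ2 statistic as the per-class (local) score, both estimated from document frequencies. The fusion step (taking the union of the top global terms and the top terms per class) and all function names are hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def doc_counts(docs):
    """docs: list of (iterable_of_terms, label) pairs.
    Returns (documents per class, per-class document frequency of each term)."""
    n_class = Counter(label for _, label in docs)
    df = defaultdict(Counter)  # df[term][label] = docs of that class containing term
    for terms, label in docs:
        for t in set(terms):
            df[t][label] += 1
    return n_class, df


def gini_score(term, n_class, df):
    """Global score (assumed form): sum over classes of P(t|c)^2 * P(c|t)^2."""
    total = sum(df[term].values())
    return sum(
        (df[term][c] / n_class[c]) ** 2 * (df[term][c] / total) ** 2
        for c in n_class
    )


def chi2_score(term, cls, n_class, df):
    """Local score: chi-square statistic of term vs. one class (2x2 table)."""
    n = sum(n_class.values())
    a = df[term][cls]                  # in class, contains term
    b = sum(df[term].values()) - a     # other classes, contains term
    c = n_class[cls] - a               # in class, lacks term
    d = n - a - b - c                  # other classes, lacks term
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def fused_features(docs, k_global, k_local):
    """Hypothetical fusion rule: union of the top-k_global Gini terms and
    the top-k_local chi-square terms of every class."""
    n_class, df = doc_counts(docs)
    terms = sorted(df)  # fixed order so ties break alphabetically
    selected = set(
        sorted(terms, key=lambda t: -gini_score(t, n_class, df))[:k_global]
    )
    for cls in n_class:
        ranked = sorted(terms, key=lambda t: -chi2_score(t, cls, n_class, df))
        selected.update(ranked[:k_local])
    return selected
```

Calling `fused_features(docs, k_global, k_local)` on a list of tokenized, labeled documents returns the selected vocabulary, which could then be used to build the feature vectors for an SVM classifier. Note that χ2 as written also scores terms highly when they are strongly negatively associated with a class.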