
A Chinese text classification method based on feature expansion (基于特征项扩展的中文文本分类方法)

Cited by: 1
Abstract: A Chinese text classification method based on feature expansion is proposed. The method first analyzes the feature words of the documents, then uses HowNet to extract the feature sememes that best represent the topic. The feature terms are expanded according to these sememes, and each expanded term is assigned a weight that reflects its descriptive power. Finally, features are extracted from the expanded term set and used for classification. The paper focuses on how to extract the feature sememes and how to set an appropriate weight for each expansion term. Experimental results show that the method increases the number of effective feature terms and improves both classification accuracy and stability.
Source: Applied Science and Technology (《应用科技》), CAS, 2010, No. 3, pp. 1-4, 29 (5 pages)
Funding: National Natural Science Foundation of China (grant 607702053)
Keywords: text classification; feature selection; feature expansion; feature sememe
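
The pipeline outlined in the abstract (pick the sememes that best represent a topic, expand the feature terms through them, and give the expansion terms a reduced weight) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: `TOY_SEMEMES` is a hypothetical hand-written lexicon standing in for HowNet, sememes are ranked by simple frequency, and the fixed expansion weight of 0.5 is a placeholder. The abstract does not give the paper's actual sememe-selection or weighting formulas, so this is not the authors' method.

```python
from collections import Counter

# Hypothetical toy lexicon standing in for HowNet: word -> list of sememes.
TOY_SEMEMES = {
    "football": ["sport", "ball"],
    "basketball": ["sport", "ball"],
    "coach": ["sport", "human"],
    "stock": ["finance", "money"],
    "bank": ["finance", "institution"],
    "loan": ["finance", "money"],
}


def feature_sememes(class_terms, lexicon, top_k=1):
    """Return the top_k sememes carried by the most feature terms of a class.

    Frequency ranking is an assumption; the paper's criterion is not given.
    """
    counts = Counter(s for term in class_terms for s in lexicon.get(term, []))
    return [sememe for sememe, _ in counts.most_common(top_k)]


def expand_features(doc_terms, theme_sememes, lexicon, weight=0.5):
    """Build a weighted term vector for one document.

    Original terms count 1.0 per occurrence. Each lexicon word that shares a
    theme sememe with a document term is added with the reduced `weight`
    (an assumed constant, not the paper's weighting scheme).
    """
    vector = Counter()
    for term in doc_terms:
        vector[term] += 1.0
    theme = set(theme_sememes)
    for term, count in Counter(doc_terms).items():
        shared = set(lexicon.get(term, [])) & theme
        if not shared:
            continue  # term has no topic-bearing sememe, nothing to expand
        for other, sememes in lexicon.items():
            if other != term and shared & set(sememes):
                vector[other] += weight * count
    return dict(vector)


sports_terms = ["football", "basketball", "coach"]
theme = feature_sememes(sports_terms, TOY_SEMEMES, top_k=1)
expanded = expand_features(["football", "match"], theme, TOY_SEMEMES)
print(theme)
print(expanded)
```

In this toy run, a document that mentions only "football" also receives down-weighted entries for "basketball" and "coach", which is the sense in which expansion increases the number of usable feature terms before a classifier is trained on the vectors.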
