Abstract
A Chinese text classification method based on feature expansion is proposed. The method first analyzes the feature words of the documents, then uses HowNet to extract the feature sememes that best represent the topic. The feature terms are expanded according to these sememes, and each expanded term is assigned an appropriate weight to reflect its descriptive power. Finally, features are extracted from the expanded term set and used for classification. The paper focuses on how to extract feature sememes and how to set a suitable weight for the expanded terms. Experiments show that the method increases the number of effective feature terms and improves both classification accuracy and stability.
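The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the sememe-selection rule or the weighting formula, so the sketch assumes that the feature sememes are those shared by the most feature words, and that each expanded term receives a damped share of the mean original weight. `TOY_SEMEMES`, `top_sememes`, `expand_features`, and the `damping` parameter are hypothetical names, and the toy lexicon stands in for HowNet rather than calling it.

```python
from collections import Counter

# Hypothetical toy word-to-sememe lexicon standing in for HowNet (assumption).
TOY_SEMEMES = {
    "football": {"sport", "ball"},
    "basketball": {"sport", "ball"},
    "swimming": {"sport", "water"},
    "coach": {"sport", "human"},
    "stadium": {"sport", "place"},
    "bank": {"finance", "institution"},
}

def top_sememes(feature_words, lexicon, k=1):
    """Return the k sememes shared by the most feature words (assumed rule)."""
    counts = Counter(
        s for w in feature_words for s in sorted(lexicon.get(w, ()))
    )
    return [s for s, _ in counts.most_common(k)]

def expand_features(weights, lexicon, k=1, damping=0.5):
    """Add lexicon words that share a top sememe, at a damped weight.

    The weight rule (damping * mean original weight) is an assumption,
    not the weighting scheme studied in the paper.
    """
    core = set(top_sememes(weights, lexicon, k))
    mean_w = sum(weights.values()) / len(weights)
    expanded = dict(weights)
    for word, sememes in lexicon.items():
        if word not in expanded and sememes & core:
            expanded[word] = damping * mean_w
    return expanded

# Original feature words with illustrative weights (e.g. from feature selection).
features = {"football": 0.75, "basketball": 0.5, "swimming": 0.25}
expanded = expand_features(features, TOY_SEMEMES)
```

In this toy run the shared sememe is `sport`, so `coach` and `stadium` join the feature set at a reduced weight while `bank` is left out. The expanded weights would then feed whatever classifier is in use.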
Source
Applied Science and Technology (《应用科技》)
CAS
2010, No. 3, pp. 1-4, 29 (5 pages in total)
Funding
Supported by the National Natural Science Foundation of China (607702053)
Keywords
text classification
feature selection
feature expansion
feature sememe