Abstract
With the rapid growth of network technology and digital libraries, the number of online documents is increasing quickly, and automatic text classification has become a key technology for processing and organizing large document collections. As a simple, effective, non-parametric classification method, kNN is widely used in text classification. This paper introduces the idea behind the kNN classification algorithm and two different decision rules, and uses an implemented text classification system to compare kNN based on the discrete-valued rule with kNN based on similarity weighting. The experimental results show that similarity-weighted kNN achieves better classification performance than kNN with the discrete-valued rule.
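The two decision rules named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so this assumes the common textbook formulation: under the discrete-valued rule each of the k nearest neighbors casts one vote for its class, and under the similarity-weighted rule each neighbor's vote is scaled by its similarity to the query. Cosine similarity, the sparse dictionary representation, the function names, and the toy data are all assumptions made for illustration; they are not taken from the paper.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(query, training, k, weighted):
    """Classify `query` from its k most similar training documents.

    training: list of (vector, label) pairs (hypothetical format).
    weighted=False: discrete-valued rule, each neighbor adds 1 to its class.
    weighted=True:  similarity-weighted rule, each neighbor adds its similarity.
    """
    scored = [(cosine(query, vec), label) for vec, label in training]
    neighbors = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]

    scores = defaultdict(float)
    for sim, label in neighbors:
        scores[label] += sim if weighted else 1.0
    return max(scores, key=scores.get)


# Toy data (made up): one very close "sports" document and two
# distant "finance" documents.
toy_training = [
    ({"ball": 1.0}, "sports"),
    ({"ball": 1.0, "stock": 3.0}, "finance"),
    ({"ball": 1.0, "bond": 3.0}, "finance"),
]
toy_query = {"ball": 1.0}

discrete_label = knn_classify(toy_query, toy_training, k=3, weighted=False)
weighted_label = knn_classify(toy_query, toy_training, k=3, weighted=True)
```

On this toy data the discrete-valued rule picks "finance" (two votes against one), while the similarity-weighted rule picks "sports", because the single close neighbor outweighs the two distant ones. This only shows how the rules can disagree; it says nothing about the paper's own experimental results.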
Source
Computer and Modernization (《计算机与现代化》)
2008, No. 11, pp. 69-72 (4 pages)
Funding
Supported by the Tangshan Key Laboratory Project (06360301A-6)
Keywords
text categorization
kNN
feature selection