An Improved KNN Text Categorization Algorithm Based on DBSCAN (cited by: 5)
Abstract  When classifying a sample, the K-nearest neighbor (KNN) algorithm must compute the similarity between that sample and every sample in the training set. When the training set is large, this computation is expensive and classification efficiency drops. To address this problem, an improved algorithm based on DBSCAN clustering is proposed. DBSCAN clustering is used to eliminate noisy data from the training samples. In addition, samples in the core sample set are pruned according to a similarity threshold and their density, which reduces the number of training samples whose similarity to the sample being classified must be computed. Experiments show that the algorithm effectively reduces the classification computation while keeping classification ability essentially unchanged.
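The abstract does not give the exact pruning rule, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes cosine similarity between document vectors, identifies noise with the standard DBSCAN definition (a sample that is neither a core sample nor in the neighborhood of one), and prunes core samples greedily: a core sample is dropped when an already-kept core sample of the same class is at least `prune_sim` similar to it. The function names and the parameters `eps_sim`, `min_pts`, and `prune_sim` are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def reduce_training_set(X, y, eps_sim=0.9, min_pts=3, prune_sim=0.98):
    """Return sorted indices of the training samples to keep.

    Assumed procedure (not taken from the paper):
    1. DBSCAN-style noise removal with a similarity threshold eps_sim.
    2. Greedy pruning of near-duplicate core samples of the same class.
    """
    n = len(X)
    # neighbours[i]: indices (including i) at least eps_sim similar to i
    neighbours = [
        [j for j in range(n) if cosine(X[i], X[j]) >= eps_sim]
        for i in range(n)
    ]
    core = {i for i in range(n) if len(neighbours[i]) >= min_pts}
    # A sample is noise if it is not core and no core sample is near it.
    non_noise = [
        i for i in range(n)
        if i in core or any(j in core for j in neighbours[i])
    ]
    kept = []
    # Visit the densest samples first so they act as representatives.
    for i in sorted(non_noise, key=lambda i: -len(neighbours[i])):
        redundant = i in core and any(
            k in core and y[k] == y[i] and cosine(X[i], X[k]) >= prune_sim
            for k in kept
        )
        if not redundant:
            kept.append(i)
    return sorted(kept)


def knn_predict(X, y, query, k=3):
    """Majority vote among the k training samples most similar to query."""
    ranked = sorted(range(len(X)), key=lambda i: cosine(X[i], query),
                    reverse=True)[:k]
    return Counter(y[i] for i in ranked).most_common(1)[0][0]
```

In use, `reduce_training_set` would be run once over the training vectors, and `knn_predict` would then be called with only the kept samples, so each query is compared against fewer vectors. The neighborhood computation here is quadratic in the training set size, which is acceptable as a one-time preprocessing cost in a sketch but would need an index structure for large corpora.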
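Source  Science Technology and Engineering (《科学技术与工程》), PKU Core journal, 2013, No. 1, pp. 219-222 (4 pages)
Funding  Supported by the Key Project of Science and Technology Research of the Ministry of Education (208148) and a Qiongtai Teachers College project (qtkz201006)
Keywords  K-nearest neighbor (KNN); text classification; sample reduction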