Abstract
This paper proposes an algorithm for text categorization that combines KNN and SVM. An SVM can be viewed, approximately, as a 1NN classifier with only one representative point per class. For a sample to be classified, if it lies far from the SVM's optimal separating hyperplane, it is classified by the SVM; if it lies close to the hyperplane, it is classified by KNN, with every support vector taken as a representative point and the decision based on the distances between the sample and each support vector. The algorithm combines the strengths of KNN and SVM in classification: it reduces the number of classification candidates and improves the accuracy of text categorization. Experiments on text categorization confirm its effectiveness.
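The decision rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a two-class linear SVM that has already been trained (the weights `w`, bias `b`, support vectors and their labels are supplied by the caller), Euclidean distance in the input space, and a hypothetical threshold `eps` and neighbour count `k` that the abstract does not specify.

```python
import math
from collections import Counter


def distance_to_hyperplane(x, w, b):
    """Signed geometric distance from x to the hyperplane w . x + b = 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm = math.sqrt(sum(wi * wi for wi in w))
    return score / norm


def hybrid_predict(x, w, b, support_vectors, sv_labels, eps=1.0, k=3):
    """Classify x with the SVM when it is far from the boundary, else with KNN.

    eps and k are illustrative assumptions, not values taken from the paper.
    """
    d = distance_to_hyperplane(x, w, b)
    if abs(d) >= eps:
        # Far from the separating hyperplane: trust the SVM's sign.
        return 1 if d > 0 else -1
    # Close to the hyperplane: every support vector acts as a representative
    # point, and the k nearest ones vote on the label.
    ranked = sorted(
        (math.dist(x, sv), label)
        for sv, label in zip(support_vectors, sv_labels)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data (made up for illustration only).
w, b = [1.0, 0.0], 0.0
svs = [[1.0, 0.0], [-1.0, 0.0], [-1.0, 0.5], [1.0, 2.0]]
labels = [1, -1, -1, 1]

far_label = hybrid_predict([3.0, 0.0], w, b, svs, labels)   # SVM branch
near_label = hybrid_predict([0.1, 0.3], w, b, svs, labels)  # KNN branch
```

In this toy setup the second point lies on the positive side of the hyperplane but within `eps` of it, so the vote among its three nearest support vectors decides the label rather than the SVM's sign. With a kernel SVM, the distances would have to be computed in the kernel-induced feature space instead.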
Source
Information Technology (《信息技术》)
2008, No. 1, pp. 83-84, 88 (3 pages)
Keywords
text categorization
support vector machine
KNN algorithm