Abstract
In many practical applications of the Support Vector Machine (SVM), the training sample set is large and contains outliers mixed into the other class. This leads to slow learning, large memory requirements and degraded generalization, which has become a bottleneck for applying SVM directly. To address these problems, this paper analyzes the structure of the training sample set on the basis of point set theory and proposes a new reduction strategy for large-scale SVM training sets. The strategy uses fuzzy clustering to quickly extract potential support vectors and to remove non-boundary outliers mixed into the other class. It therefore shrinks the training set while effectively avoiding the over-learning caused by outliers, which improves the generalization performance of the SVM and speeds up training without reducing classification accuracy.
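The abstract does not say how the fuzzy clustering output is turned into a reduced training set. The Python sketch below shows one possible reading, assuming a binary problem with labels -1/+1: standard fuzzy C-means (FCM) is run on all samples; a sample is discarded as a non-boundary outlier when it carries the minority label inside an almost pure cluster of the other class; among the remaining samples, those whose fuzzy membership in clusters dominated by the opposite class reaches a threshold are kept as potential support vectors. The functions `fcm` and `reduce_training_set` and the parameters `purity` and `score_min` are hypothetical illustrations, not the authors' algorithm.

```python
import math
import random


def _dist(p, q):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def fcm(points, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Standard fuzzy C-means; returns (centers, u) with u[i][j] = membership of point i in cluster j."""
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])
    centers = []
    p = 2.0 / (m - 1.0)
    for _ in range(max_iter):
        # Center update: v_j = sum_i u_ij^m x_i / sum_i u_ij^m
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            tw = sum(w)
            centers.append([sum(w[i] * points[i][k] for i in range(n)) / tw for k in range(d)])
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))
        change = 0.0
        for i in range(n):
            dist = [_dist(points[i], v) for v in centers]
            if min(dist) < 1e-12:
                # The point coincides with a center: give it crisp membership there.
                nearest = dist.index(min(dist))
                new = [1.0 if j == nearest else 0.0 for j in range(c)]
            else:
                new = [1.0 / sum((dist[j] / dist[k]) ** p for k in range(c)) for j in range(c)]
            change = max(change, max(abs(a - b) for a, b in zip(new, u[i])))
            u[i] = new
        if change < tol:
            break
    return centers, u


def reduce_training_set(points, labels, c=2, purity=0.9, score_min=0.2, m=2.0, seed=0):
    """Hypothetical FCM-based reduction for labels in {-1, +1}; returns indices of kept samples."""
    n = len(points)
    _, u = fcm(points, c, m=m, seed=seed)
    hard = [max(range(c), key=lambda j: u[i][j]) for i in range(n)]
    # Majority label and its share in every hard cluster; empty clusters get label 0.
    majority, share = [], []
    for j in range(c):
        members = [labels[i] for i in range(n) if hard[i] == j]
        pos = sum(1 for y in members if y > 0)
        neg = len(members) - pos
        majority.append(0 if not members else (1 if pos >= neg else -1))
        share.append(max(pos, neg) / len(members) if members else 0.0)
    keep = []
    for i in range(n):
        j = hard[i]
        # Minority label inside an almost pure cluster: treat as a non-boundary outlier.
        if share[j] >= purity and labels[i] != majority[j]:
            continue
        # Total fuzzy membership in clusters dominated by the opposite class.
        score = sum(u[i][k] for k in range(c) if majority[k] == -labels[i])
        if score >= score_min:
            keep.append(i)
    return keep


# Two touching classes on a grid plus one +1 sample placed inside the -1 region.
neg = [(x, y) for x in range(5) for y in range(3)]
pos = [(x, y) for x in range(5, 10) for y in range(3)]
points = neg + pos + [(1.5, 1.5)]
labels = [-1] * len(neg) + [1] * len(pos) + [1]
kept = reduce_training_set(points, labels)
print([points[i] for i in kept])
```

On this toy grid the sketch should keep only the two columns x = 4 and x = 5 that face the other class and drop the +1 sample at (1.5, 1.5), so an SVM would be trained on 6 of the 31 samples. The membership-based score is used instead of plain distance because FCM memberships already encode how ambiguous a sample's position is between clusters of different classes.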
Source
Journal of Computer Applications (《计算机应用》)
Indexed in: CSCD, Peking University Core Journals
2009, No. 10, pp. 2736-2740 (5 pages)
Funding
Supported by the Natural Science Foundation of Tianjin (07JCZDJC10800)
Keywords
Support Vector Machine (SVM)
point set
Fuzzy C-Means (FCM)
potential support vector
outlier