Abstract
SMOTE is a classic oversampling algorithm for handling imbalanced data, and this paper proposes an improvement to it. First, the k-means algorithm is used to cluster the original data, and a class discriminant function is used to filter the clustered samples and select the "safe samples". Then, a new oversampling rate is used to linearly interpolate the "safe samples", with the LMKNN method applied during interpolation. The proposed algorithm, SMOTE, and KNSMOTE are each applied to real datasets, the results are classified with an SVM classifier, and their performance is compared. The results show that on imbalanced datasets such as Abalone and Ecoli, the proposed algorithm achieves the best classification performance, which verifies its effectiveness.
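The linear interpolation step that SMOTE-style methods share can be sketched as follows. This is a minimal illustration of plain SMOTE interpolation only, not the paper's improved algorithm: the abstract does not specify the class discriminant function, the new oversampling rate, or how LMKNN is applied, so none of those are reproduced here. The function name `smote_interpolate` and its parameters `k`, `n_new`, and `seed` are hypothetical.

```python
import random


def smote_interpolate(minority, k=3, n_new=5, seed=0):
    """Create n_new synthetic points by SMOTE-style linear interpolation.

    minority: list of equal-length tuples of floats (minority-class samples).
    Illustrative sketch only; not the method described in the abstract.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [p for p in minority if p is not x]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        # Pick one of the k nearest neighbours at random.
        neighbour = rng.choice(others[:k])
        # New point lies on the segment from x toward that neighbour.
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, neighbour)))
    return synthetic


if __name__ == "__main__":
    samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    for point in smote_interpolate(samples, k=2, n_new=3):
        print(point)
```

Each synthetic point lies on the line segment between a minority sample and one of its nearest minority neighbours, which is why restricting the inputs to "safe samples" matters: the inputs determine where new points can land.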
Authors
马宝霖
胡茜
MA Baolin; HU Qian (School of Mathematics & Statistics, Changchun University of Technology, Changchun 130012, China)
Source
《长春工业大学学报》
CAS
2024, Issue 3, pp. 259-264 (6 pages)
Journal of Changchun University of Technology
Funding
Jilin Province Major Science and Technology Special Project (20210301038GX).