Support vector machine imbalanced data classification based on weighted clustering centroid (基于加权聚类质心的SVM不平衡分类方法)

Cited by: 4

Abstract: Classification of imbalanced data is an active research topic in machine learning. Traditional classification algorithms assume that classes are evenly distributed or that all misclassification costs are equal, so they rarely give satisfactory results on imbalanced data. This paper proposes a support vector machine (SVM) classification method based on weighted clustering centroids. First, unsupervised clustering is applied to the positive and the negative samples separately, and the centroid of each cluster is extracted as the most compact representative of the samples in that cluster. Next, the centroids, equal in number for the two classes, form a new balanced training set. To limit the information lost during clustering, each centroid is paired with a weight factor proportional to the number of samples in its cluster. Finally, all centroids and weight factors take part in training the improved SVM model. Experimental results show that the proposed method yields more representative training samples and better classification performance than other sampling techniques for imbalanced data.
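The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the clustering algorithm, the exact weight formula, or the form of the "improved SVM". The sketch assumes plain k-means, a weight equal to cluster size divided by class size, and a linear SVM whose hinge loss is scaled per sample and minimised by subgradient descent. All function names, parameters (`k`, `C`, `lr`, `epochs`), and the synthetic data are hypothetical.

```python
import random

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means. Returns (centroids, cluster sizes)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            groups[j].append(p)
        for j, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids, [len(g) for g in groups]

def weighted_centroids(points, k):
    """Centroids plus weights; weight = cluster size / class size (assumed)."""
    centroids, sizes = kmeans(points, k)
    return centroids, [s / len(points) for s in sizes]

def train_weighted_svm(X, y, s, C=10.0, lr=0.005, epochs=3000):
    """Minimise 0.5*||w||^2 + C * sum_i s_i * hinge_i by subgradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = list(w), 0.0  # gradient of the regulariser
        for xi, yi, si in zip(X, y, s):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # only margin violators contribute
                grad_w = [g - C * si * yi * xj for g, xj in zip(grad_w, xi)]
                grad_b -= C * si * yi
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Synthetic imbalanced data: 200 majority (negative) vs 20 minority (positive).
rng = random.Random(0)
negatives = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
positives = [[rng.gauss(4, 0.5), rng.gauss(4, 0.5)] for _ in range(20)]

k = 3  # same number of centroids per class, so the training set is balanced
neg_c, neg_w = weighted_centroids(negatives, k)
pos_c, pos_w = weighted_centroids(positives, k)

X_train = neg_c + pos_c
y_train = [-1] * k + [1] * k
s_train = neg_w + pos_w

w, b = train_weighted_svm(X_train, y_train, s_train)
print(predict(w, b, [4.0, 4.0]), predict(w, b, [0.0, 0.0]))
```

Normalising the weights within each class keeps the total weight of the two classes equal, which is one plausible reading of a weight "proportional to the number of samples"; using raw cluster sizes instead would let the majority class dominate the loss again.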

Authors: HU Xiaosheng (胡小生), ZHONG Yong (钟勇)

Affiliation: School of Electronic and Information Engineering, Foshan University (佛山科学技术学院电子与信息工程学院)

Source: CAAI Transactions on Intelligent Systems (《智能系统学报》), indexed in CSCD and the Peking University Core Journals list, 2013, No. 3, pp. 261-265 (5 pages)

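Funding: Foshan Science and Technology Development Special Fund Project (2011AA100061); Foshan Industry-University-Research Special Fund Project (2012HC100272); Foshan Education Bureau Research Project on Intelligent Evaluation Index Systems (DX20120220)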

Keywords: machine learning; imbalanced data classification; clustering centroid; support vector machine

CLC number: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]
