Abstract
Classifying class-imbalanced data is an active research topic in data mining and machine learning. Cost-sensitive learning is commonly used for this problem, but in practice its use is often limited because the misclassification costs of the samples are unknown. To address this, the paper uses a swarm intelligence algorithm to optimize the misclassification costs. The fruit fly optimization algorithm (FOA) is a global-optimization swarm intelligence algorithm with a simple principle, few tunable parameters, and fast convergence. The study first proposes an FOA variant that dynamically adjusts the search step size. It then uses the global and local search capability of this variant to optimize the misclassification costs of samples in class-imbalanced data. Finally, the strategy of learning misclassification costs with the improved FOA is applied to classifying the breast tissue dataset. Experimental results show that the algorithm classifies class-imbalanced data well and effectively identifies both positive and negative samples, addressing the limitation that cost-sensitive imbalanced classification faces when prior information on misclassification costs cannot be obtained directly.
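The abstract does not give the update rule of the improved FOA, so the following is only a minimal sketch of the general idea: a swarm samples candidates around the current best position, and the search step shrinks over the iterations so that early iterations explore and later ones refine. The linear decay schedule, the simplified position update, the function name `foa_minimize`, its parameters, and the toy objective are all assumptions made for illustration, not the authors' method. In the paper's setting the objective would instead score a cost-sensitive classifier trained with the candidate misclassification costs.

```python
import numpy as np


def foa_minimize(objective, dim, bounds, pop_size=20, n_iter=100,
                 step_max=1.0, step_min=0.01, seed=0):
    """Simplified FOA-style search with a shrinking step (illustrative only)."""
    rng = np.random.default_rng(seed)
    low, high = bounds
    # Start the swarm at a random location inside the bounds.
    best_pos = rng.uniform(low, high, size=dim)
    best_val = objective(best_pos)
    for t in range(n_iter):
        # Assumed schedule: the step decays linearly from step_max to step_min.
        step = step_max - (step_max - step_min) * t / max(n_iter - 1, 1)
        # Each fly takes a random move of at most `step` per coordinate
        # around the best position found so far.
        offsets = rng.uniform(-1.0, 1.0, size=(pop_size, dim))
        candidates = np.clip(best_pos + step * offsets, low, high)
        values = np.array([objective(c) for c in candidates])
        i = int(np.argmin(values))
        # The swarm relocates only when a fly finds a better position.
        if values[i] < best_val:
            best_pos, best_val = candidates[i], values[i]
    return best_pos, best_val


def toy_cost_objective(costs):
    # Hypothetical stand-in for a classifier's validation error as a
    # function of the two misclassification costs; its minimum is at (2, 5).
    return float(np.sum((costs - np.array([2.0, 5.0])) ** 2))


best_costs, best_score = foa_minimize(toy_cost_objective, dim=2,
                                      bounds=(0.0, 10.0))
print(best_costs, best_score)
```

Keeping the best position unless a fly improves on it means the reported score never gets worse from one iteration to the next, which makes the effect of the shrinking step easy to observe on a toy problem.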
Authors
李锦珑
包理群
周彬
LI Jin-long; BAO Li-qun; ZHOU Bin (College of Electronic Information Engineering, Lanzhou Institute of Technology, Lanzhou 730050, Gansu, China)
Source
《西北师范大学学报(自然科学版)》
CAS
Peking University Core Journal
2021, No. 3, pp. 57-61 (5 pages)
Journal of Northwest Normal University (Natural Science)
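Funding
Gansu Province Science and Technology Program (20JR10RA279)
Natural Science Foundation of Gansu Province (20JR5RA378)
Gansu Provincial International Science and Technology Cooperation Base for Resource and Environment Informatization (2018-3-16)
Natural Science Foundation of Gansu Province (20JR5RA377)
Innovation Fund for Higher Education Institutions of Gansu Province (2020B-239)
University Innovation Capability Improvement Project (2020B-240)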
Keywords
imbalanced data
classification
fruit fly optimization algorithm
cost-sensitive
support vector machine