Abstract
Traditional classification methods achieve low accuracy on the minority class of imbalanced data sets, even though the minority class is often the one that matters most for the result. To address this, an improved tree-augmented naive Bayes classifier (TANC) algorithm adapted to imbalanced data sets is proposed. The algorithm first uses the Relief algorithm to assign weights to the minority class in the sample. It then fills in missing data by training on the data set and discretizes continuous data by dividing attributes into a finite number of intervals. The modified training set is used to train TANC, and the TANC algorithm finally classifies the data set. Experimental results on UCI standard data sets show that the overall performance of the proposed algorithm is better than that of the TANC algorithm.
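The preprocessing steps named in the abstract (Relief weighting, filling in missing values, discretizing continuous attributes) can be illustrated with a minimal sketch. The abstract does not give the exact rules, so everything below is an assumption for illustration and not the paper's implementation: column-mean imputation, equal-width intervals, and the basic two-class Relief attribute-weight update. The function names `impute_mean`, `discretize_equal_width`, and `relief_weights` are hypothetical, and the TANC training step itself is not shown.

```python
import numpy as np

def impute_mean(X):
    """Fill NaN entries with the column mean (assumed rule, not from the paper)."""
    X = np.array(X, dtype=float)
    col_mean = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_mean[cols]
    return X

def discretize_equal_width(X, n_bins=3):
    """Map each continuous column to interval indices 0..n_bins-1
    using equal-width intervals (assumed rule, not from the paper)."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    # Constant columns get width 1.0 so that they all fall into bin 0.
    width = np.where(hi > lo, (hi - lo) / n_bins, 1.0)
    bins = np.floor((X - lo) / width).astype(int)
    return np.clip(bins, 0, n_bins - 1)

def relief_weights(X, y):
    """Basic two-class Relief attribute weights.

    For every sample, find the nearest sample of the same class (hit) and
    the nearest sample of the other class (miss). An attribute gains weight
    when it differs on the miss and loses weight when it differs on the hit.
    Assumes complete data and at least two samples per class.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xn = (X - X.min(axis=0)) / span          # scale attributes to [0, 1]
    n, d = Xn.shape
    w = np.zeros(d)
    for i in range(n):
        dist = np.abs(Xn - Xn[i]).sum(axis=1)  # Manhattan distance
        dist[i] = np.inf                        # exclude the sample itself
        same = (y == y[i])
        hit = np.argmin(np.where(same, dist, np.inf))
        miss = np.argmin(np.where(~same, dist, np.inf))
        w += (np.abs(Xn[i] - Xn[miss]) - np.abs(Xn[i] - Xn[hit])) / n
    return w
```

In this sketch, imputation has to run before `relief_weights`, because the distance computation assumes complete data. How the resulting weights are applied to the minority class before TANC is trained is specific to the paper and is not reproduced here.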
Source
《兰州理工大学学报》
CAS
PKU Core Journal (北大核心)
2014, Issue 5, pp. 86-89 (4 pages)
Journal of Lanzhou University of Technology
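Funding
National Natural Science Foundation of China (61263003)
Natural Science Foundation of Gansu Province (1112RJZA028)
Basic Scientific Research Project for Universities of Gansu Province (1203ZTC061)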