An improved TANC algorithm adapted to unbalanced data sets

Cited by: 1
Abstract: Traditional classification methods achieve low accuracy on the minority class of an unbalanced data set, even though the minority class often matters most to the result. This paper therefore proposes an improved tree-augmented naive Bayes classifier (TANC) algorithm adapted to unbalanced data sets. The algorithm first uses the Relief algorithm to assign weights to the minority class in the sample. It then fills in missing values using the training data and discretizes continuous attributes by dividing each one into a finite number of intervals. The modified training set is used to train TANC, and TANC then classifies the data set. Experiments on UCI standard data sets show that the overall performance of the proposed algorithm is better than that of the TANC algorithm.
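The preprocessing steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how missing values are filled, how the intervals are chosen, or how Relief weights are applied to the minority class. The sketch assumes column-mean imputation, equal-width binning, and the classic two-class Relief feature-weight update, visiting every instance once instead of sampling at random. The function names `fill_missing`, `discretize_equal_width`, and `relief_weights` are hypothetical, and the TANC training step itself is not shown.

```python
def fill_missing(rows):
    """Replace None entries with the mean of the observed values in that column.

    Assumption: mean imputation; every column has at least one observed value.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def discretize_equal_width(rows, n_bins=3):
    """Map each continuous column to integer bin indices 0..n_bins-1.

    Assumption: equal-width intervals between the column minimum and maximum.
    """
    out = [list(r) for r in rows]
    for j in range(len(rows[0])):
        col = [r[j] for r in rows]
        lo, hi = min(col), max(col)
        width = (hi - lo) / n_bins or 1.0  # constant column: avoid dividing by zero
        for i, v in enumerate(col):
            out[i][j] = min(int((v - lo) / width), n_bins - 1)
    return out


def relief_weights(rows, labels):
    """Two-class Relief feature weights, one pass over all instances.

    A feature gains weight when it differs on the nearest instance of the
    other class (nearest miss) and loses weight when it differs on the
    nearest instance of the same class (nearest hit).
    """
    n, d = len(rows), len(rows[0])
    spans = []
    for j in range(d):
        col = [r[j] for r in rows]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(a, b, j):
        # Per-feature difference, normalized to [0, 1] by the column range.
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(d))

    weights = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and labels[k] == labels[i]]
        misses = [k for k in range(n) if labels[k] != labels[i]]
        if not hits or not misses:
            continue  # no neighbor of one kind, so no update for this instance
        hit = min(hits, key=lambda k: dist(rows[i], rows[k]))
        miss = min(misses, key=lambda k: dist(rows[i], rows[k]))
        for j in range(d):
            weights[j] += (diff(rows[i], rows[miss], j) - diff(rows[i], rows[hit], j)) / n
    return weights
```

In this sketch, a feature that separates the two classes receives a positive weight and a feature unrelated to the class receives a weight near zero or below. How such weights would feed into TANC training is not specified in the abstract.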
Source: Journal of Lanzhou University of Technology (CAS; Peking University Core Journal), 2014, No. 5, pp. 86-89 (4 pages)
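Funding: National Natural Science Foundation of China (61263003); Natural Science Foundation of Gansu Province (1112RJZA028); Basic Scientific Research Project of Gansu Provincial Universities (1203ZTC061)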
Keywords: machine learning; unbalanced data sets; TANC algorithm; Relief algorithm
