Quantitative Evaluation of Classification Algorithms Used in Data Mining

Cited by: 8
Abstract Using cross-validation, this paper experimentally compares three mainstream data mining classification algorithms: C4.5, Bayesian belief networks, and sequential minimal optimization (SMO). With identical training and test samples, it reports for each algorithm the time needed to build a model, the classification accuracy, the coverage rate, and the margin curve. It also analyzes how the number of training samples affects each algorithm, giving users a theoretical and experimental basis for choosing a classification algorithm under different sample quality. The full paper explains in detail how the experimental results were obtained and analyzed; its first two sections are as follows. Section 1, on the classification algorithms, uses cross-validation to compare the advantages and disadvantages of the three algorithms. Section 2, the experimental analysis, evaluates the results against five criteria (accuracy, speed, robustness, coverage rate, and comprehensibility) and gives the time each algorithm needs to build its model with the same training set and test set. The classification accuracy and margin curves are shown in Figs. 1 through 3. The evaluation results, given in Tables 1 through 3, indicate preliminarily that the Bayesian network algorithm should be selected, because its computation speed, accuracy, and robustness all meet the requirements.
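Authors Zhang Yuan (张原), Gao Xiangyang (高向阳)
Source Journal of Northwestern Polytechnical University (《西北工业大学学报》), indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2008, No. 6, pp. 718-722 (5 pages)
Keywords data mining, classification algorithm, training samples, margin curve, Bayesian networks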
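
The evaluation procedure described in the abstract (cross-validation on shared data, recording model-building time and classification accuracy per algorithm) can be sketched roughly as follows. This is a minimal illustration, not the authors' experimental setup: the two classifiers are deliberately simple stand-ins (a majority-class baseline and a 1-nearest-neighbor rule) rather than C4.5, a Bayesian network, or SMO, the toy dataset is invented, and all names (`k_fold_indices`, `cross_validate`, and so on) are hypothetical.

```python
import time
from collections import Counter


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


class MajorityClassifier:
    """Stand-in model: always predicts the most common training label."""

    def fit(self, X, y):
        self.label = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label for _ in X]


class NearestNeighborClassifier:
    """Stand-in model: 1-nearest-neighbor with squared Euclidean distance."""

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        preds = []
        for q in X:
            best = min(
                range(len(self.X)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(self.X[i], q)),
            )
            preds.append(self.y[best])
        return preds


def cross_validate(make_model, X, y, k=5):
    """Return (accuracy over all held-out samples, total fit time in seconds)."""
    correct, fit_time = 0, 0.0
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = make_model()
        t0 = time.perf_counter()  # time only the model-building step
        model.fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        fit_time += time.perf_counter() - t0
        preds = model.predict([X[i] for i in test_idx])
        correct += sum(p == y[i] for p, i in zip(preds, test_idx))
    return correct / len(X), fit_time


# Invented toy data: two well-separated clusters with alternating labels.
labels = [i % 2 for i in range(20)]
points = [(l * 10 + (i % 5) * 0.1, l * 10 + (i % 3) * 0.1)
          for i, l in enumerate(labels)]

# Every model sees exactly the same folds, as the abstract requires.
results = {
    name: cross_validate(factory, points, labels, k=5)
    for name, factory in [
        ("majority", MajorityClassifier),
        ("1-nn", NearestNeighborClassifier),
    ]
}
for name, (accuracy, seconds) in results.items():
    print(f"{name}: accuracy={accuracy:.2f}, fit_time={seconds:.6f}s")
```

Because every model is scored on the same folds, differences in accuracy and build time reflect the algorithm rather than the data split. Coverage rate and margin curves, which the paper also reports, are not covered by this sketch.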