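Abstract
Using cross-validation, we experimentally analyzed three mainstream data-mining classification algorithms: C4.5, the Bayesian belief network, and sequential minimal optimization (SMO). For identical training and test samples, we obtained each algorithm's model-building time, classification accuracy, coverage, and margin curve. We also analyzed how the number of training samples affects the three algorithms differently, providing a theoretical and experimental basis for choosing a classification algorithm under different sample quality.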
Aim. Choosing the most suitable of three typical classification algorithms, namely C4.5, Bayesian network, and sequential minimal optimization (SMO), is an important practical question. We present experimental results intended to help with that choice. The full paper explains in some detail how the results were obtained and analyzed; this abstract only adds remarks on its first two sections. Section 1, on the classification algorithms, uses the cross-validation method to compare the advantages and disadvantages of the three algorithms. Section 2, on experimental analysis, evaluates the results against five criteria (accuracy, speed, robustness, cover rate, and comprehensibility) and reports the time each algorithm needs to build its model on the same training and testing sets. It also reports their classification accuracy and margin curves, shown in Figs. 1 through 3. The evaluation results, given in Tables 1 through 3, indicate preliminarily that the Bayesian network algorithm should be selected, because its calculation speed, accuracy, and robustness all meet the requirements.
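The cross-validation comparison described above can be illustrated with a minimal sketch. It is not the authors' experimental setup: the two classifiers (`MajorityClass`, `NearestCentroid`), the synthetic data from `make_toy_data`, and all function names are hypothetical stand-ins chosen so the example runs with the Python standard library alone. Only the procedure is taken from the abstract: the same folds are used for every algorithm, and model-building time and classification accuracy are recorded for each.

```python
import random
import time


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(make_model, X, y, k=10, seed=0):
    """Return (pooled accuracy, total model-building time in seconds)."""
    folds = k_fold_indices(len(X), k, seed)
    correct, train_time = 0, 0.0
    for i, test_idx in enumerate(folds):
        # Train on every fold except fold i, then test on fold i.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = make_model()
        start = time.perf_counter()
        model.fit([X[j] for j in train_idx], [y[j] for j in train_idx])
        train_time += time.perf_counter() - start
        correct += sum(model.predict(X[j]) == y[j] for j in test_idx)
    # Each sample is tested exactly once, so this is accuracy over all samples.
    return correct / len(X), train_time


class MajorityClass:
    """Hypothetical baseline: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label = max(set(y), key=y.count)

    def predict(self, x):
        return self.label


class NearestCentroid:
    """Hypothetical stand-in classifier: predicts the class with the closest mean."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(self, x):
        def dist2(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda lab: dist2(self.centroids[lab]))


def make_toy_data(n_per_class=50, seed=1):
    """Synthetic two-class data (an assumption, not the paper's data set)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in ((0, 0.0), (1, 4.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    for name, factory in (("majority", MajorityClass), ("centroid", NearestCentroid)):
        acc, secs = cross_validate(factory, X, y, k=10)
        print(f"{name}: accuracy={acc:.3f}, build_time={secs:.4f}s")
```

Fixing `seed` keeps the folds identical across algorithms, which matches the abstract's condition of the same training and testing samples. Substituting real implementations of C4.5, a Bayesian network, and SMO would only require objects exposing the same `fit` and `predict` methods.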
Source
《西北工业大学学报》
EI
CAS
CSCD
Peking University Core Journals (北大核心)
2008, No. 6, pp. 718-722 (5 pages)
Journal of Northwestern Polytechnical University