Abstract
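Chemical data mining can extract the knowledge hidden in massive amounts of data, and the decision tree is an important mining tool. Because decision trees are limited in handling continuous data, this study first preprocesses the data. Continuous attributes are discretized, and redundant attributes are removed by feature selection. The decision tree is then built on the preprocessed data. This approach keeps the decision tree model from becoming "over-detailed" (over-fitted) and gives it good predictive performance. The method was applied to two chemical sample classification cases with good results. Compared with Bayesian analysis and a decision tree used alone, its prediction accuracy is significantly higher. Its rules are also expressed in an intuitive, explicit form that is easy to understand and analyze, which makes it suitable for mining knowledge patterns in chemical classification.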
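The preprocessing described above (discretize continuous attributes, then drop uninformative or redundant ones) can be illustrated with a minimal sketch. The abstract does not name the discretization or feature-selection methods. This example therefore assumes equal-width binning and an information-gain threshold. The function names `equal_width_bins`, `info_gain` and `preprocess`, and the parameters `k=3` and `min_gain=0.1`, are hypothetical choices made for illustration. They are not the authors' method.

```python
import math
from collections import Counter


def equal_width_bins(values, k):
    """Map each continuous value to one of k equal-width intervals (0..k-1)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0 for _ in values]  # constant attribute: a single interval
    # min(..., k - 1) keeps the maximum value inside the last interval
    return [min(int((v - lo) / width), k - 1) for v in values]


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def info_gain(column, labels):
    """Reduction in class entropy after splitting on one discrete attribute."""
    n = len(labels)
    remainder = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def preprocess(rows, labels, k=3, min_gain=0.1):
    """Discretize every attribute (hypothetical equal-width scheme), then keep
    only attributes whose information gain is at least min_gain (assumed
    selection criterion). Returns (kept attribute indices, discretized rows)."""
    columns = list(zip(*rows))
    discrete = [equal_width_bins(list(col), k) for col in columns]
    kept = [j for j, col in enumerate(discrete) if info_gain(col, labels) >= min_gain]
    return kept, [[discrete[j][i] for j in kept] for i in range(len(rows))]
```

For example, `preprocess([[1.0, 5.0], [1.2, 9.0], [3.0, 5.5], [3.1, 8.8]], ["A", "A", "B", "B"])` keeps only attribute 0 and returns `([0], [[0], [0], [2], [2]])`. Attribute 1 is dropped because its bins split both classes evenly, so its information gain is zero.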
Chemical data mining can discover valuable knowledge in large amounts of data, and the decision tree is an important data mining tool. Because decision trees are limited in handling continuous data, a pretreatment step was applied first. Discretization converted the continuous attributes into discrete ones, and feature selection removed redundant attributes. A decision tree classifier built on the pretreated data not only avoids over-fitting but also predicts well. The method was applied to two chemical classification instances, glass and wine, with good results. The prediction correct rates were 94.7% and 96.67%, and the self-check correct rates were 95.5% and 96.88%, respectively. Compared with Bayes discriminant analysis and the traditional decision tree algorithm, the model's correct prediction rate is greatly improved. The classification rules it produces are also explicit and easy to understand. These merits show that the decision tree is a good tool for mining chemical pattern classification rules.
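The abstract does not say which decision-tree algorithm was used. The sketch below substitutes a plain ID3 learner that splits on information gain over already-discretized attributes. It also computes a correct rate (the fraction of samples classified correctly). The example assumes that the "self-check correct rate" is this rate on the training samples and that the "prediction correct rate" is the same rate on held-out samples. The names `build_tree`, `predict` and `correct_rate`, and the majority-class fallback for unseen attribute values, are hypothetical choices.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def build_tree(rows, labels, attrs):
    """ID3 on discrete attributes: split on the attribute with the highest
    information gain. A node is (attr, branches, majority_label); a leaf is a label."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority  # leaf: pure node, or no attributes left to split on

    def gain(a):
        n = len(labels)
        rem = 0.0
        for v in set(r[a] for r in rows):
            sub = [y for r, y in zip(rows, labels) if r[a] == v]
            rem += len(sub) / n * entropy(sub)
        return entropy(labels) - rem

    best = max(attrs, key=gain)
    branches = {}
    for v in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == v]
        branches[v] = build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                 [a for a in attrs if a != best])
    return (best, branches, majority)


def predict(tree, row):
    """Follow branches until a leaf label is reached."""
    while isinstance(tree, tuple):
        attr, branches, majority = tree
        if row[attr] not in branches:
            return majority  # unseen value: fall back to this node's majority class
        tree = branches[row[attr]]
    return tree


def correct_rate(tree, rows, labels):
    """Fraction of samples whose predicted class matches the true class."""
    hits = sum(predict(tree, r) == y for r, y in zip(rows, labels))
    return hits / len(labels)
```

Calling `correct_rate` on the training rows corresponds to the assumed self-check, and calling it on held-out rows corresponds to the assumed prediction test. Each path from the root to a leaf reads directly as an IF-THEN classification rule. This is the kind of explicit, easy-to-read output the abstract credits to the decision tree.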
Source
《分析化学》
SCIE
EI
CAS
CSCD
Peking University Core Journal (北大核心)
2005, No. 8, pp. 1091-1094 (4 pages)
Chinese Journal of Analytical Chemistry
Funding
National Natural Science Foundation of China (No. 20276063)
Key Science and Technology Project of Zhejiang Province (No. 2004C21054)
Keywords
Preprocessing
Decision tree
Chemical data mining
Discretization
Feature selection
Chemical pattern classification
Data mining, decision tree, discretization, feature selection, chemical pattern classification