Abstract
Data mining is the process of discovering previously unknown, useful information, patterns, and trends in large volumes of data, and classification is one of its principal methods. This paper argues that the essence of classification is determining how much each attribute contributes to the class decision. Following a divide-and-conquer approach, it proposes a novel classification method that combines a genetic algorithm with conditional probability (GACP). The method works in two steps: first, conditional probabilities are used to calculate the contribution of each individual attribute to the class; second, a genetic algorithm is used to calculate coefficients that reflect the importance of each attribute to the classification. The method is validated on standard UCI data sets and compared with other classification methods under the same conditions. The experimental results show that the proposed method is simple and effective.
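The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the scoring rule, the fitness function, or the genetic operators, so everything below is an assumption. In particular, the sketch assumes that a class is scored by a weighted sum of per-attribute conditional probabilities P(class | attribute value), that fitness is training accuracy, and that the genetic algorithm uses truncation selection, uniform crossover, and Gaussian mutation. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter, defaultdict


def fit_conditional_probs(rows, labels):
    """Step 1 (assumed form): estimate P(class | attribute j = value) from counts."""
    n_attrs = len(rows[0])
    counts = [defaultdict(Counter) for _ in range(n_attrs)]
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            counts[j][value][label] += 1
    probs = []
    for j in range(n_attrs):
        table = {}
        for value, class_counts in counts[j].items():
            total = sum(class_counts.values())
            table[value] = {c: k / total for c, k in class_counts.items()}
        probs.append(table)
    return probs


def predict(row, probs, weights, classes):
    """Pick the class with the largest weighted sum of per-attribute probabilities."""
    def score(c):
        return sum(w * probs[j].get(v, {}).get(c, 0.0)
                   for j, (v, w) in enumerate(zip(row, weights)))
    return max(classes, key=score)


def accuracy(rows, labels, probs, weights, classes):
    hits = sum(predict(r, probs, weights, classes) == y
               for r, y in zip(rows, labels))
    return hits / len(rows)


def evolve_weights(rows, labels, probs, classes,
                   pop_size=20, generations=30, seed=0):
    """Step 2 (assumed form): a simple GA that searches for attribute weights."""
    rng = random.Random(seed)
    n = len(rows[0])

    def fitness(w):
        return accuracy(rows, labels, probs, w, classes)

    population = [[rng.random() for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection, keeps the best
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            k = rng.randrange(n)  # mutate one weight, clamped to [0, 1]
            child[k] = min(1.0, max(0.0, child[k] + rng.gauss(0, 0.2)))
            children.append(child)
        population = parents + children
    return max(population, key=fitness)


# Toy data (hypothetical): attribute 0 determines the class, attribute 1 is noise.
rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y"), ("a", "x"), ("b", "y")]
labels = ["pos", "pos", "neg", "neg", "pos", "neg"]
classes = sorted(set(labels))

probs = fit_conditional_probs(rows, labels)
best = evolve_weights(rows, labels, probs, classes)
print("weights:", best)
print("training accuracy:", accuracy(rows, labels, probs, best, classes))
```

Splitting the problem this way keeps each stage simple: the probability tables are fitted once from counts, and the genetic algorithm only has to search a weight vector with one entry per attribute. A real evaluation would measure accuracy on held-out data rather than on the training set used here.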
Source
Microelectronics & Computer (《微电子学与计算机》)
CSCD
Peking University Core Journals (北大核心)
2006, No. 10, pp. 170-172 (3 pages)
Funding
National Natural Science Foundation of China (Grant No. 60473083)
National "863" Program of China (Grant No. 2005AA103110-2)
Keywords
Classification
Data mining
Genetic algorithm
Conditional probability