Abstract
Missing (defective) data is unavoidable in data mining, so data preprocessing is an essential first step. In traditional data preprocessing, the naive Bayesian method is the most commonly used algorithm for repairing missing data. However, real-world data often violate its assumption that attributes are mutually independent, and the resulting classifications are unsatisfactory. Building on clustering analysis, this paper proposes an improved Bayesian method. Results computed on a large amount of data show that this method is more reasonable and more reliable than the naive Bayesian method.
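The abstract names naive Bayes as the conventional way to repair missing data and identifies its attribute-independence assumption as the weakness. The sketch below illustrates only that baseline: a missing categorical attribute is filled with the class that maximizes P(c) times the product of P(x_i | c) over the observed attributes. It does not reproduce the paper's clustering-based improvement, whose details are not given here. The function name `naive_bayes_impute`, the Laplace smoothing parameter `alpha`, and the toy records are hypothetical, chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def naive_bayes_impute(records, target, alpha=1.0):
    """Return a copy of `records` in which missing (None) values of the
    categorical attribute `target` are replaced by the naive Bayes MAP class.
    Hypothetical helper; alpha > 0 is Laplace smoothing (an assumption)."""
    complete = [r for r in records if r.get(target) is not None]
    class_counts = Counter(r[target] for r in complete)
    n = len(complete)
    cond = defaultdict(Counter)   # (attr, class) -> counts of attribute values
    domain = defaultdict(set)     # attr -> observed values, used for smoothing
    for r in complete:
        for attr, v in r.items():
            if attr != target and v is not None:
                cond[(attr, r[target])][v] += 1
                domain[attr].add(v)

    def log_posterior(r, c):
        # log P(c) + sum_i log P(x_i | c); summing per-attribute terms is
        # exactly where the attribute-independence assumption enters.
        score = math.log(class_counts[c] / n)
        for attr, v in r.items():
            if attr == target or v is None:
                continue  # skip the target and any other missing attributes
            k = max(len(domain[attr]), 1)
            score += math.log((cond[(attr, c)][v] + alpha) /
                              (class_counts[c] + alpha * k))
        return score

    filled = []
    for r in records:
        out = dict(r)  # never mutate the caller's records
        if out.get(target) is None and class_counts:
            out[target] = max(class_counts, key=lambda c: log_posterior(r, c))
        filled.append(out)
    return filled


# Hypothetical toy data: the last record is missing its "play" value.
records = [
    {"outlook": "sunny", "wind": "weak", "play": "no"},
    {"outlook": "sunny", "wind": "strong", "play": "no"},
    {"outlook": "rain", "wind": "weak", "play": "yes"},
    {"outlook": "overcast", "wind": "weak", "play": "yes"},
    {"outlook": "rain", "wind": "strong", "play": "no"},
    {"outlook": "overcast", "wind": "strong", "play": "yes"},
    {"outlook": "sunny", "wind": "weak", "play": None},
]

print(naive_bayes_impute(records, "play")[-1])
# {'outlook': 'sunny', 'wind': 'weak', 'play': 'no'}
```

With the toy data, the smoothed score for "no" is 0.5 × 3/6 × 2/5 = 0.1 and the score for "yes" is 0.5 × 1/6 × 3/5 = 0.05, so the record is filled with "no". If two attributes are strongly correlated, their evidence is multiplied in as though it were independent and is effectively counted twice. This is the failure mode the abstract attributes to real-world data.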
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
Peking University Core Journal (北大核心)
2006, Issue 28, pp. 159-160, 163 (3 pages)
Keywords
data mining
data preprocessing
clustering analysis
Bayesian method