Abstract
Mining and analyzing outliers is of great importance in many applications, including network intrusion control, credit card fraud detection, and telecom fraud analysis. Drawing on attribute reduction from rough set theory, this paper defines the concept of α-outlying reduction and related notions, and proposes a genetic-algorithm-based search for α-outlying reductions that builds on the outlying contribution rate of attributes and the similarity level of outlying partitions. The approach recovers the same or a similar outlier set from an attribute subspace of lower dimension, so that the analysis of where outliers come from and why they appear can concentrate on a smaller target domain. Experiments on real-world data sets show that the algorithm produces reductions effectively and scales well.
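The abstract does not give the paper's formal definitions, so the following is only a minimal sketch of the general idea, under stated assumptions: outliers are taken to be the k objects farthest from their nearest neighbour, the similarity between two outlier sets is taken to be their Jaccard index, and an attribute subset counts as an α-outlying reduction candidate when that similarity is at least α. The attribute outlying contribution rate is not modelled. All names (`outliers`, `similarity`, `fitness`, `alpha_outlying_reduction`) and all parameter values are hypothetical and are not taken from the paper.

```python
import random

def outliers(data, attrs, k):
    """Assumed detector: the k objects farthest from their nearest neighbour,
    with distances measured only on the attribute indices in `attrs`."""
    def dist(a, b):
        return sum((a[j] - b[j]) ** 2 for j in attrs) ** 0.5
    scores = []
    for i, x in enumerate(data):
        nn = min(dist(x, y) for m, y in enumerate(data) if m != i)
        scores.append((nn, i))
    scores.sort(key=lambda s: (-s[0], s[1]))  # largest distance first, ties by index
    return frozenset(i for _, i in scores[:k])

def similarity(a, b):
    """Assumed similarity level: Jaccard index of two outlier sets."""
    return len(a & b) / len(a | b) if a | b else 1.0

def fitness(mask, data, full, k, alpha):
    """Infeasible masks score below 1; feasible masks score in [1, 2),
    higher when fewer attributes are kept."""
    attrs = [j for j, bit in enumerate(mask) if bit]
    if not attrs:
        return 0.0
    sim = similarity(outliers(data, attrs, k), full)
    if sim < alpha:
        return sim
    return 1.0 + (len(mask) - len(attrs)) / len(mask)

def alpha_outlying_reduction(data, k=2, alpha=1.0, pop_size=20,
                             generations=40, p_mut=0.1, seed=0):
    """Genetic search over attribute bitmasks (needs at least 2 attributes)."""
    rng = random.Random(seed)
    n = len(data[0])
    full = outliers(data, range(n), k)  # outlier set in the full attribute space

    def score(mask):
        return fitness(mask, data, full, k, alpha)

    pop = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(pop_size - 1)]
    pop.append((1,) * n)  # the full attribute set is always feasible
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        elite = pop[: pop_size // 2]  # elitism: the best half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = tuple(b ^ (rng.random() < p_mut) for b in child)  # bit-flip mutation
            children.append(child)
        pop = elite + children
    best = max(pop, key=score)
    return [j for j, bit in enumerate(best) if bit], full

# Toy data: only attribute 0 separates the last two objects from the rest.
data = [
    (1.0, 5.0, 0.3), (1.1, 5.0, 0.3), (0.9, 5.0, 0.3),
    (1.0, 5.0, 0.3), (9.0, 5.0, 0.3), (-7.0, 5.0, 0.3),
]
reduct, full = alpha_outlying_reduction(data, k=2, alpha=1.0)
print(reduct, sorted(full))
```

On this toy data, any attribute subset that reproduces the full-space outlier set must contain attribute 0, and the smallest such subset is attribute 0 alone. Because the full attribute set is seeded into the population and the best half always survives, the returned subset always meets the α threshold, although a genetic search does not guarantee that it is the smallest one.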
Source
Computer Science (《计算机科学》)
CSCD
PKU Core Journal
2006, Issue 10, pp. 198-201 (4 pages)
Funding
Supported by the Natural Science Foundation of Chongqing (2005BB2224).
Keywords
Outlying reduction
Genetic algorithm
Rough set
Outlying similarity level