Abstract
Outlier detection is an important task in data mining and has attracted wide attention. We study the definition and detection of outliers on the basis of rough set theory. We propose a new definition of outlier, the rough sequence outlier, together with a corresponding detection algorithm, RSOD. The algorithm uses the rough-set notions of knowledge entropy and attribute significance to construct three kinds of sequences, and detects outliers by analyzing how the elements of those sequences change. We compare RSOD with existing outlier detection algorithms on standard UCI data sets, and the experimental results show that the proposed method is effective for outlier detection.
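The abstract names knowledge entropy and attribute significance without defining them. As a rough illustration only, the Python sketch below uses one common rough-set formulation: the Shannon entropy of the partition induced by the indiscernibility relation, with the significance of an attribute taken as the entropy lost when that attribute is removed. These definitions, the function names, and the toy table are assumptions made for this example. They may differ from the ones used in the paper, and the sketch does not implement RSOD or its three sequences.

```python
from collections import Counter
from math import log2


def knowledge_entropy(rows, attrs):
    """Shannon entropy of the partition U/IND(attrs).

    Assumed definition: objects that agree on every attribute in `attrs`
    fall into the same equivalence class; the entropy is computed over
    the relative sizes of those classes.
    """
    n = len(rows)
    # Count how many objects share each combination of attribute values.
    class_sizes = Counter(tuple(row[a] for a in attrs) for row in rows)
    return -sum((c / n) * log2(c / n) for c in class_sizes.values())


def significance(rows, attrs, a):
    """Assumed definition: entropy drop when attribute `a` is removed."""
    rest = [b for b in attrs if b != a]
    return knowledge_entropy(rows, attrs) - knowledge_entropy(rows, rest)


# Hypothetical toy information table (not taken from the paper).
rows = [
    {"color": "red", "size": "S"},
    {"color": "red", "size": "S"},
    {"color": "red", "size": "L"},
    {"color": "blue", "size": "L"},
]

print(knowledge_entropy(rows, ["color", "size"]))      # 1.5
print(significance(rows, ["color", "size"], "color"))  # 0.5
```

In this toy table the two attributes together split the four objects into classes of sizes 2, 1 and 1, giving an entropy of 1.5 bits. Removing `color` leaves two classes of size 2 (1.0 bit), so under the assumed definition `color` has significance 0.5.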
Source
Acta Electronica Sinica (《电子学报》)
Indexed in: EI, CAS, CSCD, PKU Core (北大核心)
2011, No. 2, pp. 345-350 (6 pages)
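Funding
National Natural Science Foundation of China (No. 60802042)
National High Technology Research and Development Program of China (863 Program) (No. 2007AA01Z325)
Natural Science Foundation of Shandong Province (No. ZR2009GQ013)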
Keywords
outlier detection
rough sets
data mining
sequence
knowledge entropy
significance of attribute