Abstract
To overcome the limitations that imbalanced data impose on algorithms and to make subgroup discovery more applicable to intrusion detection, this paper proposes a data reduction strategy for subgroup discovery algorithms on imbalanced intrusion detection datasets. In the instance reduction stage, the theorem of uniformly distributed random points and the sparseness of the data space are first applied to construct an attribute dissimilarity function. A minority class extension instance selection algorithm is then proposed by combining an evolutionary instance selection algorithm with the synthetic minority over-sampling technique (SMOTE), and instance selection is used to reduce the training data. In the feature reduction stage, principal component analysis is applied as the feature selection method, preserving the features of interest according to the characteristics of the dataset in order to improve discovery efficiency. Experimental results show that the strategy is suitable for subgroup discovery on imbalanced intrusion detection datasets, effectively reducing time cost and improving the quality of the discovered subgroups.
Source
《计算机应用研究》
CSCD
Peking University Core Journal (北大核心)
2014, No. 7, pp. 2123-2126 (4 pages)
Application Research of Computers
Funding
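Natural Science Foundation of Shanxi Province (2009011022-2)
Shanxi Province Foundation for Returned Overseas Scholars (2009-28)
Shanxi Province Graduate Outstanding Innovation Project (20123030)
Scientific Research Project of the Shanxi Provincial Health Department (201301006)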
Keywords
subgroup discovery
imbalanced dataset
data reduction
instance selection
feature selection