
Data reduction strategy to intrusion detection of imbalanced sample data for subgroup discovery (入侵检测不平衡样本子群发现数据简化策略)

Cited by: 2
Abstract: To overcome the limits that class imbalance places on algorithms, and to make subgroup discovery more applicable to imbalanced data, this paper designs a data reduction strategy suited to intrusion detection datasets and subgroup discovery algorithms. In the instance reduction stage, the theorem on uniformly distributed random points and the sparseness of the data space are used to construct an attribute dissimilarity function. Drawing on the synthetic minority over-sampling technique and combining it with an evolutionary instance selection algorithm, the paper then proposes a minority-class extended instance selection algorithm, which is used to reduce the training data. In the feature reduction stage, principal component analysis is applied, retaining the features of interest according to the characteristics of the dataset, to improve discovery efficiency. Experiments show that the strategy is suitable for subgroup discovery on imbalanced intrusion detection datasets, reduces running time, and improves the quality of the discovered subgroups.
Source: Application Research of Computers (《计算机应用研究》), indexed in CSCD and the Peking University Core Journals list, 2014, No. 7, pp. 2123-2126 (4 pages)
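Funding: Natural Science Foundation of Shanxi Province (2009011022-2); Shanxi Province Scholarship Fund for Returned Overseas Scholars (2009-28); Shanxi Province Outstanding Graduate Innovation Project (20123030); Shanxi Provincial Health Department Research Project (201301006)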
Keywords: subgroup discovery; imbalanced dataset; data reduction; instance selection; feature selection
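
As a rough illustration of the minority-class extension idea mentioned in the abstract, the following is a minimal sketch of SMOTE-style interpolation in plain Python. It is not the paper's algorithm: the abstract does not give the attribute dissimilarity function or the evolutionary selection step, so Euclidean distance, the function name `smote_like`, the parameter `k`, and the toy data are all assumptions made for illustration.

```python
import math
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself.
        # Euclidean distance is an assumption, not the paper's dissimilarity.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data: two numeric features per connection record.
normal = [(0.1 * i, 0.2 * i) for i in range(20)]            # majority class
attack = [(5.0, 5.0), (5.5, 4.8), (6.0, 5.2), (5.2, 5.6)]   # minority class

# Add synthetic minority points until both classes have the same size.
extra = smote_like(attack, n_new=len(normal) - len(attack))
balanced_attack = attack + extra
print(len(normal), len(balanced_attack))
```

Each synthetic point lies on a segment between two real minority samples, so the minority class grows without leaving the region it already occupies. The instance selection and principal component analysis stages described in the abstract would follow this step and are not shown here.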

