
Unstable Cut-Points Based Sample Selection for Large Data Classification
(基于非平稳割点的大数据分类样例选择; cited by: 3)
Abstract: Traditional sample selection methods suffer from high computational complexity and long running time when compressing large data sets. To address this problem, a sample selection method based on unstable cut-points is proposed. Since a convex function attains its extreme values at the endpoints of an interval, the degree to which a sample acts as an endpoint is measured by marking the unstable cut-points of all attributes; the samples with higher endpoint degree are then selected, avoiding the pairwise distance computations between samples. The method aims to compress the data set and improve computational efficiency without degrading classification accuracy. Experiments show that the proposed method compresses data sets with a high class-imbalance ratio effectively and exhibits strong noise tolerance.
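The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, only a plausible reading of it: for each attribute, samples are sorted by value, a cut between two adjacent samples is "unstable" when their class labels differ, each sample flanking an unstable cut gains one point of endpoint degree, and the highest-degree samples are kept. The function name and the `keep_ratio` parameter are assumptions introduced here for illustration.

```python
import numpy as np

def unstable_cut_sample_selection(X, y, keep_ratio=0.3):
    """Hypothetical sketch of unstable-cut-point based sample selection.

    X : (n_samples, n_features) numeric array
    y : (n_samples,) class labels
    keep_ratio : fraction of samples to retain (assumed parameter)
    """
    n_samples, n_features = X.shape
    degree = np.zeros(n_samples, dtype=int)
    for j in range(n_features):
        order = np.argsort(X[:, j], kind="stable")
        labels = y[order]
        # positions where adjacent (sorted) samples disagree in class:
        # these are the unstable cut-points of attribute j
        unstable = np.nonzero(labels[:-1] != labels[1:])[0]
        # both samples flanking an unstable cut are endpoint candidates
        degree[order[unstable]] += 1
        degree[order[unstable + 1]] += 1
    k = max(1, int(keep_ratio * n_samples))
    keep = np.argsort(degree)[::-1][:k]  # highest endpoint degree first
    return np.sort(keep)
```

Note that the selection never computes a distance between two samples: each attribute is scanned once after sorting, which is where the claimed efficiency gain over distance-based condensation methods would come from.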
Authors: WANG Xizhao, XING Sheng, ZHAO Shixin (College of Mathematics and Information Science, Hebei University, Baoding 071002; School of Management, Hebei University, Baoding 071002; College of Computer Science and Engineering, Cangzhou Normal University, Cangzhou 061001; Department of Mathematics and Physics, Shijiazhuang Tiedao University, Shijiazhuang 050045)
Source: Pattern Recognition and Artificial Intelligence (模式识别与人工智能), indexed in EI and CSCD, Peking University core journal, 2016, No. 9, pp. 780-789 (10 pages)
Funding: National Natural Science Foundation of China (No. 713710630); Shenzhen Science and Technology Program (No. JCYJ20150324140036825)
Keywords: Large Data Classification; Sample Selection; Unstable Cut-Points; Decision Tree
