
Detecting Hotspot in Multi-dimensional Data Through Clustering (基于聚类的多维数据热点发现算法). Cited by: 6
Abstract: Hotspot detection aims to find the regions of a dataset where data points are concentrated and to present those regions in a form that people can easily understand. This paper designs a hotspot detection algorithm for multi-dimensional data that contains both numerical and categorical features. The core of the algorithm is CLTree+, a clustering algorithm obtained by improving the baseline CLTree. CLTree+ can cluster data with mixed numerical and categorical features directly, and it produces better clusters on numerical features that are periodic. CLTree+ is also substantially more efficient than CLTree, which makes it usable on large-scale data. Applied to the business (transaction) data of a large Internet company, CLTree+ found several data hotspots and presented them as easy-to-interpret combinations of features and their values.
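The abstract does not say how CLTree+ handles mixed or periodic features, so the Python below is only a minimal sketch of the baseline CLTree idea (Liu, Xia and Yu, 2000) under stated assumptions, not the authors' method. CLTree labels the real records as class Y, assumes a uniform background of virtual class-N points whose counts are derived from region size rather than stored, and grows a decision tree whose splits separate dense Y regions from that background. The midpoint cut rule and the helpers `unwrap_periodic` (re-origin a cyclic feature in its widest empty gap) and `best_categorical_split` (rank categories by record count) are hypothetical illustrations of how periodic and categorical features could be fitted into this scheme.

```python
import math
from collections import Counter


def entropy(y, n):
    """Binary entropy of a node holding y real (Y) points and n virtual (N) points."""
    total = y + n
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in (y, n) if c > 0)


def info_gain(y_left, n_left, y_right, n_right):
    """Entropy reduction from splitting one node into a left and a right child."""
    left, right = y_left + n_left, y_right + n_right
    total = left + right
    return (entropy(y_left + y_right, n_left + n_right)
            - left / total * entropy(y_left, n_left)
            - right / total * entropy(y_right, n_right))


def best_numeric_cut(values, lo, hi, n_virtual):
    """Best single cut of a numeric feature on [lo, hi).

    The n_virtual N points are assumed uniform, so the N count left of a cut
    is proportional to the cut position; no N point is ever generated.
    """
    xs = sorted(values)
    best_gain, best_cut = 0.0, None
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue
        cut = (xs[i - 1] + xs[i]) / 2  # hypothetical rule: cut midway; xs[:i] lie left
        n_left = n_virtual * (cut - lo) / (hi - lo)
        gain = info_gain(i, n_left, len(xs) - i, n_virtual - n_left)
        if gain > best_gain:
            best_gain, best_cut = gain, cut
    return best_gain, best_cut


def unwrap_periodic(values, period):
    """Hypothetical helper: re-origin a periodic feature (e.g. hour of day) in the
    middle of its widest empty gap, so a dense arc such as 23:00-01:00 becomes one
    contiguous interval on [0, period) instead of being split at 0."""
    xs = sorted(v % period for v in values)
    gaps = [(xs[i + 1] - xs[i], xs[i]) for i in range(len(xs) - 1)]
    gaps.append((xs[0] + period - xs[-1], xs[-1]))  # gap that wraps past the period
    width, start = max(gaps)
    origin = (start + width / 2) % period
    return [(v - origin) % period for v in values], origin


def best_categorical_split(values, categories, n_virtual):
    """Hypothetical helper: N points are spread evenly over the categories, so rank
    categories by Y count and try each top-k prefix as the left branch."""
    counts = Counter(values)
    order = sorted(categories, key=lambda c: counts[c], reverse=True)
    per_category = n_virtual / len(categories)
    best_gain, best_set, y_left = 0.0, None, 0
    for k in range(1, len(order)):
        y_left += counts[order[k - 1]]
        gain = info_gain(y_left, k * per_category,
                         len(values) - y_left, (len(order) - k) * per_category)
        if gain > best_gain:
            best_gain, best_set = gain, set(order[:k])
    return best_gain, best_set
```

For example, with hour-of-day values `[23.5, 0.2, 0.5, 23.8, 0.9, 12.0]`, `unwrap_periodic(values, 24)` moves the origin into the empty afternoon gap, so the five late-night records land in one short interval and `best_numeric_cut` can treat them as a single candidate region. Setting `n_virtual` equal to the number of records in the node is a simplifying assumption here; a full tree would apply these searches recursively and stop when no split gains enough.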
Authors: ZOU Lei (邹磊); ZHU Jing (朱晶); NIE Xiao-hui (聂晓辉); SU Ya (苏亚); PEI Dan (裴丹); SUN Yu (孙宇). Affiliations: Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China; Beijing Didi Chuxing Company Limited, Beijing 100193, China.
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》; indexed in CSCD and the Peking University core journal list), 2019, Issue 3, pp. 465-471 (7 pages).
Keywords: hotspot detection; clustering; data mining; unsupervised decision tree; multi-dimensional data analysis

References: 3; secondary references: 23


Co-citing works: 7; co-cited works: 51; citing works: 6; secondary citing works: 11
