
Clustering by Pattern Similarity (cited by: 2)

Abstract: The task of clustering is to identify classes of similar objects among a set of objects. The definition of similarity varies from one clustering model to another. In most of these models, however, similarity is based on metrics such as Manhattan distance, Euclidean distance, or other Lp distances. In other words, similar objects must have close values on at least some set of dimensions.
In this paper, we explore a more general type of similarity. Under the pCluster model we propose, two objects are similar if they exhibit a coherent pattern on a subset of dimensions. The new similarity concept models a wide range of applications. For instance, in DNA microarray analysis, the expression levels of two genes may rise and fall synchronously in response to a set of environmental stimuli. Although the magnitudes of their expression levels may not be close, the patterns they exhibit can be very much alike. Discovering such clusters of genes is essential to revealing significant connections in gene regulatory networks. E-commerce applications, such as collaborative filtering, can also benefit from the new model, because it captures not only the closeness of values of certain leading indicators but also the closeness of the (purchasing, browsing, etc.) patterns exhibited by customers. In addition to the novel similarity model, this paper introduces an effective and efficient algorithm to detect such clusters, and we report tests on several real and synthetic data sets to show its performance.
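The contrast the abstract draws between distance-based similarity and pattern similarity can be made concrete with a small sketch. The abstract itself gives no formula; the code below assumes the pairwise score usually associated with the pCluster model, in which two objects x and y on two dimensions a and b score |(x_a - x_b) - (y_a - y_b)|, and a group counts as coherent on a set of dimensions when every such score stays within a threshold. The function names (`pscore`, `is_pcluster`, `euclidean`), the threshold name `delta`, and the toy expression values are all illustrative assumptions, and the brute-force check is not the detection algorithm the paper introduces.

```python
from itertools import combinations


def pscore(x_a, x_b, y_a, y_b):
    """Assumed pattern score for objects x, y on dimensions a, b.

    It is zero when x and y change by the same amount between a and b.
    """
    return abs((x_a - x_b) - (y_a - y_b))


def is_pcluster(rows, dims, delta):
    """Brute-force check: every pair of rows and every pair of dims
    must have a pattern score of at most delta (hypothetical threshold)."""
    for x, y in combinations(rows, 2):
        for a, b in combinations(dims, 2):
            if pscore(x[a], x[b], y[a], y[b]) > delta:
                return False
    return True


def euclidean(x, y, dims):
    """Plain Euclidean distance restricted to the given dimensions."""
    return sum((x[d] - y[d]) ** 2 for d in dims) ** 0.5


# Made-up expression levels under four stimuli (illustration only).
gene1 = [10, 50, 30, 90]
gene2 = [110, 150, 130, 190]  # same rise-and-fall pattern, shifted by 100
gene3 = [10, 90, 30, 50]      # agrees with gene1 only on dimensions 0 and 2

all_dims = [0, 1, 2, 3]

# Far apart by distance, yet perfectly coherent by pattern.
distance_12 = euclidean(gene1, gene2, all_dims)
coherent_12 = is_pcluster([gene1, gene2], all_dims, delta=0)

# Not coherent on all dimensions, but coherent on a subset of them.
coherent_13_all = is_pcluster([gene1, gene3], all_dims, delta=0)
coherent_13_subset = is_pcluster([gene1, gene3], [0, 2], delta=0)
```

Here `gene1` and `gene2` are 200 units apart in Euclidean distance, so a distance-based model would not group them, while the pattern check accepts them. `gene1` and `gene3` illustrate the "subset of dimensions" part: they are coherent only when the check is restricted to dimensions 0 and 2.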
Authors: 王海勋 (Haixun Wang), 裴健 (Jian Pei)
Source: Journal of Computer Science & Technology (计算机科学技术学报, English edition), indexed in SCIE, EI, CSCD; 2008, No. 4, pp. 481-496 (16 pages)
Keywords: data mining, clustering, pattern similarity
