
An Effective High Attribute Dimensional Sparse Clustering (cited by: 6)
Abstract: Cluster analysis is one of the most widely used techniques in data mining, and the scale, dimensionality, and sparsity of a dataset each limit how well it works. This paper proposes an effective clustering algorithm for sparse data with a high attribute dimension. It defines sparse similarity, the similarity between equivalence relations, and a generalized equivalence relation. Initial equivalence classes are formed from the sparse similarity between objects together with the principle of equivalence relations; the initial equivalence relations are then revised according to the similarity between equivalence relations, which yields a more reasonable final clustering. The high-dimensional sparse data is compressed into sparse feature vectors whose dimension is far lower than that of the original data, which improves efficiency when the dimension is high. The clustering process does not depend on the order of the input samples, and the algorithm is suited to cluster analysis of high-dimensional sparse data.
Source: Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》; indexed in EI, CSCD, and the Peking University Core Journals list), 2006, No. 3, pp. 289-294 (6 pages)
Funding: Natural Science Foundation of Jiangsu Province (No. BK2004137)
Keywords: Sparse Similarity, Similarity between Equivalence Relations, Data Compression, Clustering
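
The first stage described in the abstract (compress the sparse data, compare objects by a sparse similarity, form initial equivalence classes) can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything concrete here is an assumption: binary attributes, a Jaccard-style overlap standing in for the paper's sparse similarity, a user-chosen `threshold`, and the hypothetical helper names `compress`, `sparse_similarity`, and `initial_classes`. The later correction step based on the similarity between equivalence relations is not shown, because the abstract does not define it.

```python
from itertools import combinations


def compress(rows):
    """Keep only the indices of the non-zero attributes of each object."""
    return [frozenset(j for j, v in enumerate(row) if v) for row in rows]


def sparse_similarity(a, b):
    """Jaccard overlap of two non-zero index sets.

    Assumption: a stand-in measure, not the definition used in the paper.
    """
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def initial_classes(features, threshold):
    """Partition objects by the transitive closure of 'similarity >= threshold'.

    The closure is an equivalence relation, so the resulting partition does
    not depend on the order in which pairs are examined.
    """
    parent = list(range(len(features)))

    def find(i):
        # Follow parent links to the class representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(features)), 2):
        if sparse_similarity(features[i], features[j]) >= threshold:
            parent[find(i)] = find(j)

    classes = {}
    for i in range(len(features)):
        classes.setdefault(find(i), []).append(i)
    return sorted(sorted(members) for members in classes.values())


# Five objects over eight binary attributes; most entries are zero.
rows = [
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 0, 0],
]
print(initial_classes(compress(rows), threshold=0.5))  # [[0, 1], [2, 3], [4]]
```

Storing each object as the set of its non-zero attribute indices means the cost of a comparison grows with the number of non-zero entries rather than with the full attribute dimension, which is one plausible reading of the compression benefit the abstract claims.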
