期刊文献+

一种面向不确定数据流的聚类算法 被引量:1

A Cluster Algorithm for Uncertain Data Stream
下载PDF
导出
摘要 作为大数据的重要组成,产生于传感器、移动电话设备、社交网络等的不确定流数据因其具有流速可变、规模宏大、单遍扫描及不确定性等特点,传统聚类算法不能满足用户高效实时的查询要求.首先利用MBR(minimum bounding rectangle)描述不确定元组的分布特性,并提出一种基于期望距离的不确定数据流聚类算法,计算期望距离范围的上下界剪枝距离较远的簇以减少计算量;其次针对簇内元组的分布特征提出了簇MBR的概念,提出一种基于空间位置关系的聚类算法,根据不确定元组MBR和簇MBR的空间位置关系排除距离不确定元组较远的簇,从而提高聚类算法效率;最后在合成数据集和真实数据集进行实验,结果验证了所提出算法的有效性和高效性. As an important component of big data generated in the sensor,mobile phone devices,social networks etc.,uncertain streaming data have many characteristics,such as variable rate,large-scale,single-pass scanning,and uncertainty. Traditional clustering algorithms cannot meet efficient real-time inquiry requirements for the users. Firstly, MBR( minimum bounding rectangle) was used to describe the distribution characteristics of uncertain tuples. And then,a clustering algorithm based on expected distance was proposed for uncertain data stream. The bounds of expected distance range to filter the clusters with far distance can be calculated.Secondly,cluster MBR concept based on the distribution of the tuples in a cluster was presented.Then,a clustering algorithm was given,which excludes the clusters far from the uncertain tuple by the spatial location relationship between uncertainty tuple MBR and clusters MBR,thereby increasing the efficiency of clustering algorithm. Finally, experiments running on synthetic datasets and real datasets verify that the proposed algorithms are effective and efficient.
出处 《东北大学学报(自然科学版)》 EI CAS CSCD 北大核心 2016年第12期1677-1682,共6页 Journal of Northeastern University(Natural Science)
基金 国家自然科学基金资助项目(61173029 61332006 61672144)
关键词 不确定数据流 聚类 大数据 数据挖掘 最小边界矩形 uncertain data stream cluster big data data mining MBR(minimum bounding rectangle)
  • 相关文献

参考文献5

二级参考文献131

  • 1金澈清,钱卫宁,周傲英.流数据分析与管理综述[J].软件学报,2004,15(8):1172-1181. 被引量:161
  • 2余仕成.大学物理实验数据处理的几个问题讨论[J].武汉化工学院学报,2005,27(1):94-96. 被引量:9
  • 3谷峪,于戈,张天成.RFID复杂事件处理技术[J].计算机科学与探索,2007,1(3):255-267. 被引量:54
  • 4朱蔚恒,印鉴,谢益煌.基于数据流的任意形状聚类算法[J].软件学报,2006,17(3):379-387. 被引量:51
  • 5Deshpande A, Guestrin C, Madden S, Hellerstein J M, Hong W. Model-driven data acquisition in sensor networks// Proceedings of the 30th International Conference on Very Large Data Bases. Toronto, 2004:588-599 被引量:1
  • 6Madhavan J, Cohen S, Xin D, Halevy A, Jeffery S, Ko D, Yu C. Web-scale data integration: You can afford to pay as you go//Proceedings of the 33rd Biennial Conference on Innovative Data Systems Research. Asilomar, 2007:342-350 被引量:1
  • 7Liu Ling. From data privacy to location privacy: Models and algorithms (tutorial)//Proceedings of the 33rd International Conference on Very Large Data bases. Vienna, 2007: 1429- 1430 被引量:1
  • 8Samarati P, Sweeney L. Generalizing data to provide anonymity when disclosing information (abstract)//Proeeedings of the 17th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. Seattle, 1998:188 被引量:1
  • 9Cavallo R, Pittarelli M. The theory of probabilistic databases//Proceedings of the 13th International Conference on Very Large Data Bases. Brighton, 1987:71-81 被引量:1
  • 10Barbara D, Garcia-Molina H, Porter D. The management of probabilistic data. IEEE Transactions on Knowledge and Data Engineering, 1992, 4(5): 487-502 被引量:1

共引文献248

同被引文献9

引证文献1

二级引证文献9

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部