k Nearest Neighbor Queries Based on Data Grid


Abstract: This paper proposes GkNN, a k-nearest neighbor query method for data grid environments. According to the authors, no earlier work had proposed a k-nearest neighbor query algorithm for a data grid. When a user submits a query vector and a value of k at the query node, the data nodes first perform vector reduction with a small query radius, based on a double distance metric (the DDM index). The reduced candidate vectors are then sent to the execution nodes in packages (the "vector package" technique), and the execution nodes compute distances over these candidates in parallel (the refinement step). Finally, the result vectors are returned to the query node. If fewer than k vectors are returned, the radius is enlarged and the process repeats until the k nearest neighbors are obtained. By combining vector reduction, vector packaging, and pipelined parallelism, the algorithm addresses the heterogeneity of network bandwidth between nodes on the data grid. Theoretical analysis and experiments show that the method reduces network communication cost, increases I/O and CPU parallelism, and lowers response time, which makes it well suited to queries over massive high-dimensional data.
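The loop described in the abstract (reduce on the data nodes, package the candidates, refine in parallel, enlarge the radius if fewer than k results come back) can be illustrated with a small single-process sketch. This is not the authors' implementation: the paper's DDM index is not described in this record, so a simple pivot-based triangle-inequality filter stands in for it, threads stand in for execution nodes, and every name and default below (`gknn_sketch`, `reduce_on_data_node`, `pack`, `refine`, `radius`, `growth`, `package_size`) is hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def reduce_on_data_node(vectors, pivot, query, radius):
    """Cheap filter standing in for the paper's DDM-based reduction (assumption).

    By the triangle inequality, d(q, v) <= r implies |d(q, p) - d(v, p)| <= r,
    so any vector failing this test cannot lie within the radius.
    A real index would precompute d(v, p) instead of recomputing it here.
    """
    dq = dist(query, pivot)
    return [v for v in vectors if abs(dq - dist(v, pivot)) <= radius]


def pack(vectors, size):
    """Group candidates into packages so they are shipped in batches."""
    return [vectors[i:i + size] for i in range(0, len(vectors), size)]


def refine(package, query, radius):
    """Exact distance computation on one package (the refinement step)."""
    hits = []
    for v in package:
        d = dist(query, v)
        if d <= radius:
            hits.append((d, v))
    return hits


def gknn_sketch(data_nodes, query, k, radius=0.5, growth=2.0,
                package_size=2, workers=4):
    """data_nodes: list of (pivot, vectors) pairs, one pair per data node."""
    if radius <= 0 or growth <= 1:
        raise ValueError("radius must be positive and growth greater than 1")
    total = sum(len(vectors) for _, vectors in data_nodes)
    k = min(k, total)  # cannot return more vectors than exist
    while True:
        # Step 1: each data node filters its own vectors.
        candidates = []
        for pivot, vectors in data_nodes:
            candidates.extend(reduce_on_data_node(vectors, pivot, query, radius))
        # Steps 2 and 3: package the candidates and refine packages in parallel.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            parts = pool.map(lambda p: refine(p, query, radius),
                             pack(candidates, package_size))
            results = [hit for part in parts for hit in part]
        # Step 4: enough neighbors inside the radius means the k closest are exact.
        if len(results) >= k:
            return sorted(results)[:k]
        radius *= growth  # otherwise enlarge the radius and try again


if __name__ == "__main__":
    nodes = [
        ((0.0, 0.0), [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0)]),
        ((10.0, 10.0), [(10.0, 10.0), (9.0, 10.0)]),
    ]
    for d, v in gknn_sketch(nodes, (0.0, 0.0), k=2):
        print(v, d)
```

The result is exact because the loop stops only once at least k vectors lie inside the current radius, so no vector outside that radius can be closer than the k returned. The doubling factor is an arbitrary choice for the sketch; the record does not say how the paper enlarges the radius.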

Authors: Zhuang Yi, Zhuang Yueting, Wu Fei

Affiliation: College of Computer Science and Technology, Zhejiang University

Source: Journal of Computer Research and Development (计算机研究与发展; indexed in EI, CSCD, and the Peking University Core Journals list), 2006, No. 11, pp. 1876-1885 (10 pages)

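Funding: National Science Fund for Distinguished Young Scholars (60525108); Key Program of the National Natural Science Foundation of China (60533090); National Key Basic Research Program of China ("973" Program) (2002CB312101); Major Science and Technology Fund of the Zhejiang Provincial Science and Technology Program (2005C13032); Major Science and Technology Research Fund of the Zhejiang Provincial Science and Technology Program (2005C11001-05); International Cooperation Program for the Digitization of Chinese and English Books in Universities (http://www.cadal.zju.edu.cn)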

Keywords: k-nearest neighbor query; cluster hypersphere; hypersphere intersection; data grid

Classification: TP311.13 [Automation and Computer Technology: Computer Software and Theory]




