面向关联关系数据的分布式相似性查询方法 (Cited by: 3)

Distributed Similarity Query Method on Data with Relation Information
Abstract: Data with relation information are ubiquitous in environments such as social network platforms, e-commerce platforms, and scientific databases, and similarity queries over such data are a common operation in many applications. With the development and spread of social networking, e-commerce, and cloud computing, data with relation information are growing rapidly, and processing similarity queries over them has become an active research topic in the database field. Against this background, this paper proposes a decision-tree-based distributed similarity query method for data with relation information. The method computes similarity according to the importance of attributes and stops the computation once a required precision is reached, which reduces the computation cost while preserving accuracy. The paper also proposes two algorithms for computing decision trees over large data in a distributed environment; they incur less communication cost than existing methods and come with a probabilistic guarantee on accuracy. Extensive experiments verify the effectiveness and efficiency of the algorithms.
Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), CSCD, 2014, Issue 7, pp. 778-789 (12 pages)
Funding: National Natural Science Foundation of China (60973021, 61003060); National Basic Research Program of China (973 Program) (2012CB316201)
Keywords: similarity query; relation information; decision tree; distributed query method
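
The early-termination idea in the abstract (compute similarity attribute by attribute in order of importance, and stop once the required precision is reached) can be illustrated with a minimal sketch. This is not the paper's algorithm: the function name `bounded_similarity`, the hand-written importance weights, and the exact-match per-attribute comparison are all assumptions made for illustration. The paper derives attribute importance from a decision tree, which this sketch does not reproduce.

```python
from typing import Dict, Tuple


def bounded_similarity(
    a: Dict[str, str],
    b: Dict[str, str],
    importance: Dict[str, float],
    tolerance: float,
) -> Tuple[float, float, int]:
    """Return (lower bound, upper bound, number of attributes examined).

    Hypothetical illustration: similarity is the importance-weighted share
    of attributes on which the two records agree. Attributes are visited in
    decreasing importance, and the loop stops once the weight of the
    unexamined attributes is at most `tolerance`, since those attributes can
    no longer move the score by more than that amount.
    """
    total = sum(importance.values())
    lower = 0.0      # similarity contributed by attributes examined so far
    remaining = 1.0  # normalized weight of attributes not yet examined
    examined = 0
    for attr in sorted(importance, key=importance.get, reverse=True):
        if remaining <= tolerance:
            break  # required precision reached; skip the rest
        weight = importance[attr] / total
        if attr in a and a.get(attr) == b.get(attr):
            lower += weight
        remaining -= weight
        examined += 1
    remaining = max(remaining, 0.0)  # guard against float round-off
    return lower, lower + remaining, examined


# Assumed example data: the weights are made up, not taken from the paper.
weights = {"city": 0.5, "age_group": 0.3, "device": 0.15, "browser": 0.05}
u = {"city": "Beijing", "age_group": "20-29", "device": "phone", "browser": "x"}
v = {"city": "Beijing", "age_group": "30-39", "device": "phone", "browser": "y"}
low, high, n = bounded_similarity(u, v, weights, tolerance=0.25)
print(low, high, n)
```

With a tolerance of 0.25, only the two most important attributes are compared. The full weighted similarity of the two records is guaranteed to lie between the returned bounds, which is the kind of accuracy-versus-cost trade-off the abstract describes.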
