Efficient Method for Database Selection (高效的数据源选择方式)

Cited by: 1

Abstract: With the rapid development of keyword search techniques and the rapid growth of data on the Internet, efficient and accurate selection among multiple structured data sources has become increasingly important. This paper proposes a database selection method based on inverted lists. The method selects highly relevant data sources in a short time and runs the search only on those sources, which reduces query time and improves the user experience. The method has been implemented on air-ticket search systems, and the experimental results show that it achieves high search performance. To run the algorithm efficiently on large-scale data sets, a min-hash algorithm is applied to similarity estimation, which reduces the space and time consumed by a query. A comparison with traditional algorithms shows that the min-hash algorithm achieves high precision and greatly reduces running time.

Authors: HUANG Weihuang (黄维篁), LI Guoliang (李国良), FENG Jianhua (冯建华)

Affiliation: Department of Computer Science and Technology, Tsinghua University

Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), CSCD, 2010, No. 10, pp. 890-898 (9 pages)

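Funding: National Natural Science Foundation of China, No. 60873065; National High Technology Research and Development Program of China (863 Program), No. 2009AA011906; Scientific Research Project of Higher Education Institutions of the Inner Mongolia Autonomous Region, No. NJzy08152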

Keywords: database selection; keyword search; database summary; min-hash based algorithm

Classification: TP311.133.1 [Automation and Computer Technology: Computer Software and Theory]
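
The abstract states that a min-hash algorithm is applied to similarity estimation, but this record does not describe how the paper does so. The following is therefore only a minimal sketch of generic min-hash estimation of Jaccard similarity between two keyword sets, not the authors' method. The function names, the use of seeded BLAKE2b as the hash family, and the idea of ranking data sources by their estimated similarity to a query are all assumptions made for illustration.

```python
import hashlib


def _hash(token: str, seed: int) -> int:
    """Seeded 64-bit hash of a token (an assumed stand-in for a hash family)."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, num_hashes=128):
    """Return the min-hash signature: the minimum hash value per seed."""
    unique = set(tokens)
    if not unique:
        raise ValueError("cannot build a signature for an empty set")
    return [min(_hash(t, seed) for t in unique) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching positions."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def exact_jaccard(a, b):
    """Exact Jaccard similarity, used here only to check the estimate."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


def rank_sources(query_tokens, summaries, num_hashes=128):
    """Hypothetical helper: order data sources by estimated similarity.

    `summaries` maps a source name to the keyword set that summarizes it.
    """
    query_sig = minhash_signature(query_tokens, num_hashes)
    scored = [
        (name, estimate_jaccard(query_sig, minhash_signature(toks, num_hashes)))
        for name, toks in summaries.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy keyword summaries; the source names and keywords are invented.
    summaries = {
        "flights_db": {"beijing", "shanghai", "flight", "ticket", "airline"},
        "hotels_db": {"beijing", "hotel", "room", "booking"},
    }
    for name, score in rank_sources({"beijing", "flight", "ticket"}, summaries):
        print(name, round(score, 3))
```

In this sketch a signature has a fixed length regardless of how large the keyword set is, which is where the space and time savings of min-hash come from; more hash functions give a tighter estimate at a higher cost.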


