Keyword-Based Deep Web Database Selection (基于关键词的深度万维网数据库选择)

Cited by: 11
Abstract: This paper proposes a keyword-based Deep Web search method: given a keyword query submitted by a user, the method selects, on the fly, the Web databases that capture the query intent and provide high-quality results. The approach avoids Deep Web crawling, which is costly and difficult, and supports keyword search over Deep Web databases in multiple domains, so it can be integrated seamlessly with existing search engine architectures. The paper focuses on keyword-based database selection and addresses its challenges in two ways. (1) It introduces a relevance model that measures the association between keywords and domain attributes, and a random-walk algorithm that discovers latent keyword-attribute associations from query logs. (2) It develops a new data sampling method and uses it in a sampling-based model of database-query relevance to select relevant databases within the selected domains. Experiments on real data sets from the Chinese Deep Web show that the methods effectively select databases relevant to keyword queries and provide high-quality results.
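Authors: Fan Ju (范举), Zhou Lizhu (周立柱)
Source: Chinese Journal of Computers (《计算机学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2011, No. 10, pp. 1797-1804 (8 pages)
Funding: Supported by the Key Program of the National Natural Science Foundation of China, "Basic Methods and Key Technologies in the Construction and Application of Infrastructure Supporting Chinese Web Research" (Grant No. 60833003)
Keywords: deep Web; Web databases; keyword search; domain selection; database selection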
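
Illustrative sketch (not taken from the paper): the abstract mentions a random-walk algorithm that discovers latent keyword-attribute associations from query logs, but this record does not give its details. The Python below shows one generic way such an idea could work, a random walk with restart over a bipartite keyword-attribute graph. The log format, the `QUERY_LOG` data, the function names, and the restart probability are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical query log: (keyword, domain attribute, co-occurrence count).
QUERY_LOG = [
    ("harry potter", "book.title", 8),
    ("harry potter", "movie.title", 2),
    ("rowling", "book.author", 5),
    ("rowling", "book.title", 1),
]


def build_graph(log):
    """Build an undirected, weighted bipartite graph from the log triples."""
    graph = defaultdict(dict)
    for keyword, attribute, count in log:
        k, a = ("kw", keyword), ("attr", attribute)
        graph[k][a] = graph[k].get(a, 0) + count
        graph[a][k] = graph[a].get(k, 0) + count
    return graph


def random_walk_relevance(log, keyword, restart=0.15, iterations=100):
    """Score attributes for a keyword by a random walk with restart.

    At each step the walker follows an edge with probability proportional
    to its weight, or jumps back to the start keyword with probability
    `restart`. Returns a dict mapping attribute name to visit probability.
    """
    graph = build_graph(log)
    start = ("kw", keyword)
    if start not in graph:
        return {}
    prob = {node: 0.0 for node in graph}
    prob[start] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in graph}
        for node, mass in prob.items():
            if mass == 0.0:
                continue
            total = sum(graph[node].values())
            for neighbor, weight in graph[node].items():
                nxt[neighbor] += (1 - restart) * mass * weight / total
        nxt[start] += restart  # restart mass returns to the query keyword
        prob = nxt
    return {name: p for (kind, name), p in prob.items() if kind == "attr"}


scores = random_walk_relevance(QUERY_LOG, "rowling")
for attribute, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{attribute}: {score:.3f}")
```

In this toy log, "rowling" never co-occurs with `movie.title`, yet the walk assigns that attribute a small non-zero score by passing through `book.title` and "harry potter". That is the sense in which a walk over the log can surface associations that are not directly observed.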

References (13 in total; 10 listed)

  • 1. Madhavan J, Cohen S, Dong X, Halevy A, Jeffery S, Ko D, Yu C. Web-scale data integration: You can afford to pay as you go//Proceedings of the CIDR. Asilomar, USA, 2007: 342-350. Cited by: 1
  • 2. Liu Yukui, Zhou Lizhu, Fan Ju. A study of the current state of Chinese Deep Web databases (中文深度万维网数据库的现状研究) [J]. Chinese Journal of Computers, 2011, 34(2): 360-370. Cited by: 7
  • 3. Madhavan J, Ko D, Kot L, Ganapathy V, Rasmussen A, Halevy A. Google's deep web crawl. PVLDB, 2008, 1: 1241-1252. Cited by: 1
  • 4. He H, Meng W, Yu C, Wu Z. Automatic integration of Web search interfaces with WISE-Integrator. VLDB Journal, 2004, 12: 256-273. Cited by: 1
  • 5. He B, Zhang Z, Chang K C-C. Knocking the door to the deep web: Integrating web query interfaces//Proceedings of the SIGMOD. Paris, France, 2004: 913-914. Cited by: 1
  • 6. Zhang Z, He B, Chang K C-C. Light-weight domain-based form assistant: Querying Web databases on the fly//Proceedings of the VLDB. Trondheim, Norway, 2005: 97-108. Cited by: 1
  • 7. Fan J, Li G, Zhou L. Interactive SQL query suggestion: Making databases user-friendly//Proceedings of the ICDE. Hannover, Germany, 2011: 351-362. Cited by: 1
  • 8. Agarwal G, Kabra G, Chang K C-C. Towards rich query interpretation: Walking back and forth for mining query templates//Proceedings of the WWW. Raleigh, USA, 2010: 1-10. Cited by: 1
  • 9. Bu Y, Howe B, Balazinska M, Ernst M D. HaLoop: Efficient iterative data processing on large clusters. PVLDB, 2010, 3(1): 285-296. Cited by: 1
  • 10. Si L, Callan J P. Relevant document distribution estimation method for resource selection//Proceedings of the SIGIR. Toronto, Canada, 2003: 298-305. Cited by: 1

Second-level references (16 in total; 10 listed)

  • 1. Ipeirotis P G, Gravano L, Sahami M. Probe, count, and classify: Categorizing hidden web databases//Proceedings of the SIGMOD Conference. Santa Barbara, CA, 2001: 67-78. Cited by: 1
  • 2. Chau M, Chen H. A machine learning approach to web page filtering using content and structure analysis. Decision Support Systems, 2008, 44(2): 482-494. Cited by: 1
  • 3. Barbosa L, Freire J. Combining classifiers to identify online databases//Proceedings of the 16th International Conference on World Wide Web. Banff, Alberta, Canada, 2007: 431-440. Cited by: 1
  • 4. Cope J, Craswell N, Hawking D. Automated discovery of search interfaces on the web//Proceedings of the 14th Australian Database Conference. Australia, 2003: 181-189. Cited by: 1
  • 5. Raghavan S, Garcia-Molina H. Crawling the hidden web//Proceedings of the 27th International Conference on Very Large Data Bases. Italy, 2001: 129-138. Cited by: 1
  • 6. Chang K C, He B, Li C. Structured databases on the Web: Observations and implications. SIGMOD Record, 2004, 33(3): 61-70. Cited by: 1
  • 7. Gravano L, Ipeirotis P G, Sahami M. QProber: A system for automatic classification of hidden-web databases. ACM Transactions on Information Systems, 2003, 22(1): 1-41. Cited by: 1
  • 8. Su W, Wang J, Lochovsky F H. Automatic hierarchical classification of structured deep web databases//Proceedings of the 7th International Conference on Web Information Systems Engineering. China, 2006: 210-221. Cited by: 1
  • 9. He B, Tao T, Chang K C-C. Clustering structured Web sources: A schema-based, model-differentiation approach//Proceedings of the Current Trends in Database Technology-EDBT 2004 Workshops. Greece, 2004: 536-546. Cited by: 1
  • 10. Lu Y, He H, Peng Q, Meng W, Yu C T. Clustering e-commerce search engines based on their search interface pages using WISE-Cluster. Data & Knowledge Engineering, 2006, 59(2): 231-246. Cited by: 1

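Co-citing documents: 6

Co-cited documents (77 in total; 10 listed)

  • 1. Yao Tianshun, Zhang Li, Gao Zhu. A survey of WordNet (WordNet综述) [J]. Applied Linguistics (语言文字应用), 2001(1): 27-32. Cited by: 33
  • 2. Zhao Pengpeng, Gao Ling, Cui Zhiming. Automatic classification of Deep Web data sources based on query interface features (基于查询接口特征的Deep Web数据源自动分类) [J]. Microelectronics & Computer (微电子学与计算机), 2006, 23(10): 47-50. Cited by: 11
  • 3. Wu Youzheng, Zhao Jun, Xu Bo. A sentence retrieval algorithm based on topic language models (基于主题语言模型的句子检索算法) [J]. Journal of Computer Research and Development (计算机研究与发展), 2007, 44(2): 288-295. Cited by: 8
  • 4. Liu Wei, Meng Xiaofeng, Meng Weiyi. A survey of Deep Web data integration (Deep Web数据集成研究综述) [J]. Chinese Journal of Computers, 2007, 30(9): 1475-1489. Cited by: 136
  • 5. Umara Noor, Zahid Rashid, Azhar Rauf. A survey of automatic Deep Web classification techniques [J]. International Journal of Computer Applications, 2011, 19(6): 43-50. Cited by: 1
  • 6. Jiawei Han, Micheline Kamber. Data Mining: Concepts and Techniques (数据挖掘:概念与技术) [M]. Translated by Fan Ming and Meng Xiaofeng. Beijing: China Machine Press, 2007. Cited by: 1
  • 7. Li Guoliang, Wu Hao, Feng Jianhua, et al. DBease: Making databases user-friendly and easily accessible [C]//Proceedings of the 5th Biennial Conference on Innovative Data Systems Research, 2011: 45-56. Cited by: 1
  • 8. Bin He, Zhen Zhang, Kevin Chen-Chuan Chang. MetaQuerier: Querying structured Web sources on-the-fly [C]//Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, 2005: 927-929. Cited by: 1
  • 9. George A. Miller. WordNet: A lexical database for English [J]. Communications of the ACM, 1995, 38(11): 39-41. Cited by: 1
  • 10. Liang Hao, Zuo Wanli, Ren Fei, et al. Translating query for Deep Web using ontology [C]//Proceedings of the 2008 International Conference on Computer Science and Software Engineering, 2008(4): 427-430. Cited by: 1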

Citing documents: 11

Second-level citing documents: 27
