
Automatic Annotation of Deep Web Query Results Based on Multiple Annotation Sources (cited by: 3)

Multi-source automatic annotation for deep Web
Abstract Semantic annotation of Deep Web query results is one of the key problems in Deep Web data integration. A large amount of data on the World Wide Web is hidden behind form-like interfaces, which interact with a backend database to answer users' queries. The results returned by these Web databases seldom carry proper annotations, so meaningful labels must be assigned to them. This paper proposes a multi-source automatic annotation framework for Deep Web query results, in which several annotators are designed according to different features and annotate the results from different aspects. In particular, the search engine-based annotator extends question-answering techniques commonly used in the AI field: it constructs validation queries, submits them to a search engine, and uses the returned results to select the most appropriate terms for annotating the data units by calculating the similarities between terms and instances. This improves both the precision and the recall of annotation, and the information needed for annotation can be acquired automatically without the support of a domain ontology. Experiments on Web databases from four real-world domains indicate that the proposed approach is highly effective.
Source Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2009, No. 1, pp. 196-200 (5 pages)
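Funding National 973 Program of China (2007CB310801); Key Project of the Ministry of Education (1070072); Program of Introducing Talents of Discipline to Universities (B07037)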
Keywords Deep Web; semantic annotation; interface schema; validation query
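
The abstract describes the search engine-based annotator only in outline: build validation queries, submit them to a search engine, and pick the term most similar to the instances. The sketch below is one possible reading of that idea, not the authors' method. The names `validation_score`, `best_label`, and `fake_hit_count`, the quoted-phrase query format, and the hit-count ratio used as the similarity are all assumptions made for illustration, and the search engine is replaced by a hard-coded table of invented hit counts.

```python
from typing import Callable, Dict, Sequence, Tuple

# A hit-count function maps a query string to a number of results.
# In a real system this would call a search engine; here it is a stub.
HitCount = Callable[[str], int]


def validation_score(label: str, instances: Sequence[str], hit_count: HitCount) -> float:
    """Average co-occurrence ratio between a candidate label and the instances.

    Assumed similarity: hits(label AND instance) / min(hits(label), hits(instance)).
    """
    label_hits = hit_count(f'"{label}"')
    if label_hits == 0 or not instances:
        return 0.0
    total = 0.0
    for instance in instances:
        instance_hits = hit_count(f'"{instance}"')
        if instance_hits == 0:
            continue  # an unseen instance contributes nothing to the score
        # The validation query pairs the candidate label with the instance.
        joint_hits = hit_count(f'"{label}" "{instance}"')
        total += joint_hits / min(label_hits, instance_hits)
    return total / len(instances)


def best_label(
    candidates: Sequence[str], instances: Sequence[str], hit_count: HitCount
) -> Tuple[str, Dict[str, float]]:
    """Return the highest-scoring candidate label and all scores."""
    scores = {c: validation_score(c, instances, hit_count) for c in candidates}
    return max(scores, key=scores.get), scores


# Invented hit counts standing in for search engine results (illustrative only).
FAKE_COUNTS = {
    '"author"': 1000,
    '"price"': 1000,
    '"J. K. Rowling"': 100,
    '"Stephen King"': 200,
    '"author" "J. K. Rowling"': 80,
    '"author" "Stephen King"': 120,
    '"price" "J. K. Rowling"': 10,
    '"price" "Stephen King"': 20,
}


def fake_hit_count(query: str) -> int:
    return FAKE_COUNTS.get(query, 0)


label, scores = best_label(
    ["author", "price"], ["J. K. Rowling", "Stephen King"], fake_hit_count
)
print(label)  # author
```

Passing the hit-count function as a parameter keeps the scoring logic independent of any particular search engine, so the stub can be swapped for a real client without changing the rest of the sketch.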


