Automatic Annotation of Deep Web Query Results Based on Multiple Annotation Sources (cited by: 3)

Multi-source automatic annotation for deep Web


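Abstract: Semantic annotation of Deep Web query results is one of the key problems in Deep Web data integration. This paper proposes a framework for automatically annotating Deep Web query results from multiple annotation sources, with several annotators designed around different features. The search-engine-based annotator extends the question-answering techniques commonly used in AI: it constructs validation queries, submits them to a search engine, and uses the returned results to select the most appropriate terms for annotation, which effectively improves annotation precision and recall. Tests on Web databases from multiple domains demonstrate the effectiveness of the method.

Abstract (as published in English): A large amount of data on the World Wide Web is hidden behind form-like interfaces. These interfaces interact with a hidden backend database to answer users' queries. However, the results returned by Web databases seldom carry proper annotations, so meaningful labels must be assigned to them. A multi-source automatic annotation framework is proposed that uses multiple annotators to annotate results from different aspects. In particular, the search-engine-based annotator constructs validation queries and posts them to a search engine, then finds the most appropriate terms for annotating the data units by calculating the similarities between terms and instances. The information needed for annotation can be acquired automatically, without the support of a domain ontology. Experiments over four real-world domains indicate that the proposed approach is highly effective.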

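Authors: Cui Xiaojun (崔晓军), Peng Zhiyong (彭智勇), Zeng Cheng (曾承)

Affiliations: Department of Computer Science, Wenzhou Vocational College of Science and Technology; State Key Laboratory of Software Engineering, Wuhan University; School of Computer Science, Wuhan University

Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2009, Issue 1, pp. 196-200 (5 pages)

Funding: National 973 Program (2007CB310801); Key Project of the Ministry of Education (1070072); Program of Introducing Talents of Discipline to Universities (B07037)

Keywords: Deep Web; semantic annotation; interface schema; validation query

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]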
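The search-engine-based annotator described in the abstract can be illustrated with a minimal sketch. The abstract does not give the query template or the similarity formula, so everything below is an assumption made for illustration: the quoted-phrase query format, the normalized co-occurrence score, the function names, and the toy hit counts that stand in for a real search engine.

```python
from typing import Callable, Dict, List

# Hypothetical hit counts standing in for a real search engine.
# The numbers are invented purely for illustration.
TOY_HITS: Dict[str, int] = {
    '"author"': 1000,
    '"publisher"': 1000,
    '"J. K. Rowling"': 100,
    '"Stephen King"': 200,
    '"author" "J. K. Rowling"': 40,
    '"author" "Stephen King"': 60,
    '"publisher" "J. K. Rowling"': 5,
    '"publisher" "Stephen King"': 8,
}


def toy_hit_count(query: str) -> int:
    """Return the (made-up) number of results for a query; 0 if unknown."""
    return TOY_HITS.get(query, 0)


def validation_query(term: str, instance: str) -> str:
    """Build a validation query pairing a candidate label with a data value.

    The quoted-phrase template is an assumption, not the paper's template.
    """
    return f'"{term}" "{instance}"'


def cooccurrence_score(term: str, instance: str,
                       hit_count: Callable[[str], int]) -> float:
    """Assumed similarity: joint hits normalized by the individual hits."""
    joint = hit_count(validation_query(term, instance))
    term_hits = hit_count(f'"{term}"')
    instance_hits = hit_count(f'"{instance}"')
    if joint == 0 or term_hits == 0 or instance_hits == 0:
        return 0.0
    return joint / (term_hits * instance_hits)


def best_label(candidates: List[str], instances: List[str],
               hit_count: Callable[[str], int]) -> str:
    """Pick the candidate term with the highest average score over instances."""
    if not candidates or not instances:
        raise ValueError("need at least one candidate and one instance")

    def average_score(term: str) -> float:
        total = sum(cooccurrence_score(term, inst, hit_count)
                    for inst in instances)
        return total / len(instances)

    return max(candidates, key=average_score)


if __name__ == "__main__":
    column_values = ["J. K. Rowling", "Stephen King"]
    print(best_label(["author", "publisher"], column_values, toy_hit_count))
```

Passing the hit-count function as a parameter keeps the sketch runnable offline; a real deployment would replace `toy_hit_count` with a call to an actual search engine and would likely need caching and rate limiting.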



