Abstract
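Semantic annotation of Deep Web query results is one of the key problems in Deep Web data integration. This paper proposes a framework for automatically annotating Deep Web query results based on multiple annotation sources, with several annotators designed according to different features. The search-engine-based annotator extends the question-answering techniques commonly used in AI: it constructs validation queries, submits them to a search engine, and uses the returned results to select the most suitable terms for annotation, which effectively improves the precision and recall of annotation. Tests on Web databases from multiple domains demonstrate the effectiveness of the method.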
A large amount of data on the World Wide Web is hidden behind form-like interfaces. These interfaces interact with a backend database to answer users' queries. However, the results returned by Web databases seldom carry proper annotations, so meaningful labels must be assigned to them. This paper proposes a multi-source automatic annotation framework in which multiple annotators label the results from different aspects. In particular, the search-engine-based annotator constructs validation queries and posts them to a search engine, then finds the most appropriate terms for annotating the data units by calculating the similarities between terms and instances. The information needed for annotation can be acquired automatically, without the support of a domain ontology. Experiments over four real-world domains indicate that the proposed approach is highly effective.
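The search-engine-based annotator can be illustrated with a minimal sketch. The abstract does not give the query template or the similarity formula, so everything below is an assumption made for illustration: the quoted-phrase query format, the ratio-of-hit-counts score, the function names, and the `hit_count` callable (a stand-in for whatever search engine interface is used). The hit counts in the example are made up and are not results from the paper.

```python
from typing import Callable, Dict, Iterable, Tuple


def validation_query(label: str, instance: str) -> str:
    """Pair a candidate label with a data instance (assumed query template)."""
    return f'"{label}" "{instance}"'


def score_label(label: str, instances: Iterable[str],
                hit_count: Callable[[str], int]) -> float:
    """Average, over instances, of hits(label AND instance) / hits(instance).

    This ratio is an assumed similarity measure, not the paper's formula.
    Instances with zero hits contribute a score of 0.
    """
    total, n = 0.0, 0
    for inst in instances:
        n += 1
        inst_hits = hit_count(f'"{inst}"')
        if inst_hits > 0:
            total += hit_count(validation_query(label, inst)) / inst_hits
    return total / n if n else 0.0


def annotate(candidates: Iterable[str], instances: Iterable[str],
             hit_count: Callable[[str], int]) -> Tuple[str, Dict[str, float]]:
    """Return the best-scoring candidate label and all scores."""
    instances = list(instances)
    scores = {c: score_label(c, instances, hit_count) for c in candidates}
    if not scores:
        raise ValueError("at least one candidate label is required")
    return max(scores, key=scores.get), scores


# Hypothetical hit counts standing in for a real search engine.
FAKE_HITS = {
    '"Stephen King"': 1000,
    '"J. K. Rowling"': 2000,
    '"author" "Stephen King"': 600,
    '"author" "J. K. Rowling"': 1000,
    '"publisher" "Stephen King"': 100,
    '"publisher" "J. K. Rowling"': 200,
}


def fake_hit_count(query: str) -> int:
    return FAKE_HITS.get(query, 0)


best, scores = annotate(["author", "publisher"],
                        ["Stephen King", "J. K. Rowling"],
                        fake_hit_count)
print(best, scores)
```

Passing the search function in as a parameter keeps the sketch independent of any particular search engine and lets the scoring be checked against fixed counts.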
Source
《计算机应用》
CSCD
PKU Core (Peking University Core Journals)
2009, Issue 1, pp. 196-200 (5 pages)
Journal of Computer Applications
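Funding
National 973 Program of China (2007CB310801)
Key Project of the Ministry of Education (1070072)
Program of Introducing Talents of Discipline to Universities (B07037)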