Abstract
As the Web keeps growing, massive amounts of information are "hidden" in online databases whose data can be reached only through query interfaces; this content is called the Deep Web. Integrating Deep Web data within the same domain is therefore necessary, and query interface integration is one of its key subproblems. Query interface integration consists of two steps, schema matching and schema integration; this paper focuses on determining the attribute layout of the integrated query interface. The huge number of query interfaces on the Deep Web, together with their dynamic and heterogeneous nature, makes this problem highly challenging. The structure of each query interface is modeled as a tree, and the integrated query interface tree is then built by mining frequent pattern subtrees, so that the structural constraints and order constraints among attributes are satisfied to the greatest possible extent. The algorithm has low time complexity and good scalability, and experiments integrating query interfaces from eight domains demonstrate its effectiveness.
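The abstract does not give the algorithm's details, so the following is only a minimal Python sketch of the general idea, not the paper's method. It assumes schema matching has already mapped every interface attribute to a global name, and it simplifies "frequent pattern subtree" to a frequent attribute group (the set of attributes under one internal node). All names (`Node`, `integrate`, `layout`, `min_support`) and the airline example are hypothetical. Group support across interfaces stands in for the structural constraint. The mean normalized position of each attribute stands in for the order constraint.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """Query-interface tree node: a leaf is a (matched) attribute, an internal node is a group."""
    label: str
    children: list[Node] = field(default_factory=list)


def leaves(node: Node) -> list[str]:
    """Attributes of a subtree in left-to-right (display) order."""
    if not node.children:
        return [node.label]
    return [a for child in node.children for a in leaves(child)]


def groups(node: Node):
    """Yield the attribute set of every internal group node below `node`."""
    for child in node.children:
        if child.children:
            yield frozenset(leaves(child))
            yield from groups(child)


def integrate(interfaces: list[Node], min_support: float = 0.5) -> Node:
    """Hypothetical sketch: merge matched query-interface trees into one tree."""
    n = len(interfaces)
    # Structural constraint (simplified): in how many interfaces does each group occur?
    support = Counter()
    for tree in interfaces:
        support.update(set(groups(tree)))
    frequent = [g for g, s in support.items() if len(g) > 1 and s / n >= min_support]

    # Order constraint (simplified): mean normalized position of each attribute.
    pos_sum, pos_cnt = Counter(), Counter()
    for tree in interfaces:
        seq = leaves(tree)
        for i, a in enumerate(seq):
            pos_sum[a] += i / max(len(seq) - 1, 1)
            pos_cnt[a] += 1
    rank = {a: pos_sum[a] / pos_cnt[a] for a in pos_cnt}

    # Greedily keep non-overlapping frequent groups, highest support first.
    chosen, used = [], set()
    for g in sorted(frequent, key=lambda g: (-support[g], -len(g), sorted(g))):
        if not (g & used):
            chosen.append(g)
            used |= g

    units = [Node("group", [Node(a) for a in sorted(g, key=lambda a: (rank[a], a))])
             for g in chosen]
    units += [Node(a) for a in rank if a not in used]  # ungrouped attributes

    def unit_key(unit: Node):
        attrs = leaves(unit)
        return (sum(rank[a] for a in attrs) / len(attrs), attrs[0])

    return Node("root", sorted(units, key=unit_key))


def layout(node: Node):
    """Nested-list view: leaves become strings, groups become lists."""
    return [layout(c) if c.children else c.label for c in node.children]


def attr(name: str) -> Node:
    return Node(name)


def grp(*names: str) -> Node:
    return Node("group", [Node(a) for a in names])


# Hypothetical airline-domain interfaces, already schema-matched.
airline = [
    Node("root", [grp("from", "to"), grp("depart", "return"), attr("passengers")]),
    Node("root", [grp("from", "to"), attr("depart"), attr("return"), attr("passengers")]),
    Node("root", [attr("from"), attr("to"), grp("depart", "return"), attr("class")]),
]
merged = integrate(airline)
print(layout(merged))  # [['from', 'to'], ['depart', 'return'], 'class', 'passengers']
```

Both groups appear in two of the three interfaces, so they clear the assumed 0.5 support threshold and are kept. Every attribute from the union of interfaces still appears in the result. The greedy, non-overlapping group selection and the position-averaging heuristic are deliberate simplifications. A real frequent-subtree miner would also handle nested groups and resolve conflicting orders more carefully.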
Source
Science Technology and Engineering (《科学技术与工程》)
Peking University Core Journal
2014, Issue 18, pp. 81-88, 93 (9 pages in total)
Funding
Supported by the Guizhou Province Joint Fund Projects (grant nos. 黔科合J字LKQS[2013]29号 and 黔科合J字LKQS[2013]13号)
Keywords
frequent structure; query interface; attribute layout; pattern subtree; query interface tree