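Abstract: 1 Introduction. The World Wide Web is currently the largest information system in the world. Finding Web documents on the WWW depends mainly on index-based information systems on the Internet, such as Yahoo, Infoseek, AltaVista, WebCrawler, Excite, and Lycos. Because the WWW is very large, lacks a good structure, and is made up of autonomous Web servers, queries for Web documents can hardly be both comprehensive and precise. The quality of a Web document query is measured mainly in two respects: ① whether all relevant document resources can be found, with none omitted.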
Funding: Project supported by the National Natural Science Foundation of China (No. 60605020) and the National High-Tech R&D Program (863) of China (Nos. 2006AA01Z320 and 2006AA010105)
Abstract: Recently, we designed a new experimental system, MSearch, which is a cross-media meta-search system built on the database of the WikipediaMM task of ImageCLEF 2008. For a meta-search engine, the core problem is how to merge the results from multiple member search engines and provide a more effective ranked list. This paper deals with a novel fusion model employing supervised learning. Our fusion model employs ranking SVM to train the fusion weight for each member search engine. We associate the fusion weight of each member search engine with one feature of a result document returned by the meta-search engine. For a returned result document, we first build a feature vector to represent the document, and set the value of each feature to the document's score returned by the corresponding member search engine. Then we construct a training set from the documents returned by the meta-search engine to learn the fusion parameters. Finally, we use the linear fusion model based on the overlap set to merge the result sets. Experimental results show that our approach significantly improves the performance of the cross-media meta-search (MSearch) and outperforms many of the existing fusion methods.
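The feature-vector construction and linear fusion step described in this abstract can be illustrated with a minimal sketch. It is not the MSearch implementation: the function names, the engine names, and the fixed weights are hypothetical, the weights are supplied by hand rather than learned with ranking SVM, and a document missing from an engine's results is assumed to score 0.0 for that engine. The abstract's overlap-set handling is not reproduced.

```python
from typing import Dict, List, Tuple


def build_feature_vectors(
    engine_results: Dict[str, Dict[str, float]], engines: List[str]
) -> Dict[str, List[float]]:
    """Represent each document as a vector of per-engine scores.

    Assumption: a document that an engine did not return gets score 0.0.
    """
    docs = set()
    for scores in engine_results.values():
        docs.update(scores)
    return {
        doc: [engine_results[engine].get(doc, 0.0) for engine in engines]
        for doc in sorted(docs)
    }


def linear_fusion(
    vectors: Dict[str, List[float]], weights: List[float]
) -> List[Tuple[str, float]]:
    """Score each document as the weighted sum of its features, best first."""
    fused = {
        doc: sum(w * x for w, x in zip(weights, features))
        for doc, features in vectors.items()
    }
    # Sort by descending fused score; break ties by document id.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    engines = ["text_engine", "image_engine"]  # hypothetical member engines
    results = {
        "text_engine": {"d1": 0.9, "d2": 0.4},
        "image_engine": {"d2": 0.8, "d3": 0.5},
    }
    weights = [0.6, 0.4]  # hypothetical; the paper learns these with ranking SVM
    ranked = linear_fusion(build_feature_vectors(results, engines), weights)
    print([doc for doc, _ in ranked])  # ['d2', 'd1', 'd3']
```

In this toy input, d2 is returned by both engines and ends up first even though d1 has the highest single-engine score, which is the kind of effect a weighted linear combination is meant to capture.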
Funding: Supported by the National Natural Science Foundation of China (No. 90818007)
Abstract: Result merging for multiple Independent Resource Retrieval Systems (IRRSs), which is a key component in developing a meta-search engine, is a difficult problem that has still not been effectively solved. Most existing result merging methods are strongly affected by the usefulness weights of the different IRRS results and by the overlap rate among them. In this paper, we propose a scheme that uses Discrete Particle Swarm Optimization (DPSO) to coalesce and optimize a group of existing multi-source retrieval merging results effectively. The experimental results show that DPSO not only outperforms, overall, all the other result merging algorithms it employs, but also adapts better in application, because it does not need to take into account the usefulness weights of the different IRRSs or their overlap rate with respect to a concrete query. Compared to the other result merging algorithms it employs, DPSO can increase recognition precision by nearly 24.6%, while the standard deviation of precision across different queries can decrease by about 68.3%.
Funding: Supported by the Fifteenth Project, Science and Technology Development Plan of Shaanxi Province of China (2000K08-G12)
Abstract: This paper presents a new algorithm: a result integration algorithm based on a matching strategy. The algorithm extracts the title and the abstract of Web pages, calculates the relevance between the query string and the Web pages, decides which Web pages are accepted or rejected, and sorts the accepted pages for display in the user interface. The experimental results clearly indicate that the new algorithm improves the precision of a meta-search engine. This technique is very useful for meta-search engines.
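The steps named in this abstract (extract title and abstract, score relevance against the query string, accept or reject, sort) can be sketched as follows. The abstract does not give the paper's relevance formula, so this sketch swaps in a simple term-overlap measure; the `Page` type, the `title_weight` parameter, and the `threshold` value are all hypothetical.

```python
import re
from typing import List, NamedTuple, Set, Tuple


class Page(NamedTuple):
    url: str
    title: str
    abstract: str


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevance(query: str, page: Page, title_weight: float = 2.0) -> float:
    """Fraction of query terms matched, with title matches weighted higher.

    Assumption: term overlap stands in for the paper's unspecified measure.
    """
    query_terms = tokenize(query)
    if not query_terms:
        return 0.0
    title_hit = len(query_terms & tokenize(page.title)) / len(query_terms)
    abstract_hit = len(query_terms & tokenize(page.abstract)) / len(query_terms)
    return (title_weight * title_hit + abstract_hit) / (title_weight + 1.0)


def integrate(
    query: str, pages: List[Page], threshold: float = 0.3
) -> List[Tuple[float, Page]]:
    """Reject pages scoring below the threshold and sort the rest, best first."""
    scored = [(relevance(query, page), page) for page in pages]
    accepted = [item for item in scored if item[0] >= threshold]
    accepted.sort(key=lambda item: (-item[0], item[1].url))
    return accepted
```

A page whose title and abstract both contain every query term scores 1.0, while a page that matches one of two query terms only in its abstract scores about 0.17 and is rejected at the default threshold.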