
Preliminary Study of Distributed Cooperative Search Engine System
(Chinese title: 分布协作式搜索引擎系统的初步探索; cited by 1)
Abstract: To address the bottleneck of centralized search engines, a distributed cooperative search engine system is proposed that keeps the advantages of a centralized search engine while removing its bottleneck. The design idea is to make search engines located in geographically separate places cooperate in gathering and updating information. Three working modes for the information-gathering program (crawler) are discussed: firewall (closed) mode, cross-over mode, and exchange mode. Two methods, batch communication and replication of popular URLs, are proposed to reduce the frequency and volume of URL information that crawlers must transmit to one another in exchange mode. Three schemes for partitioning the Web are discussed: URL-hash based, site-hash based, and classification-based (hierarchical). Simulation experiments show that (1) in firewall mode, good coverage (collection rate) is obtained when the number of crawlers is relatively small; (2) the site-hash based scheme significantly reduces the number of external links, and hence the communication overhead, compared with the URL-hash based scheme; and (3) batch communication reduces the volume of URL information transmitted in exchange mode.
Authors: 赵新慧 (Zhao Xinhui), 朱伟 (Zhu Wei)
Source: Journal of Fushun Petroleum Institute (《抚顺石油学院学报》), 2003, No. 4, pp. 57-60 (4 pages)
Keywords: distributed cooperative; search engine; information gathering
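The contrast between the URL-hash and site-hash partitioning schemes in the abstract can be illustrated with a small sketch. It assumes N crawlers numbered 0..N-1, an MD5-based hash chosen only so results are stable across runs, and a hypothetical toy list of (source, target) links; the function names are illustrative, not taken from the paper, and the classification-based scheme is not modeled. Because site hashing sends every URL of a host to the same crawler, an intra-site link can never become an external link, so only inter-site links can cross partitions.

```python
import hashlib
from urllib.parse import urlsplit


def stable_bucket(key: str, n: int) -> int:
    """Map a string to one of n crawlers with a process-independent hash."""
    # Python's built-in hash() of str is randomized per process, so use MD5 instead.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % n


def url_hash_partition(url: str, n: int) -> int:
    """URL-hash scheme (illustrative): each URL is assigned on its own."""
    return stable_bucket(url, n)


def site_hash_partition(url: str, n: int) -> int:
    """Site-hash scheme (illustrative): every URL of a host goes to the same crawler."""
    return stable_bucket(urlsplit(url).hostname or "", n)


def count_external_links(links, n, partition):
    """Count (source, target) links whose endpoints belong to different crawlers."""
    return sum(1 for src, dst in links if partition(src, n) != partition(dst, n))


if __name__ == "__main__":
    # Hypothetical toy link graph: three intra-site links and one inter-site link.
    links = [
        ("http://a.example/", "http://a.example/news"),
        ("http://a.example/news", "http://a.example/news/1"),
        ("http://a.example/news/1", "http://b.example/"),
        ("http://b.example/", "http://b.example/about"),
    ]
    for name, scheme in [("URL-hash", url_hash_partition), ("site-hash", site_hash_partition)]:
        print(name, count_external_links(links, 4, scheme))
```

On this toy graph the site-hash count is at most 1 (the single inter-site link), while the URL-hash count depends on where each individual URL happens to hash.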
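The three crawler working modes, together with batch communication and replication of popular URLs, can be sketched as the rule a single crawler applies to each link it extracts. This is a minimal sketch under stated assumptions, not the authors' implementation: the class and field names are hypothetical, the partition function is supplied by the caller, "sending" a batch is simulated by appending to a list, and the popular-URL set is assumed to be replicated to every crawler in advance so the owning crawler already knows those URLs.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple

FIREWALL, CROSS_OVER, EXCHANGE = "firewall", "cross-over", "exchange"


@dataclass
class Crawler:
    cid: int                                        # this crawler's partition number
    n: int                                          # total number of crawlers
    partition: Callable[[str, int], int]            # maps a URL to its owning crawler
    mode: str = EXCHANGE
    batch_size: int = 100                           # URLs per message in exchange mode (assumed value)
    popular: Set[str] = field(default_factory=set)  # hypothetical replicated popular URLs
    frontier: List[str] = field(default_factory=list)
    outbox: Dict[int, List[str]] = field(default_factory=lambda: defaultdict(list))
    sent: List[Tuple[int, List[str]]] = field(default_factory=list)

    def on_link(self, url: str) -> None:
        """Handle a URL extracted from a page this crawler downloaded."""
        owner = self.partition(url, self.n)
        if owner == self.cid:
            self.frontier.append(url)
        elif self.mode == FIREWALL:
            pass  # closed mode: links into other partitions are dropped
        elif self.mode == CROSS_OVER:
            self.frontier.append(url)  # crawl it anyway; may duplicate another crawler's work
        elif url in self.popular:
            pass  # replicated on every crawler, so nothing needs to be sent
        else:
            self.outbox[owner].append(url)  # exchange mode: buffer instead of sending at once
            if len(self.outbox[owner]) >= self.batch_size:
                self.flush(owner)

    def flush(self, owner: Optional[int] = None) -> None:
        """Send buffered URLs as one message per destination crawler."""
        targets = [owner] if owner is not None else list(self.outbox)
        for o in targets:
            if self.outbox[o]:
                self.sent.append((o, self.outbox[o]))  # stand-in for a network send
                self.outbox[o] = []


if __name__ == "__main__":
    owner_of = {"u1": 0, "u2": 1, "u3": 1, "u4": 1, "hot": 1}  # hypothetical assignment
    c = Crawler(cid=0, n=2, partition=lambda u, n: owner_of[u], batch_size=2, popular={"hot"})
    for u in ["u1", "u2", "hot", "u3", "u4"]:
        c.on_link(u)
    c.flush()
    print(c.frontier, c.sent)  # ['u1'] [(1, ['u2', 'u3']), (1, ['u4'])]
```

In the usage example, three foreign URLs reach crawler 1 in two messages instead of four separate transfers: batching merges `u2` and `u3`, and the replicated URL `hot` is never sent. In firewall mode the same input would leave only `u1` in the frontier, while cross-over mode would queue all five URLs locally.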

References (7)

  • [1] Shen Hongfang (沈红芳). Internet search engines and their function optimization model [J]. Information Science (情报科学), 2000, 18(1): 7-9.
  • [2] Yang Xiaohua (阳小华). Distributed WWW information gathering technology [J]. Computer Engineering and Applications (计算机工程与应用), 2000, 36(5): 145-146.
  • [3] Liu J, Lei M. Digging for gold on the Web: experience with the WebGather [C]. In Proceedings of the 4th International Conference. Beijing: IEEE Computer Society Press, 2000: 751-755.
  • [4] Bowman, Mic C. The Harvest information discovery and access system [C]. In Proceedings of the Second International World Wide Web Conference. Chicago: Distributed Environments, 1994: 763-771.
  • [5] Pant G, Menczer F. MySpiders: evolve your own intelligent Web crawlers [J]. Autonomous Agents and Multi-Agent Systems, 2002, 5(2): 221-229.
  • [6] Broder A, Kumar R. Graph structure in the Web: experiments and models [C]. In Proceedings of the Ninth International World Wide Web Conference. Amsterdam, Netherlands: Computer Networks, 2000: 309-320.
  • [7] Waterhouse S, Doolin D. Distributed search in P2P networks [J]. IEEE Internet Computing, 2002, 6(1): 68-72.

