Preliminary Study of Distributed Cooperative Search Engine System (分布协作式搜索引擎系统的初步探索)

Cited by: 1

Abstract: To address the bottleneck of centralized search engines, a distributed cooperative search engine system is proposed that keeps the advantages of a centralized search engine while removing its bottleneck. The design idea is to have geographically dispersed search engines cooperate on information gathering and updating. Three working modes of the information-gathering program (crawler) are discussed: firewall mode, cross-over mode, and exchange mode. Two methods, batch communication and replication of popular URLs, are proposed to reduce the frequency and volume of URL information transferred in exchange mode. Three schemes for partitioning the Web are discussed: URL-hash based, site-hash based, and hierarchical (classification-based). Simulation experiments support three conclusions: in firewall mode, good coverage is obtained when the number of crawlers is relatively small; site-hash based partitioning yields significantly fewer external links than URL-hash based partitioning; and batch communication reduces the volume of URL information transferred in exchange mode.
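The partitioning schemes and the batch communication method named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the use of MD5 as the hash, the reading of an "external link" as a link whose source and target belong to different crawlers, and the `BatchSender` class are all assumptions made for illustration.

```python
import hashlib
from urllib.parse import urlsplit


def stable_hash(text: str) -> int:
    """Deterministic hash (MD5 is an assumption; any stable hash would do)."""
    return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest()[:8], "big")


def url_hash_owner(url: str, n_crawlers: int) -> int:
    """URL-hash partitioning: the whole URL decides which crawler owns the page."""
    return stable_hash(url) % n_crawlers


def site_hash_owner(url: str, n_crawlers: int) -> int:
    """Site-hash partitioning: only the host name decides the owner,
    so all pages of one site land on the same crawler."""
    host = urlsplit(url).hostname or ""
    return stable_hash(host) % n_crawlers


def count_external_links(links, owner, n_crawlers: int) -> int:
    """Count links whose source and target are owned by different crawlers."""
    return sum(
        1 for src, dst in links
        if owner(src, n_crawlers) != owner(dst, n_crawlers)
    )


class BatchSender:
    """Hypothetical helper for exchange mode: buffer URLs per destination
    crawler and hand them to `send` in groups instead of one by one."""

    def __init__(self, batch_size: int, send):
        self.batch_size = batch_size
        self.send = send          # callback: send(dest_crawler, list_of_urls)
        self.buffers = {}

    def queue(self, dest: int, url: str) -> None:
        buf = self.buffers.setdefault(dest, [])
        buf.append(url)
        if len(buf) >= self.batch_size:
            self.send(dest, buf)
            self.buffers[dest] = []   # start a fresh buffer for this destination

    def flush(self) -> None:
        """Send whatever is still buffered, e.g. at the end of a crawl round."""
        for dest, buf in self.buffers.items():
            if buf:
                self.send(dest, buf)
        self.buffers = {}
```

Under site-hash partitioning, links between pages of the same site never cross crawlers, which is one plausible reason for the reduction in external links that the abstract reports; with URL-hash partitioning, even intra-site links can cross crawlers. `BatchSender` trades some delay in delivering URLs for fewer messages.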

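Authors: Zhao Xinhui (赵新慧), Zhu Wei (朱伟)

Affiliation: School of Information Engineering, Liaoning Shihua University (辽宁石油化工大学信息工程学院)

Source: Journal of Fushun Petroleum Institute (《抚顺石油学院学报》), 2003, No. 4, pp. 57-60 (4 pages)

Keywords: distributed cooperative; search engine; information gathering

Classification: TP393 [Automation and Computer Technology: Computer Application Technology]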
