
Research on a Dynamic Information Collection Strategy in Search Engines (cited by: 7)

Dynamic Refresh Strategy for Crawler in Search Engine
Abstract: To keep pace with the evolving Web, a search engine should dynamically adjust how often it crawls each website according to the site and its update rate. This paper models the Web page update process and discusses how to adjust the search engine's crawl frequency based on relevance. On the one hand, page updates are described as a Poisson process, and the paper analyzes how a search engine can collect pages effectively under that model. On the other hand, a relevance measure based on page affiliation detection and content analysis is used to adjust the process, so that information collection and data refresh become more targeted. Experimental results support the effectiveness of the approach.
Author: Gao Kai (高凯)
Source: Acta Electronica Sinica (《电子学报》), indexed in EI, CAS, CSCD, and the Peking University Core list; 2007, No. 10, pp. 1984-1988 (5 pages)
Keywords: search engine; crawler; Web page refresh; Poisson process; relevance
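
The Poisson model mentioned in the abstract can be illustrated with a minimal sketch. Under a Poisson process with update rate λ, the probability that a page has changed at least once within time t is 1 - e^(-λt). The function names, the `target_probability` parameter, and the rule that divides the interval by a relevance score in (0, 1] are assumptions made here for illustration only; the abstract does not state the paper's actual adjustment formula.

```python
import math


def change_probability(rate: float, elapsed: float) -> float:
    """Probability of at least one update within `elapsed` time units,
    assuming updates follow a Poisson process with the given rate."""
    return 1.0 - math.exp(-rate * elapsed)


def refresh_interval(rate: float, relevance: float,
                     target_probability: float = 0.5) -> float:
    """Hypothetical revisit interval for a page.

    The base interval is the time at which the change probability reaches
    `target_probability`. Dividing by `relevance` (assumed to lie in (0, 1])
    is an illustrative choice, not the paper's rule: less relevant pages
    are revisited less often.
    """
    if rate <= 0:
        return math.inf  # a page that never changes needs no refresh
    if not 0.0 < relevance <= 1.0:
        raise ValueError("relevance must be in (0, 1]")
    base = -math.log(1.0 - target_probability) / rate
    return base / relevance


if __name__ == "__main__":
    # A page updated about once every 2 days, with two relevance levels.
    for rel in (1.0, 0.5):
        print(rel, refresh_interval(rate=0.5, relevance=rel))
```

With this scaling, a page of relevance 0.5 is revisited half as often as a fully relevant page with the same update rate; other monotone scalings would serve equally well for the illustration.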
