System model of incremental spider for the Chinese web and its implementation (cited by: 7)
Abstract: To crawl the Chinese Web incrementally and efficiently, experiments were designed to examine how pages change over short periods, and the minimum crawling capacity required for incremental crawling was estimated from the results. A general system model for incremental crawling is proposed together with its performance criteria; the model explains how incremental crawling operates. Drawing on experience from developing the Peking University Tianwang incremental crawling system, the paper discusses the model's performance bottlenecks and gives solutions. For the two kinds of targets of incremental crawling, changed pages and new pages, corresponding crawling strategies are explored. The implementation of the model and its performance are then described. This work provides a successful model for the design and implementation of incremental crawling systems.
Source: Journal of Tsinghua University (Science and Technology), 2005, Issue S1, pp. 1882-1886 (5 pages). Indexed in: EI, CAS, CSCD, Peking University Core Journals.
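Funding: Key Program of the National Natural Science Foundation of China (60435020); Doctoral Program Foundation of the Ministry of Education (20030001076)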
Keywords: incremental spider; web crawling; system model; the Chinese web; implementation strategies