
Design and Implementation of the Real Time Controller for Web Crawlers (网络爬虫实时控制器的设计与实现)

Cited by: 1
Abstract: To support personalized data collection, this paper proposes a lightweight web crawler framework comprising a controller, downloader, parser, thread pool, and proxy pool. Within this framework, a crawler controller with real-time processing capability is designed that can automatically save and restore the task state. The working principle and C# implementation of the controller are described in detail, and the controller is applied to collecting articles within a website. Experimental results show that the proposed framework is efficient and easy to use, and that the controller's real-time processing capability is very important in practical crawler development.
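Authors: LI Jian (李健); ZHANG Ke-liang (张克亮) (Luoyang Campus, Information Engineering University, Luoyang 471003)
Source: Modern Computer (《现代计算机》), 2021, No. 5, pp. 91-96 (6 pages)
Funding: Major Program of the National Natural Science Foundation of China: Research on the Acquisition, Annotation, and Analysis of Multilingual Speech Data (No. 11590771).
Keywords: web crawler; crawler framework; real-time controller; C#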
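
The abstract mentions a controller that can automatically save and restore its task state. The paper's C# implementation is not reproduced in this record, so the following is only a minimal Python sketch of that general idea, not the authors' design. The class name `CrawlController`, its methods, and the JSON checkpoint file are all hypothetical choices made for illustration.

```python
import json
import os
import tempfile
from collections import deque


class CrawlController:
    """Hypothetical controller that checkpoints its URL queue to disk."""

    def __init__(self, state_path, seeds=()):
        self.state_path = state_path
        self.pending = deque()   # URLs waiting to be fetched
        self.in_flight = []      # URLs handed out but not yet finished
        self.done = set()        # URLs already processed
        if os.path.exists(state_path):
            self._restore()      # resume a previous run
        else:
            for url in seeds:
                self.add(url)

    def add(self, url):
        # Skip URLs that are already known in any state.
        if url in self.done or url in self.pending or url in self.in_flight:
            return
        self.pending.append(url)

    def next_url(self):
        if not self.pending:
            return None
        url = self.pending.popleft()
        self.in_flight.append(url)
        return url

    def mark_done(self, url, discovered=()):
        if url in self.in_flight:
            self.in_flight.remove(url)
        self.done.add(url)
        for link in discovered:
            self.add(link)
        self.save()

    def save(self):
        # Unfinished in-flight URLs are saved as pending so that a
        # restart fetches them again instead of losing them.
        state = {
            "pending": self.in_flight + list(self.pending),
            "done": sorted(self.done),
        }
        directory = os.path.dirname(os.path.abspath(self.state_path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(state, f)
        os.replace(tmp, self.state_path)  # swap in the new checkpoint

    def _restore(self):
        with open(self.state_path, encoding="utf-8") as f:
            state = json.load(f)
        self.pending = deque(state["pending"])
        self.done = set(state["done"])
```

In this sketch a checkpoint is written each time a page finishes, and the checkpoint is written to a temporary file and then renamed so that an interrupted write does not leave a half-written state file. A downloader and parser (not shown) would call `next_url()` to get work and `mark_done()` with any newly discovered links.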