Design and Implementation of a Distributed Crawler Based on Docker Containers (基于Docker容器的分布式爬虫的设计与实现)

Cited by: 5
Abstract: With the rapid growth of the Internet, the volume of online information has grown just as quickly. To retrieve specific, useful information quickly, the authors studied the open-source crawler framework Scrapy and, combining it with the Redis and MongoDB databases, designed and implemented a distributed web crawler system. The system crawls rental listings from 58 Tongcheng (58同城): web page data is stored in MongoDB, and page links are stored in Redis. The work focuses on handling and optimizing for anti-crawler measures, and it uses Docker containers to rework the traditional deployment environment. The results show that the Docker-based distributed crawler system runs more efficiently and more stably than the VM-based distributed crawler system.
Authors: FANG Qizhou (方奇洲); CHENG Youqing (程友清) (Wuhan Research Institute of Posts and Telecommunications, Wuhan 430074, China; FiberHome Telecommunication Technologies Co., Ltd., Wuhan 430074, China)
Source: Electronic Design Engineering (《电子设计工程》), 2020, No. 8, pp. 61-65 (5 pages)
Keywords: computer software; distributed crawler; Scrapy; Docker
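
The division of labor described in the abstract (page links held in a shared Redis store, page data written to MongoDB, several crawler workers drawing from the same link queue) can be illustrated with a minimal sketch. This is not the authors' implementation: `UrlFrontier` and `PageStore` are hypothetical in-memory stand-ins for Redis and MongoDB, and `FAKE_SITE` and `fake_fetch` are invented placeholders for real HTTP requests and Scrapy parsing, so the example runs without any servers.

```python
import hashlib
from collections import deque


class UrlFrontier:
    """Hypothetical in-memory stand-in for a shared Redis link queue plus dedup set."""

    def __init__(self):
        self._queue = deque()
        self._seen = set()

    def push(self, url):
        # Fingerprint the URL so each link is scheduled at most once.
        fingerprint = hashlib.sha1(url.encode("utf-8")).hexdigest()
        if fingerprint in self._seen:
            return False
        self._seen.add(fingerprint)
        self._queue.append(url)
        return True

    def pop(self):
        # Return the next URL, or None when the queue is empty.
        return self._queue.popleft() if self._queue else None


class PageStore:
    """Hypothetical in-memory stand-in for a MongoDB collection of scraped pages."""

    def __init__(self):
        self.docs = {}

    def save(self, url, doc):
        self.docs[url] = doc


# Invented site graph: url -> (scraped item, outgoing links).
FAKE_SITE = {
    "list/1": ({"kind": "list"}, ["item/a", "item/b", "list/2"]),
    "list/2": ({"kind": "list"}, ["item/b", "item/c"]),
    "item/a": ({"kind": "listing"}, []),
    "item/b": ({"kind": "listing"}, []),
    "item/c": ({"kind": "listing"}, []),
}


def fake_fetch(url):
    """Placeholder for downloading and parsing a page."""
    return FAKE_SITE[url]


def run_worker(name, frontier, store, fetch, max_pages):
    """Process up to max_pages URLs from the shared frontier; return the count."""
    done = 0
    while done < max_pages:
        url = frontier.pop()
        if url is None:
            break
        item, links = fetch(url)
        store.save(url, dict(item, worker=name))
        for link in links:
            frontier.push(link)  # duplicates are silently dropped
        done += 1
    return done


frontier = UrlFrontier()
store = PageStore()
frontier.push("list/1")

# Two workers take turns, one page each, until the frontier is drained.
total = 0
while True:
    progressed = sum(
        run_worker(name, frontier, store, fake_fetch, 1)
        for name in ("worker-1", "worker-2")
    )
    if progressed == 0:
        break
    total += progressed

print(total, sorted(store.docs))
```

Because both workers share one frontier, a link discovered twice (here `item/b`) is fetched only once. In a real deployment that shared state would live in Redis so that workers in separate containers can coordinate.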