Abstract
Collecting data from the Internet is key to solving the data-source problem. This paper studies and develops a data collection system based on Python web crawler technology that automatically collects topic-specific data. Using the urllib, Beautiful Soup, and threading libraries, a system model framework is designed and developed, comprising modules for data crawling, exception handling, robots protocol management, and multithreading management. The data collection process is illustrated through a concrete case study. Compared with traditional manual data collection, the system considerably improves work efficiency.
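The four modules named in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: every function and class name below (`parse_title`, `allowed_by_robots`, `fetch`, `crawl`, `TitleParser`) is hypothetical, the user agent `demo-bot` is a placeholder, and the standard-library `html.parser` is swapped in for Beautiful Soup so that the example has no third-party dependency. The `fetcher` parameter is likewise an assumption, added so the sketch can be exercised without network access.

```python
import threading
import urllib.request
import urllib.robotparser
from html.parser import HTMLParser
from queue import Queue, Empty


class TitleParser(HTMLParser):
    """Collects the text of the <title> element (stand-in for Beautiful Soup)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse_title(html):
    """Data extraction step: return the stripped page title."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


def allowed_by_robots(robots_lines, user_agent, url):
    """Robots protocol step: check a URL against already-downloaded robots.txt lines."""
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(robots_lines)
    return rp.can_fetch(user_agent, url)


def fetch(url, timeout=10):
    """Crawling + exception handling step: return page text, or None on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        # OSError covers urllib.error.URLError/HTTPError and socket timeouts;
        # ValueError is raised for malformed URLs.
        return None


def crawl(urls, robots_lines, user_agent="demo-bot", workers=4, fetcher=fetch):
    """Multithreading step: worker threads drain a shared queue; returns {url: title}."""
    tasks = Queue()
    for u in urls:
        tasks.put(u)
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                url = tasks.get_nowait()
            except Empty:
                return  # queue is empty, this worker is done
            if not allowed_by_robots(robots_lines, user_agent, url):
                continue  # skip pages disallowed by robots.txt
            html = fetcher(url)
            if html is None:
                continue  # skip pages that failed to download
            with lock:
                results[url] = parse_title(html)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

In this sketch the robots rules are passed in as text lines rather than downloaded per site, which keeps the example short; a real crawler would fetch and cache `robots.txt` for each host it visits.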
Author
Zhong Jiling (钟机灵), Heyuan Polytechnic, Heyuan, Guangdong 517000
Source
Information & Communications (《信息通信》), 2020, No. 4, pp. 96-98 (3 pages)
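Funding
Guangdong Provincial School Moral Education Research Project (project no. 2019GXSZ106)
Heyuan Social Development Science and Technology Plan Project (project no. 180703230222407)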