Abstract
Recruitment websites publish massive volumes of job-posting data. To collect this data programmatically, this paper designs a Python-based crawler and implements data collection for recruitment websites such as Liepin, Boss, and Lagou, crawling all job postings together with their detail pages. The Scrapy framework is used to crawl the content of the targeted websites, and image recognition is used to handle the CAPTCHAs encountered during crawling. In total, more than 50,000 records were collected.
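The two-level crawl described in the abstract (listing pages first, then each posting's detail page) can be illustrated with a minimal sketch. It is not the authors' implementation: the paper uses Scrapy, whereas this sketch uses only the Python standard library and runs on an in-memory HTML sample with no network access. The function name `extract_detail_urls`, the CSS class `job-link`, the sample markup, and the `example.com` URLs are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class DetailLinkParser(HTMLParser):
    """Collects href values of <a> tags that carry a given CSS class."""

    def __init__(self, link_class):
        super().__init__()
        self.link_class = link_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # Attribute values may be None for valueless attributes.
        classes = (attr_map.get("class") or "").split()
        href = attr_map.get("href")
        if href and self.link_class in classes:
            self.links.append(href)


def extract_detail_urls(listing_html, base_url, link_class="job-link"):
    """Return absolute, de-duplicated detail-page URLs found on a listing page.

    `link_class` is an assumed marker for job links; real sites use
    their own markup and must be inspected individually.
    """
    parser = DetailLinkParser(link_class)
    parser.feed(listing_html)
    parser.close()

    seen = set()
    urls = []
    for href in parser.links:
        absolute = urljoin(base_url, href)  # resolve relative links
        if absolute not in seen:            # skip postings already queued
            seen.add(absolute)
            urls.append(absolute)
    return urls


# Hypothetical listing page: two distinct postings, one repeated, one nav link.
SAMPLE_LISTING = """
<ul>
  <li><a class="job-link" href="/job/1001.html">Python Engineer</a></li>
  <li><a class="job-link" href="/job/1002.html">Data Analyst</a></li>
  <li><a class="nav" href="/about">About</a></li>
  <li><a class="job-link" href="/job/1001.html">Python Engineer</a></li>
</ul>
"""

if __name__ == "__main__":
    for url in extract_detail_urls(SAMPLE_LISTING, "https://example.com/jobs?page=1"):
        print(url)
```

In a Scrapy spider the same step would typically live in the listing-page callback, which yields one request per detail URL. CAPTCHA handling is left out here because the abstract gives no details of the image-recognition method.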
Authors
孙暖
曹小平
刘军
Sun Nuan; Cao Xiaoping; Liu Jun (Chongqing Creation Vocational College, Chongqing 402160, China)
Source
《信息与电脑》
2020, Issue 18, pp. 161-163 (3 pages)
Information & Computer
Funding
Chongqing Higher Education Teaching Reform Research Project (Project No. 202182).