Abstract
Telecom operators now need to retain customers and offer more refined customer service because mobile number portability is available. They also need to monitor and clean up harmful information online. In response, this paper takes the mainstream domestic social media platforms as its main research object and studies and implements web crawler strategies. It analyzes the Python-based Scrapy crawler framework, then designs and implements a Scrapy-based crawler that collects and analyzes Weibo (microblog) posts. The crawler collects and analyzes posts by keyword, stores data in the non-relational database MongoDB, uses Selenium to simulate login, and keeps its crawl queue in a Redis database.
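The abstract says the crawler is seeded by keywords and keeps its crawl queue in Redis. The following is a minimal, stdlib-only sketch of that idea under stated assumptions. It is not the paper's implementation. `CrawlQueue`, `seed_keywords` and `SEARCH_URL` are hypothetical names. The in-memory deque and set stand in for a Redis list and a Redis set. The Weibo search URL template is an assumption.

```python
import hashlib
from collections import deque
from urllib.parse import quote

# Hypothetical search URL template; the real Weibo endpoint and parameters are assumed.
SEARCH_URL = "https://s.weibo.com/weibo?q={kw}&page={page}"


class CrawlQueue:
    """In-memory stand-in for a Redis-backed crawl queue (assumption).

    `pending` plays the role of a Redis list and `seen` the role of a
    Redis set of request fingerprints used for deduplication.
    """

    def __init__(self):
        self.pending = deque()
        self.seen = set()

    @staticmethod
    def fingerprint(url: str) -> str:
        # Hash the URL so the seen-set stores fixed-length keys.
        return hashlib.sha1(url.encode("utf-8")).hexdigest()

    def push(self, url: str) -> bool:
        """Enqueue url unless it was seen before; return True if it was enqueued."""
        fp = self.fingerprint(url)
        if fp in self.seen:
            return False
        self.seen.add(fp)
        self.pending.append(url)
        return True

    def pop(self):
        """Return the next URL in FIFO order, or None when the queue is empty."""
        return self.pending.popleft() if self.pending else None


def seed_keywords(queue: CrawlQueue, keywords, pages: int = 2) -> int:
    """Push search-result page URLs for each keyword; return how many were new."""
    added = 0
    for kw in keywords:
        for page in range(1, pages + 1):
            url = SEARCH_URL.format(kw=quote(kw), page=page)
            added += queue.push(url)  # True counts as 1, False as 0
    return added


if __name__ == "__main__":
    q = CrawlQueue()
    # Hypothetical keyword list; the repeated keyword is deduplicated.
    print(seed_keywords(q, ["携号转网", "携号转网"], pages=2))  # → 2
    print(q.pop())
```

Replacing the deque and set with a real Redis list and set (for example through the LPUSH/RPOP and SADD commands) would let several spider processes share one queue and one deduplication set. That sharing is a common reason to keep a crawl queue in Redis rather than in process memory.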
Author
XIE Gang (Loudi Branch of China Telecom Co., Ltd., Loudi 417000, China)
Source
Modern Information Technology, 2020, No. 14, pp. 96-98 (3 pages)