摘要
近年来电影产业迅猛发展,票房数据直接反映一部电影所带来的收益,也是衡量一部电影数据是否成功的重要指标。因此,分析票房数据变得十分重要,本文从电影类别、区域和年限时长来分析。首先,以爬取电影票房数据为例,借助Python功能完备的标准库、强大的第三方库Requests以及Xpath,编写程序快速实现中国票房网页及豆瓣电影数据的抓取;其次,将爬取的数据通过Pandas数据分析库进行数据清洗与预处理之后存入数据库中;最后,通过Flask+Web框架以图形化的方式直观地展示数据结果,并加以分析,得出相关结论。
In recent years, the film industry has developed rapidly. The box office data directly reflects the income brought by a film, and it is also an important indicator to measure the success of a film data. Therefore, it is very important to analyze the box office data. This paper analyzes the film category, region and duration: firstly, taking crawling film box office data as an example, with the help of Python’s fully functional standard library, powerful third-party library requests and XPath, write a program to quickly capture China’s Box Office Web page and Douban film data;Secondly, the crawled data is stored in the database after data cleaning and preprocessing through pandas data analysis library;Finally, through the flash + web framework, the data results are displayed intuitively in a graphical way, analyzed and drawn relevant conclusions.
作者
谢彦南
杨呈敏
XIE Yannan;YANG Chengmin(Guangdong Vocational College of Environmental Protection Engineering,Guangzhou Guangdong 510655,China)
出处
《信息与电脑》
2021年第23期176-178,共3页
Information & Computer
关键词
数据分析
数据爬虫
数据可视化
data analysis
data crawler
data visualization