Abstract
This paper uses 200 Chinese films released in 2015 as experimental data and treats box-office revenue, divided into eight classes, as the dependent variable. Predictor variables are drawn from four groups: audience anticipation, the film's own influence, competition from concurrent releases, and basic film attributes. A box-office prediction model is then built with the C5.0 decision tree algorithm from data mining. The selected factors are also analyzed to rank their importance for prediction, which shows that the Baidu Index is strongly associated with box-office revenue. Multinomial logistic regression, Bayesian networks, and CHAID trees were also tested, and the C5.0 decision tree performed best.
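As an illustration of the approach described in the abstract, the following minimal sketch bins revenue into eight classes and scores two candidate predictors with a C4.5-style gain ratio, the kind of information-based split criterion that C5.0 is generally described as building on. It is not the authors' implementation. The bin edges, the revenue figures, the revenue unit, and the two features (`search_level` as a stand-in for a discretized Baidu Index, and `release_slot`) are all invented for illustration and do not come from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def gain_ratio(feature_values, labels):
    """Information gain of a categorical feature divided by its split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the class labels after splitting on the feature.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature_values)
    return info_gain / split_info if split_info > 0 else 0.0


def box_office_class(revenue, edges):
    """Map a revenue figure to a class index 0..len(edges), given ascending bin edges."""
    return sum(revenue >= edge for edge in edges)


# Hypothetical bin edges: 7 edges give 8 classes. The paper's actual cut points
# are not stated in the abstract, so these values are assumptions.
EDGES = [1, 5, 10, 50, 100, 500, 1000]

# Invented toy data for 8 films (arbitrary revenue unit).
revenues = [0.5, 0.8, 2, 3, 600, 700, 1500, 2000]
search_level = ["low", "low", "low", "low", "high", "high", "high", "high"]
release_slot = ["a", "b", "a", "b", "a", "b", "a", "b"]

classes = [box_office_class(r, EDGES) for r in revenues]

if __name__ == "__main__":
    print("classes:", classes)
    print("gain ratio, search_level:", gain_ratio(search_level, classes))
    print("gain ratio, release_slot:", gain_ratio(release_slot, classes))
```

On this toy data the search-interest stand-in separates low-revenue from high-revenue classes and receives a higher gain ratio than the release-slot feature, which carries no information about the class. A tree learner of this family would therefore split on the search-interest feature first; the real ranking of predictors depends on the paper's data.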
Source
Science Mosaic (《科技广场》)
2016, No. 4, pp. 186-192 (7 pages)
Keywords
Box office
Forecasting
Decision tree
Baidu Index