Abstract
In the era of big data, real-time and accurate traffic flow prediction in complex urban traffic environments is a prerequisite for intelligent transportation systems. This paper proposes a distributed urban traffic flow prediction model based on gradient optimization decision trees on the Spark platform (distributed urban traffic prediction with GBDT, DUTP-GBDT). It also proposes three optimizations for implementing the gradient optimization decision tree model in a distributed setting: split-point sampling, feature binning, and layer-by-layer training, which together improve training efficiency in the distributed case. The DUTP-GBDT model builds on the efficiency, reliability, and elastic scalability of the Spark distributed computing platform and on the relatively high accuracy and low time complexity of the gradient optimization decision tree model, and it uses time, road-condition, and weather features as inputs to achieve real-time, accurate traffic flow prediction. Comparison with the GA-BP, GA-KNN, and MSTAR models shows that DUTP-GBDT on Spark improves both accuracy and training speed in a distributed environment and meets the requirements of an urban traffic flow prediction system.
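The abstract names split-point sampling and feature binning without giving details. The following is a minimal single-machine sketch in plain Python (not Spark) of one common way these two ideas fit together: candidate thresholds are taken from quantiles of a random sample, raw values are mapped to bin indices, and the best split is then chosen from per-bin counts and sums alone. All function names and parameters here are hypothetical illustrations and are not taken from the paper.

```python
import bisect
import random


def sample_split_points(values, num_bins, sample_size, seed=0):
    """Pick up to num_bins - 1 candidate thresholds from a random sample."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(values, min(sample_size, len(values))))
    # Evenly spaced quantiles of the sample serve as candidate split points.
    cuts = [sample[(i * len(sample)) // num_bins] for i in range(1, num_bins)]
    return sorted(set(cuts))


def bin_feature(values, split_points):
    """Map each raw value to the index of the bin it falls into."""
    return [bisect.bisect_right(split_points, v) for v in values]


def best_split(bins, targets, num_bins):
    """Return (bin_index, gain) for the best boundary, using per-bin statistics only."""
    counts = [0] * num_bins
    sums = [0.0] * num_bins
    for b, y in zip(bins, targets):
        counts[b] += 1
        sums[b] += y
    total_n, total_s = sum(counts), sum(sums)
    best = (None, 0.0)
    left_n, left_s = 0, 0.0
    for b in range(num_bins - 1):
        left_n += counts[b]
        left_s += sums[b]
        right_n = total_n - left_n
        if left_n == 0 or right_n == 0:
            continue
        right_s = total_s - left_s
        # Reduction in squared error obtained by splitting after bin b.
        gain = left_s ** 2 / left_n + right_s ** 2 / right_n - total_s ** 2 / total_n
        if gain > best[1]:
            best = (b, gain)
    return best
```

Because the split search only needs the per-bin counts and sums, a distributed implementation could compute these statistics on each data partition and add them together before choosing a split; that is the usual motivation for binning in this setting, though the paper's exact scheme may differ.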
Source
Application Research of Computers (《计算机应用研究》)
CSCD
Peking University Core Journals (北大核心)
2018, Issue 2, pp. 405-409, 416 (6 pages)
Funding
CERNET Next Generation Internet Technology Innovation Project (NGII20160306)
Guangxi Science and Technology Research Project (PD160189)
Keywords
traffic flow forecast
distributed computing
Spark platform
gradient optimization decision tree model