Research and Implementation of a Traffic Flow Prediction Model Based on Machine Learning Techniques

Published: 2018-03-05 22:22

Topic: traffic flow prediction　Focus: machine learning　Source: Southwest Jiaotong University, 2017 master's thesis　Document type: degree thesis

[Abstract]: Rapid economic development and rising urbanization have improved people's quality of life, but they have also brought severe traffic congestion. How to use a city's historical traffic flow to predict future traffic conditions quickly and accurately is an important research topic in intelligent transportation. Traditional approaches to traffic flow prediction fall into two groups: methods based on mathematical models (such as Kalman filter models and time series models) and methods without a mathematical model (such as neural network models and non-parametric regression models). However, traditional methods show limitations on increasingly complex traffic flow data, chiefly: (1) many algorithms are limited in handling nonlinear problems; (2) the non-stationarity of traffic flow strongly affects prediction accuracy; (3) large sample volumes pose a challenge to efficiency. In recent years, with the rise of data-driven techniques such as data mining and machine learning, traffic flow prediction research has increasingly been combined with these techniques, which has brought substantial gains in prediction accuracy. The thesis uses the PeMS data set from the California Department of Transportation as experimental data. First, to address the non-stationarity of traffic flow, it proposes a two-stage ordered clustering model that combines the DBSCAN algorithm with an optimal segmentation algorithm; the model clusters ordered samples at lower cost without prior knowledge, and experiments on the data show that the clustering results are reasonable. Building on the ordered clustering model, the thesis proposes a time-segmented support vector machine model and, using goodness of fit as the metric, shows that the model achieves the desired regression accuracy. The thesis also proposes a traffic flow sequence generation model based on weighted historical data: the time-segmented support vector machine model generates a reference value, the reference value is weighted with historical data, and iterating this process produces a traffic flow sequence; comparison with the real sequence demonstrates the accuracy of the generated sequence. Finally, the thesis introduces the label propagation algorithm to classify each sampling time point in the experimental data, according to its features, into three patterns: rising points, falling points, and stable points. On top of this classification, it introduces a random forest model that takes the real-time traffic flow sequence as input and identifies the corresponding traffic change pattern. The model achieves good results on the main performance metrics.
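The abstract names an "optimal segmentation algorithm" for ordered samples but does not give its formulation. As a rough illustration only, the sketch below assumes the classic dynamic-programming form of optimal partitioning, in which an ordered sequence is split into k contiguous segments so that the total within-segment sum of squared deviations is minimal. The function names, the cost function, and the sample values are hypothetical and are not taken from the thesis; the DBSCAN stage of the two-stage model is not shown.

```python
from typing import List, Tuple


def segment_cost(prefix: List[float], prefix_sq: List[float], i: int, j: int) -> float:
    """Sum of squared deviations from the mean for values[i:j] (j exclusive)."""
    n = j - i
    s = prefix[j] - prefix[i]
    sq = prefix_sq[j] - prefix_sq[i]
    # Clamp tiny negative values caused by floating-point rounding.
    return max(sq - s * s / n, 0.0)


def optimal_segmentation(values: List[float], k: int) -> Tuple[float, List[int]]:
    """Split an ordered sequence into k contiguous segments.

    Returns (total cost, list of segment start indices). The cost is an
    assumed criterion: total within-segment sum of squared deviations.
    """
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(values)")

    # Prefix sums allow each segment cost to be computed in O(1).
    prefix, prefix_sq = [0.0], [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    inf = float("inf")
    # best[m][j]: minimal cost of splitting values[:j] into m segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    # cut[m][j]: start index of the last segment in that optimal split.
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = best[m - 1][i] + segment_cost(prefix, prefix_sq, i, j)
                if c < best[m][j]:
                    best[m][j] = c
                    cut[m][j] = i

    # Walk back through the recorded cut points to recover segment starts.
    starts: List[int] = []
    j = n
    for m in range(k, 0, -1):
        i = cut[m][j]
        starts.append(i)
        j = i
    starts.reverse()
    return best[k][n], starts


# Hypothetical flow readings with three visibly different regimes.
flows = [10, 11, 10, 50, 52, 51, 20, 21]
cost, starts = optimal_segmentation(flows, 3)
print(starts, round(cost, 4))
```

This version runs in O(k·n²) time, which is one reason a cheaper first stage that reduces the number of candidate samples could be attractive; whether the thesis uses DBSCAN in that role is not stated in the abstract.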
[Degree-granting institution]: Southwest Jiaotong University
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP181; U491.14

[References]