MapReduce-Based Detection and Analysis of Urban Traffic Trip Distribution Anomalies
Published: 2019-03-20 14:22
[Abstract]: With the rapid development of spatio-temporal trajectory data mining, outlier detection in trajectory data has become an active research topic in data mining. Many traditional outlier detection methods operate in Euclidean space and define an outlier as a point lying more than a certain distance from most other points. In practical applications such as emergency response to traffic incidents, however, anomalies in traffic trip distribution are judged mainly by changes in traffic flow, so the Euclidean distance used by traditional algorithms to measure anomalies is no longer suitable. In addition, traffic trajectory data sets are very large, and traditional single-machine anomaly detection methods run inefficiently on them. This thesis uses the MapReduce distributed parallel computing framework to propose a distributed parallel algorithm for detecting and analyzing urban traffic trip distribution anomalies. The specific work is as follows:

(1) To better describe how traffic trips are distributed, the thesis proposes an urban traffic trip distribution model based on the traffic flow of traffic zones. The model is simple and easy to understand, and it gives a macroscopic view of trip distribution across the whole city.

(2) To address the detection of trip distribution anomalies, the thesis draws on transportation domain knowledge and, building on the urban traffic flow distribution model, defines a traffic trip distribution anomaly in terms of zone traffic flow and gives a formal representation of it.

(3) Building on this work, the thesis proposes a MapReduce-based distributed parallel algorithm for detecting and analyzing traffic trip distribution anomalies (MapReduce-Based Distributed Parallel Transportation Distribution Outliers Detection And Analysis Algorithm, MDPTDODA for short). The algorithm first preprocesses taxi trajectory data, then extracts the traffic flow between zones from the trajectories and builds the zone-flow-based urban traffic trip distribution model. Finally, it combines the traffic flows of multiple consecutive days in the model into a set of time series, detects trip distribution anomalies using the DBSCAN clustering algorithm with the Dynamic Time Warping (DTW) distance, and analyzes the likely causes of the anomalies from the relationships among them.

Using historical taxi trajectory data from Beijing as the raw data, the thesis runs a single-machine version of the algorithm in a single-machine multi-core environment and a distributed parallel version on a Hadoop-based cluster. The experiments show that the proposed MDPTDODA algorithm is efficient when analyzing and processing large volumes of trajectory data. The experimental results are also compared with what actually happened historically, and the comparison indicates that the method is effective at detecting and analyzing anomalies.
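The pipeline described in the abstract can be illustrated with a minimal single-process sketch. It is not the thesis's MDPTDODA implementation and does not use the Hadoop API: `map_trip` and `reduce_counts` only imitate the map and reduce roles when counting zone-to-zone flows, `dtw_distance` is the textbook DTW recurrence, and `dbscan_noise` returns the items DBSCAN would label as noise. The trip tuple layout, the flow values, and the `eps` and `min_pts` settings are all assumptions made for illustration.

```python
from collections import defaultdict


def map_trip(trip):
    """Map role: turn one trip into a ((origin, dest, day, slot), 1) pair."""
    origin_zone, dest_zone, day, slot = trip  # hypothetical record layout
    return (origin_zone, dest_zone, day, slot), 1


def reduce_counts(pairs):
    """Reduce role: sum the counts emitted for each key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def dtw_distance(a, b):
    """Dynamic Time Warping distance with absolute difference as local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


def dbscan_noise(series_list, eps, min_pts):
    """Return indices DBSCAN would mark as noise under the DTW distance.

    A point is noise if it is not a core point and has no core point
    within eps. Neighborhoods include the point itself.
    """
    n = len(series_list)
    neighbors = [
        [j for j in range(n)
         if dtw_distance(series_list[i], series_list[j]) <= eps]
        for i in range(n)
    ]
    core = {i for i in range(n) if len(neighbors[i]) >= min_pts}
    return [
        i for i in range(n)
        if i not in core and not any(j in core for j in neighbors[i])
    ]


# Hypothetical flows for one zone pair: five days, six time slots per day.
daily_flows = [
    [5, 8, 20, 18, 9, 4],
    [4, 9, 21, 17, 8, 5],
    [5, 7, 8, 20, 19, 4],    # same shape, peak shifted by one slot
    [6, 8, 19, 18, 10, 5],
    [5, 30, 60, 55, 25, 6],  # unusually high flow
]
anomalous_days = dbscan_noise(daily_flows, eps=15, min_pts=3)
print(anomalous_days)  # [4]
```

In this toy data the third day has its peak one slot later than the others, yet DTW keeps it close to the normal days, whereas a slot-by-slot comparison would penalize the shift. Only the last day, whose flows are far higher, is left outside every dense neighborhood and reported as an anomaly.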
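[Degree-granting institution]: Beijing University of Technology
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TP393.08;TP311.13
Article ID: 2444295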
Article link: https://www.wllwen.com/guanlilunwen/ydhl/2444295.html