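Research on Residents' Travel Characteristics Based on Taxi Trajectory Data Mining
Topics: taxi + trajectory data mining; Source: Chang'an University, master's thesis, 2017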
[Abstract]: Analyzing residents' travel behavior is essential groundwork for comprehensive urban transportation planning and urban construction planning, and it provides an effective basis for transportation policy. Traditional travel data collection relies mainly on household interviews and questionnaires, which suffer from a high misreporting rate and are time-consuming and labor-intensive, so they no longer meet current needs. With the rapid development of geographic information system (GIS) technology and the wide deployment of the Global Positioning System (GPS), the movement trajectories of large numbers of individuals are now routinely stored, which offers a new approach to travel behavior analysis. Using a real GPS data set covering 12,000 taxis in Xi'an over one month, this thesis studies how to extract travel characteristics from taxi trajectory data and applies them to travel behavior analysis, popular-area discovery, and identification of area functions. The main contributions are as follows:
(1) The raw GPS data set is processed with the cloud-based MapReduce parallel computing framework, using secondary sort and trajectory extraction to complete data cleaning and map matching.
(2) Origin-destination (OD) information for residents' trips is extracted from the GPS data. Several travel features are designed per time period, including average number of trips, average trip duration, and average trip distance. These are used to compare the temporal patterns of travel in Xi'an on holidays and on working days, and the spatial distribution of trips is analyzed through visualization.
(3) A popular-area discovery algorithm based on an improved DBSCAN is proposed. It addresses two weaknesses of traditional DBSCAN: sensitivity to parameters and the lack of any limit on cluster extent. The algorithm selects its parameters adaptively from the dynamic neighbor density of each cluster, takes an area constraint for popular areas as input, and splits any cluster that exceeds the constraint, so that the large number of OD points is clustered into popular areas of reasonable size.
(4) Time-series features of passenger flow in popular areas are extracted at several granularities to describe the relationship between how flow changes over time and the social function of each area. A semi-supervised classification algorithm combined with uncertainty sampling is proposed and applied to identifying the social function of popular areas, which are classified into six categories: stations, scenic spots, commercial districts, residential areas, schools, and entertainment districts.
The experimental results show that taxi trajectory data reflects the spatiotemporal distribution of urban residents' travel well. The improved DBSCAN algorithm produces popular areas of reasonable size and avoids the unconstrained cluster area of the traditional algorithm. Passenger-flow features of popular areas can be used to identify their social function, and finer-grained flow time series give better classification results. The semi-supervised classification algorithm with uncertainty sampling reaches relatively high classification accuracy while requiring labels for only a small number of areas.
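The OD extraction and per-period trip features described in the abstract can be illustrated with a minimal sketch. The abstract does not give the record schema or the exact feature definitions, so everything below is an assumption made for illustration: the field names (`taxi_id`, `timestamp`, `lat`, `lon`, `occupied`), the rule that a trip starts at a vacant-to-occupied transition and ends at the first vacant record, the use of great-circle distance between consecutive points instead of map-matched road distance, and the sort by `(taxi_id, timestamp)` as a single-machine stand-in for the MapReduce secondary sort.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class GpsPoint:
    taxi_id: str     # hypothetical field name
    timestamp: int   # seconds, assumed to be in local time already
    lat: float
    lon: float
    occupied: bool   # assumed flag: True while a passenger is on board


@dataclass(frozen=True)
class Trip:
    taxi_id: str
    origin: GpsPoint
    destination: GpsPoint
    distance_km: float

    @property
    def duration_s(self) -> int:
        return self.destination.timestamp - self.origin.timestamp


def haversine_km(a: GpsPoint, b: GpsPoint) -> float:
    """Great-circle distance between two points, in kilometres."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def extract_trips(points: Iterable[GpsPoint]) -> List[Trip]:
    """Turn raw GPS records into OD trips using the occupancy flag."""
    # Order records per taxi by time (stand-in for a MapReduce secondary sort).
    ordered = sorted(points, key=lambda p: (p.taxi_id, p.timestamp))
    trips: List[Trip] = []
    origin = None
    prev = None
    distance = 0.0
    for p in ordered:
        if prev is None or prev.taxi_id != p.taxi_id:
            # First record of a taxi: drop any trip left open by the previous one.
            origin = None
        elif origin is None:
            if p.occupied and not prev.occupied:
                origin, distance = p, 0.0  # vacant -> occupied: pick-up
        else:
            distance += haversine_km(prev, p)
            if not p.occupied:  # occupied -> vacant: drop-off
                trips.append(Trip(p.taxi_id, origin, p, distance))
                origin = None
        prev = p
    return trips


def hourly_features(trips: Iterable[Trip]) -> Dict[int, Dict[str, float]]:
    """Trip count, mean duration and mean distance per hour of day of the pick-up."""
    buckets: Dict[int, List[Trip]] = defaultdict(list)
    for t in trips:
        buckets[(t.origin.timestamp // 3600) % 24].append(t)
    return {
        hour: {
            "trips": len(ts),
            "mean_duration_min": sum(x.duration_s for x in ts) / len(ts) / 60,
            "mean_distance_km": sum(x.distance_km for x in ts) / len(ts),
        }
        for hour, ts in sorted(buckets.items())
    }
```

In this sketch, trips whose pick-up is not observed (a taxi whose first record is already occupied) are skipped rather than guessed. The `trips` value is a raw count per hour bucket; dividing it by the number of days in each group would give an average that can be compared between holidays and working days.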
[Degree-granting institution]: Chang'an University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: U491; TP311.13
Article ID: 1825548
Article link: https://www.wllwen.com/kejilunwen/daoluqiaoliang/1825548.html