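Research and Application of Cloud-Computing-Based Storage and Mining Methods for Massive Spatio-Temporal Data
Published: 2018-06-02 13:14
Topics: data mining + cloud computing; Source: Hangzhou Dianzi University, master's thesis, 2014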
[Abstract]: In recent years, a growing number of applications have collected large volumes of spatio-temporal data and stored them in distributed databases, so the demand for spatio-temporal data mining keeps rising. In public security traffic management, traffic flow data is growing sharply and has pronounced spatio-temporal characteristics, which makes processing massive spatio-temporal data a serious challenge. As data volumes grow, traditional processing methods can no longer meet user needs in storage capacity or computational efficiency, and a platform that supports the storage and analysis of massive data is required.
Spatio-temporal anomaly detection is an important branch of spatio-temporal data mining. To address the limitations of traditional processing methods in spatio-temporal anomaly detection, this thesis designs and implements a big data storage and analysis platform. The main research contents and contributions are as follows:
(1) The thesis analyzes the technical principles of Hadoop, HBase, Hive, and ZooKeeper on a cloud platform, examines the HDFS principles and the MapReduce programming model of the Hadoop framework, and focuses on the underlying implementation of the data storage architecture of the HBase distributed database and on the data model of HBase tables. On this basis, it builds a cloud platform based on Hadoop, HBase, Hive, and ZooKeeper, and sets up an extended HBase+Hive system architecture.
(2) The thesis studies spatio-temporal anomaly detection methods in depth and analyzes several existing spatio-temporal anomaly patterns, with the aim of obtaining valuable knowledge by mining predefined patterns. It proposes a four-step, cloud-platform-based spatio-temporal anomaly detection method (data preprocessing, distributed anomaly detection, knowledge rule application, and result verification) for mining these predefined patterns, and validates the method on a real application involving traffic data streams. Experiments show that the method runs efficiently and produces correct results.
(3) The thesis studies HBase row key design and proposes a row-key-based data model. With the design goals made explicit, row keys are used to design a secondary index table and a replica recovery table, yielding a distributed secondary index on HBase that is applied to vehicle-passage records in traffic flow data. Experiments show that this indexing mechanism supports efficient queries over massive data.
(4) Building on the work above, the thesis designs and implements a big data storage and analysis platform that comprises the cloud platform, back-end programs, and a front-end display system. The real spatio-temporal anomaly detection application is integrated into the platform, giving users convenient operation and result display.
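The abstract does not give the actual row key layout, so the following is only a minimal sketch of how a row-key-based secondary index of the kind it describes might work. It rests on several assumptions: an in-memory sorted table stands in for HBase (which also keeps rows sorted by row key), the main table key is `checkpoint|timestamp|plate`, the index table key is `plate|timestamp|checkpoint`, and timestamps are fixed-width strings so that lexicographic order matches chronological order. All class, function, and field names are hypothetical, and the replica recovery table is left out.

```python
import bisect


class SortedTable:
    """In-memory stand-in for an HBase table: rows are kept sorted by row key."""

    def __init__(self):
        self._keys = []  # row keys in sorted order
        self._rows = {}  # row key -> stored value

    def put(self, key, value):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def get(self, key):
        return self._rows.get(key)

    def scan_prefix(self, prefix):
        """Yield (key, value) for each row key that starts with prefix, in key order."""
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            key = self._keys[i]
            yield key, self._rows[key]
            i += 1


def main_key(checkpoint_id, ts, plate):
    # Assumed main-table layout: scans by checkpoint and time range are contiguous.
    return f"{checkpoint_id}|{ts}|{plate}"


def index_key(plate, ts, checkpoint_id):
    # Assumed index-table layout: scans by plate are contiguous and time-ordered.
    return f"{plate}|{ts}|{checkpoint_id}"


def put_record(main, index, checkpoint_id, ts, plate, payload):
    """Write the record to the main table and a pointer row to the index table."""
    mk = main_key(checkpoint_id, ts, plate)
    main.put(mk, payload)
    index.put(index_key(plate, ts, checkpoint_id), mk)


def find_by_plate(main, index, plate):
    """Look up all records for a plate via the index table, oldest first."""
    return [main.get(mk) for _, mk in index.scan_prefix(plate + "|")]
```

In this sketch a query by plate never scans the main table: it performs one prefix scan on the index table and then point lookups by main row key, which is the general idea behind keeping a separate index table whose row key starts with the queried attribute.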
[Degree-granting institution]: Hangzhou Dianzi University
[Degree level]: Master's
[Year degree awarded]: 2014
[Classification number]: TP333; TP311.13
Article ID: 1968888
Article link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/1968888.html