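Research on Hadoop-Based Processing Technology for Massive Engineering Data
Published: 2018-07-03 11:56
Topic: massive engineering data processing + Hadoop; Source: master's thesis, Beijing Jiaotong University (北京交通大学), 2013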
[Abstract]: As engineering projects become increasingly digitized, they generate massive volumes of engineering data, and traditional engineering data storage technology cannot meet the rising requirements on data storage quality. In recent years, cloud computing has advanced rapidly, driven by both industry and academia, and many cloud computing systems have been put into use. Among them, the Hadoop platform is widely used to develop cloud computing programs. Hadoop's greatest advantage is that it makes parallelization transparent to application developers: they can build cloud computing applications much as they would ordinary programs, while Hadoop's lower layers handle parallelization across the cluster automatically. This thesis uses Hadoop to study the processing of massive data in the engineering field, relying mainly on Hadoop's HDFS distributed file system and the MapReduce distributed processing model to support the storage and processing of massive engineering data.
Processing massive engineering data involves two main parts: storing the data, and computing on and analyzing it. To address data storage in the engineering field, the thesis analyzes and designs an engineering data storage system built on Hadoop's HDFS distributed file system. The system includes a file recognition algorithm designed around the Java 7 file monitor, which quickly and accurately monitors and identifies changes in the client's local file directories. Combined with storage job scheduling by a Quartz-based scheduler and calls to the HDFS file operation API, it implements cross-platform file synchronization. Application testing in the storage system of a cloud simulation platform shows that the system is general-purpose, efficient, and economical. The file synchronization method performs the synchronization task well, solves the core file synchronization problem of the cloud simulation platform's storage system, and provides fast, correct file synchronization.
To address the computation and analysis of massive engineering data, the thesis builds on MapReduce, Hadoop's other core technology, and proposes a massive data analysis and processing model with massive urban noise data as its application target, providing massive data processing services for an urban community noise monitoring system. Based on the characteristics of urban noise data, it proposes a data processing workflow with four stages: data cleaning, data preprocessing, data processing, and data visualization. Using this model, the massive noise data collected by the urban noise monitoring and acquisition system is stored, and the stored data is then tested, analyzed, and processed, combining mobile computing with the storage and analysis of massive engineering data. Test results show that the distributed processing model completes the noise data processing task quickly, accurately, and effectively.
Finally, the thesis summarizes its results on applying Hadoop to the storage, computation, and analysis of massive engineering data, and outlines future work.
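The file monitoring step described in the abstract can be illustrated with a minimal sketch. It assumes the "Java 7 file monitor" refers to `java.nio.file.WatchService`, which Java 7 introduced; the abstract does not name the class. The class `DirectoryChangeMonitor`, the `SyncAction` enum, and the mapping from event kinds to actions are hypothetical names chosen for illustration and are not taken from the thesis. The Quartz scheduling and the HDFS upload are deliberately left out; the sketch only prints what a sync job might do.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;

/** Hypothetical sketch: watches one local directory and maps each change to a sync action. */
final class DirectoryChangeMonitor {

    /** Hypothetical action names; the thesis abstract does not define them. */
    enum SyncAction { UPLOAD, DELETE_REMOTE, RESCAN }

    /** Maps a WatchService event kind to the action a sync job might perform (an assumption). */
    static SyncAction toSyncAction(WatchEvent.Kind<?> kind) {
        if (kind == StandardWatchEventKinds.ENTRY_CREATE
                || kind == StandardWatchEventKinds.ENTRY_MODIFY) {
            return SyncAction.UPLOAD;          // new or changed file: copy it to remote storage
        }
        if (kind == StandardWatchEventKinds.ENTRY_DELETE) {
            return SyncAction.DELETE_REMOTE;   // removed locally: remove the remote copy
        }
        return SyncAction.RESCAN;              // OVERFLOW: events were lost, rescan the directory
    }

    /** Blocks until {@code dir} becomes inaccessible, printing one line per detected change. */
    static void watch(Path dir) throws IOException, InterruptedException {
        try (WatchService watcher = FileSystems.getDefault().newWatchService()) {
            dir.register(watcher,
                    StandardWatchEventKinds.ENTRY_CREATE,
                    StandardWatchEventKinds.ENTRY_MODIFY,
                    StandardWatchEventKinds.ENTRY_DELETE);
            while (true) {
                WatchKey key = watcher.take();             // waits for the next batch of events
                for (WatchEvent<?> event : key.pollEvents()) {
                    Object context = event.context();      // relative Path, or null on OVERFLOW
                    Path changed = (context instanceof Path) ? dir.resolve((Path) context) : dir;
                    // A real system would enqueue a storage job here instead of printing.
                    System.out.println(toSyncAction(event.kind()) + " " + changed);
                }
                if (!key.reset()) {
                    break;                                 // directory is no longer accessible
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Watches the directory given as the first argument, or the working directory.
        watch(Paths.get(args.length > 0 ? args[0] : "."));
    }
}
```

Keeping the event-to-action mapping in a separate pure method makes that decision easy to test without touching the file system. Note that `WatchService` watches a single directory per registration, so a recursive sync would need to register each subdirectory as well.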
[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Master's
[Year degree conferred]: 2013
[Classification number]: TP333
Article ID: 2093554
Article link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/2093554.html