
An Internet of Vehicles Data Processing System Based on Hadoop and the C4.5 Algorithm

Published: 2018-06-10 00:04

Topic: Hadoop + Internet of Vehicles; Source: Jiangsu University, master's thesis, 2017


[Abstract]: With the growth of the national economy and the acceleration of urbanization in China, the automobile has become a household necessity. Today's vehicles are fitted with Electronic Control Units (ECUs), which collect a variety of sensor data such as vehicle speed, accelerator pedal position, and engine speed. These data are transmitted over the Internet of Vehicles (IoV) to a data center and stored there. Sensor data are large in volume and unstructured, which makes big data storage and analysis difficult; storing and analyzing these data effectively has become one of the major challenges facing IoV companies. Advances in cloud computing and big data offer an opportunity to store and analyze IoV data at scale. Built on the Hadoop big data processing platform and its ecosystem, this thesis uses the HBase distributed database to store large volumes of IoV sensor data, and uses MapReduce together with an optimized C4.5 algorithm to analyze the data efficiently. The main work is as follows:

1. Design of an HBase-based IoV data management system. HBase is used to store the vehicle operating-condition parameters collected by the sensors. The work covers the database design; the design of interface functions for storing and querying data; the construction of secondary indexes to support multi-condition queries; integration with Hive to provide a SQL engine; data migration based on MapReduce; and the development of a web-based data management system.

2. Optimization of the C4.5 algorithm. Based on the characteristics of C4.5, Taylor's mean value theorem is used to simplify the algorithm's attribute selection measure, which avoids logarithm operations, reduces computational complexity, and improves efficiency. The optimized C4.5 algorithm is then parallelized with MapReduce to further improve its running efficiency. Features are extracted from the IoV data, the optimized C4.5 algorithm is used to classify vehicle acceleration performance, and decision-tree classification rules for judging acceleration performance are generated.

3. Platform construction and system testing. A test platform is built on Hadoop and HBase. The data operation performance of HBase is compared with that of SQL Server; the parallel running efficiency of feature extraction is tested; and the dataset produced by feature extraction is used to verify the efficiency and accuracy of the optimized C4.5 algorithm. The test results show that, compared with SQL Server, both the read and write efficiency of HBase in the system are clearly improved; the running efficiency of numerical feature extraction multiplies as the number of cluster nodes increases; and, compared with the original C4.5 algorithm, the optimized C4.5 algorithm improves classification efficiency without reducing classification accuracy.
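The idea of a logarithm-free attribute selection measure can be illustrated with a small sketch. The abstract does not give the thesis's actual simplified formula, so the code below is only an assumption: it uses the first-order Taylor expansion ln p ≈ p − 1 (around p = 1), under which each entropy term −p·log2(p) becomes p(1 − p)/ln 2. Because the same constant 1/ln 2 appears in both the information gain and the split information, it cancels in the gain ratio, so no logarithm is evaluated at all. The function names `impurity` and `approx_gain_ratio` and the toy attributes are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict


def impurity(counts):
    """Sum of p * (1 - p) over the class proportions.

    This stands in for entropy under the assumed first-order expansion
    ln p ~ p - 1; the constant factor 1 / ln 2 is dropped because it
    cancels in the gain ratio.
    """
    counts = list(counts)
    n = sum(counts)
    if n == 0:
        return 0.0
    return sum((c / n) * (1 - c / n) for c in counts)


def approx_gain_ratio(values, labels):
    """Logarithm-free stand-in for the C4.5 gain ratio of one discrete attribute."""
    n = len(labels)
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)

    before = impurity(Counter(labels).values())
    after = sum(len(g) / n * impurity(Counter(g).values()) for g in groups.values())
    split = impurity(len(g) for g in groups.values())

    # An attribute with a single value carries no split information.
    return (before - after) / split if split > 0 else 0.0


# Toy data (hypothetical): acceleration class plus two candidate attributes.
labels = ["fast", "fast", "slow", "slow"]
pedal = ["high", "high", "low", "low"]   # separates the classes perfectly
color = ["red", "blue", "red", "blue"]   # unrelated to the classes

print(approx_gain_ratio(pedal, labels))
print(approx_gain_ratio(color, labels))
```

A first-order expansion is a coarse approximation of entropy, so the ranking of attributes can differ from exact C4.5 on some datasets; a higher-order expansion would be closer at the cost of more arithmetic.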
[Degree-granting institution]: Jiangsu University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP311.13





Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/2001290.html

