Research and Development of a Big Data Storage System Based on HBase
[Abstract]: With the arrival of the big data era, the volume of data stored in information system databases is growing explosively, and the performance demands on data reads, writes, and queries keep rising. Traditional relational databases can no longer meet the requirements of big data storage and query. To explore storage and query techniques for massive data, this thesis focuses on research and development with HBase, a typical non-relational (NoSQL) database. HBase is an open-source implementation of Google Bigtable that is highly reliable, high-performance, column-oriented, scalable, and consistent, but it does not natively support secondary indexes. With HBase, a large-scale storage cluster can be built on inexpensive PC servers to realize a big data storage system. The thesis first studies the architecture of big data storage systems and discusses the key technologies of HBase. HBase is then deployed on the Spark big data platform and used to store a floating population database. Because HBase supports queries only by primary key (row key), a secondary index is added to the floating population database, which greatly improves query speed. On this basis, the performance of the HBase-based floating population database is analyzed and evaluated, and HBase is benchmarked with YCSB, a testing tool developed by Yahoo. The test object is an HBase table built from actual data provided by an enterprise, with 30 million records in total. Finally, a prototype system for managing massive floating population data is developed on the Spark big data platform and HBase. The system provides data acquisition, data storage, data management, statistical analysis, and system management functions. It stores 30 million records totaling 12.6 GB, achieving efficient storage and fast querying of massive floating population data.
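The abstract does not describe how the secondary index is built. As a rough illustration only, the sketch below shows one common pattern: an index table whose keys combine the indexed value with the row key, so that a lookup by a non-key column becomes a prefix scan. It uses a toy in-memory table rather than a real HBase client, and the class names, the `city` column, and the sample rows are all hypothetical, not taken from the thesis.

```python
from bisect import bisect_left, insort


class SortedTable:
    """Toy stand-in for a row-key-ordered table: access by key or key prefix only."""

    def __init__(self):
        self._keys = []   # row keys kept in sorted order
        self._rows = {}   # row key -> stored value

    def put(self, key, row):
        if key not in self._rows:
            insort(self._keys, key)
        self._rows[key] = row

    def get(self, key):
        return self._rows.get(key)

    def delete(self, key):
        if key in self._rows:
            del self._rows[key]
            self._keys.pop(bisect_left(self._keys, key))

    def scan_prefix(self, prefix):
        """Yield (key, row) pairs whose key starts with prefix, in key order."""
        i = bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], self._rows[self._keys[i]]
            i += 1


SEP = "\x00"  # assumed never to occur inside indexed values or row keys


class IndexedStore:
    """Data table plus an index table keyed by '<indexed value><SEP><row key>'."""

    def __init__(self, indexed_column):
        self.indexed_column = indexed_column  # hypothetical column to index
        self.data = SortedTable()
        self.index = SortedTable()

    def put(self, row_key, row):
        old = self.data.get(row_key)
        if old is not None:
            # Drop the stale index entry so an updated row is not found under its old value.
            self.index.delete(old[self.indexed_column] + SEP + row_key)
        self.data.put(row_key, row)
        self.index.put(row[self.indexed_column] + SEP + row_key, row_key)

    def find_by_indexed_value(self, value):
        """Return [(row_key, row), ...] via a prefix scan on the index table."""
        return [(rk, self.data.get(rk))
                for _, rk in self.index.scan_prefix(value + SEP)]


store = IndexedStore("city")
store.put("id001", {"name": "A", "city": "Xi'an"})
store.put("id002", {"name": "B", "city": "Beijing"})
store.put("id003", {"name": "C", "city": "Xi'an"})
matches = store.find_by_indexed_value("Xi'an")
```

In this sketch every write touches two tables, so the faster lookup on the indexed column is paid for with extra write work and storage. A real deployment would also have to keep the two tables consistent under failures, which the toy version ignores.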
[Degree-granting institution]: Xi'an University of Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP311.13; TP333