Design of a Key-Value-Based Massive Log Storage System
Published: 2018-04-30 17:51
Topics: distributed + KEYVALUE; Source: Fudan University, master's thesis, 2013
[Abstract]: Data centers at today's Internet companies run very large computer clusters that generate massive (TB-scale) volumes of log data every day. For administrators, persisting and searching data at this scale is a major challenge, and relational databases built on the ACID model cannot meet the demands of current Internet data management. With the rise of NoSQL technology, NoSQL databases make it practical to scale data storage both horizontally and vertically.
Key-value databases built on BASE theory take AP (availability and partition tolerance) as their theoretical basis: they provide availability and partition tolerance while guaranteeing eventual consistency. In many specific systems the vast majority of lookups are primary-key queries, which makes key-value storage a good fit. Moreover, a large share of Internet data does not require strong consistency in its management; eventual consistency is sufficient. Key-value databases also offer very high concurrent read and write performance and scale well both horizontally and vertically, so they can support the storage, management, search, and statistical analysis of massive data in current Internet applications with high reliability and scalability.
This thesis studies the design principles, application environments, characteristics, and use cases of popular key-value databases and, taking the characteristics of massive log data into account, tests the insert, search, and persistence performance of existing key-value databases. It designs a multi-level data persistence scheme on top of memlink and, using ZeroMQ as the basis of the distribution strategy, proposes an architecture, design, and deployment strategy for a distributed massive-log storage system. The whole system is implemented in C, and performance testing was completed. The final test results indicate that the scheme effectively reduces deployment and operating costs and delivers very high performance while preserving availability and eventual consistency; in other words, with efficiency taken into account, the system can fully meet the storage, query, and statistics needs of existing Internet-scale logs.
[Degree-granting institution]: Fudan University
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP333
Article ID: 1825569
Article link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/1825569.html