Research and Application of Caching Algorithms and Design Patterns for Massive Data

Published: 2018-07-05 02:11

Topic: Web applications + massive data; Source: Zhejiang University, 2013 master's thesis

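[Abstract]: With the rapid growth of the Internet, large financial information systems such as those of securities firms and banks have accumulated massive amounts of user data. The sharp rise in data volume and user numbers has put these systems under heavy pressure, so effectively reducing user access latency and improving quality of service is an urgent problem. Web caching can greatly improve a system's response speed. However, traditional caching strategies either consider only the access habits of individual users, without looking at cache performance as a whole, or must load all data before they can decide which data to cache, and loading all of it is impossible when the data is massive. This thesis therefore studies caching strategies for massive data and how to design a caching system, with the following contributions:
(1) To discard redundant data and cache only the core data the system actually needs, we propose the Rules Engine-Based Data Reduction design pattern (REBDR). It makes a rules engine the core of data preprocessing, instead of hard-coding the preprocessing logic in the application as is traditionally done, which cleanly separates preprocessing logic from application code. The reduction rules can be customized flexibly, adapt to the business needs of different industries, and can be changed quickly as requirements change, so the pattern is general-purpose.
(2) Even after reduction, massive data often still cannot be loaded into the cache in full. We therefore propose a data caching strategy based on a load factor. The algorithm does not need to load all data into memory: it decides whether to cache a data item from that item's load factor. In simulation tests, this strategy achieved a higher hit rate than traditional caching strategies, increasing system throughput several-fold while substantially reducing response latency.
(3) We designed and implemented an "Electronic Banking Risk Monitoring System". Applying the REBDR pattern and the load-factor-based caching strategy solved the system's most critical performance problem. The system is highly flexible and lays a solid foundation for real-time risk monitoring of electronic banking at domestic banks. At present, no other domestic product is known to meet the performance requirements of real-time monitoring of banks' massive transaction volumes.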
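The core idea of REBDR is to keep data-reduction rules outside the application code so they can change without a code change. A minimal Python sketch can illustrate this. The abstract does not say which rules engine or rule format the thesis uses, so this is not the author's implementation. The rule format (`field`/`op`/`value` dictionaries), the helper names `load_rules`, `reduce_records` and `project`, and the transaction fields are all hypothetical. A real rules engine would also offer features such as rule priorities and chaining.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, Iterator, List

# Comparison operators that rule definitions may refer to by name.
OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "==": operator.eq, "!=": operator.ne,
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
    "in": lambda a, b: a in b,
}


@dataclass(frozen=True)
class Rule:
    """A declarative 'keep' condition: record[field] <op> value.
    HYPOTHETICAL rule format, not the thesis's rules engine."""
    field: str
    op: str
    value: Any


def load_rules(spec: List[dict]) -> List[Rule]:
    # In a deployed system the spec would come from an external rules store
    # maintained by business staff; here it is an in-memory list.
    rules = [Rule(**item) for item in spec]
    for r in rules:
        if r.op not in OPS:
            raise ValueError(f"unknown operator: {r.op}")
    return rules


def reduce_records(records: Iterable[dict], rules: List[Rule]) -> Iterator[dict]:
    """Yield only records that satisfy every rule (logical AND)."""
    for rec in records:
        if all(r.field in rec and OPS[r.op](rec[r.field], r.value) for r in rules):
            yield rec


def project(rec: dict, fields: List[str]) -> dict:
    """Keep only the 'core' fields worth caching."""
    return {f: rec[f] for f in fields if f in rec}


# Example: the reduction policy lives entirely in this data structure.
RULES_SPEC = [
    {"field": "status", "op": "==", "value": "ok"},
    {"field": "amount", "op": ">=", "value": 1000},
    {"field": "channel", "op": "in", "value": ("web", "mobile")},
]
records = [
    {"id": 1, "status": "ok", "amount": 5000, "channel": "web", "memo": "x"},
    {"id": 2, "status": "ok", "amount": 20, "channel": "web", "memo": "y"},
    {"id": 3, "status": "failed", "amount": 9000, "channel": "mobile", "memo": "z"},
    {"id": 4, "status": "ok", "amount": 1200, "channel": "atm", "memo": "w"},
    {"id": 5, "status": "ok", "amount": 3000, "channel": "mobile", "memo": "v"},
]
core = [project(r, ["id", "amount"]) for r in reduce_records(records, load_rules(RULES_SPEC))]
print(core)  # [{'id': 1, 'amount': 5000}, {'id': 5, 'amount': 3000}]
```

To tighten or loosen the reduction, for instance by raising the amount threshold, you only edit `RULES_SPEC`. `reduce_records` itself stays unchanged, which is the separation of preprocessing logic from application code that the pattern aims for.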
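The abstract does not define the load factor. This Python sketch therefore only shows the general shape of a score-based cache admission policy under an explicit assumption: an item's load factor is its access count divided by its size, and both values are known from metadata without loading the item. On a miss, the item is admitted only if enough space can be freed by evicting resident items with a strictly lower load factor. Otherwise it is served but not cached. The class `LoadFactorCache` and its parameters are hypothetical. This is not presented as the thesis's algorithm or as evidence for its reported hit rates.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable


class LoadFactorCache:
    """Capacity-bounded cache whose admission and eviction use a per-key score.

    ASSUMPTION (illustrative only): load_factor(key) = access_count / size,
    with size > 0. The thesis abstract does not give its definition.
    """

    def __init__(self, capacity: int, size_of: Callable[[Hashable], int]):
        self.capacity = capacity
        self.size_of = size_of  # size comes from metadata; the value need not be loaded
        self.used = 0
        self.store: Dict[Hashable, Any] = {}
        self.freq: Dict[Hashable, int] = defaultdict(int)  # counts for every key seen
        self.hits = 0
        self.misses = 0

    def load_factor(self, key: Hashable) -> float:
        return self.freq[key] / self.size_of(key)

    def get(self, key: Hashable, loader: Callable[[Hashable], Any]) -> Any:
        self.freq[key] += 1
        if key in self.store:
            self.hits += 1
            return self.store[key]
        self.misses += 1
        value = loader(key)  # fetch from the backing store on a miss
        self._maybe_admit(key, value)
        return value

    def _maybe_admit(self, key: Hashable, value: Any) -> None:
        size = self.size_of(key)
        if size > self.capacity:
            return  # can never fit
        score = self.load_factor(key)
        victims, freed = [], 0
        # Consider resident keys from lowest to highest load factor.
        for k in sorted(self.store, key=self.load_factor):
            if self.used - freed + size <= self.capacity:
                break  # enough room has been freed
            if self.load_factor(k) >= score:
                return  # only equal or more valuable items left: reject newcomer
            victims.append(k)
            freed += self.size_of(k)
        for k in victims:
            del self.store[k]
        self.used -= freed
        self.store[key] = value
        self.used += size


# Example trace with hypothetical item sizes.
sizes = {"a": 1, "b": 1, "c": 2, "d": 1}
cache = LoadFactorCache(capacity=2, size_of=sizes.__getitem__)
for k in ["a", "a", "b", "c", "c", "c", "a", "d", "d"]:
    cache.get(k, loader=lambda key: f"value-of-{key}")
print(cache.hits, cache.misses, sorted(cache.store))  # 2 7 ['a', 'd']
```

In this trace, the size-2 item `c` is rejected every time. Making room for it would require evicting an item whose load factor is equal to or higher than its own. The small item `d`, by contrast, displaces `b` once its score exceeds `b`'s. The admit/reject decision uses only per-key counts and sizes, so nothing has to be preloaded to decide what to cache. Note that the frequency table grows without bound here, so a production version would need to cap or age it.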
[Degree-granting institution]: Zhejiang University
[Degree level]: Master's
[Year conferred]: 2013
[Classification number]: TP333

[References]