Nationwide Retailer Information Collection and Application under High Concurrency

Published: 2018-08-01 11:32

[Abstract]: The number of retailers served by the enterprise has now exceeded 9 million, and as informatization and digitization deepen, collecting retailer data and information has become imperative. With 9 million users, server traffic rises sharply and data volume grows explosively. The enterprise's current servers and other hardware can no longer meet the demands of high concurrency and big data. Simply adding servers to form a cluster and upgrading hardware and software performance is clearly not a good way to handle massive concurrent requests and big-data processing. By analyzing and comparing traditional data collection modes and the enterprise's business types, this thesis designs a mobile-based data collection mode and analyzes the collection model in detail. To meet the requirements of the enterprise and the needs of users, it further studies load-balancing scheduling strategies for the high-concurrency and big-data processing problems that the new collection mode faces. Drawing on the characteristics of high concurrency and of scheduling algorithms, it proposes a dynamically adjusted load-balancing algorithm based on task-request prediction, built on a multi-time-slice polling scheduling mechanism. The algorithm is further adjusted and improved through an analysis of multidimensional Markov chains and queuing theory, and experiments show that it improves the overall performance and load balance of the system. The specific research contents are as follows:
1) By comparison with traditional data collection methods, and according to the enterprise's current business types and data request characteristics, a new data collection model based on a distributed strategy is designed. The model uses an APP and a WeChat enterprise account as the entry points for data collection and builds a server cluster to schedule and process high-concurrency business. A data center is created with a Hadoop-based strategy, and the data are analyzed and processed. The model is given a preliminary optimization according to the business types requested and the nature of the data submitted by retailers nationwide. An analysis of the layered structure and key technical points of the model system raises the question of how to schedule task requests efficiently and adjust the load of server nodes in a high-concurrency environment, which is analyzed and solved later in the thesis.
2) Scheduling algorithms based on CPU and MEM are studied in depth and, according to the characteristics of actual task requests, improved into a load-balancing scheduling algorithm based on a prediction mechanism. Each server node collects the load status of the other nodes and predicts the business type and arrival rate of network requests. It then dynamically adjusts request distribution to reduce request waiting time and shorten server idle time, so that resources are used effectively and the overall system load eventually reaches a balanced state. Experiments verify that the algorithm performs well in shortening response time and outperforms the scheduling algorithms based on CPU and MEM.
3) To address the shortcomings of the prediction-based load-balancing algorithm, queuing theory is used to optimize the load on the network servers and to arrange the waiting, processing, and suspension of task requests sensibly. The multidimensional Markov mechanism is analyzed to predict the characteristics of, and relationships among, subsequent network requests. From an analysis of the multi-time-slice polling strategy, a prediction-based Markov queuing model is proposed. Experimental verification and analysis show that the model coordinates the load of the server nodes well, distributes subsequently arriving network requests sensibly, and reduces response time.
4) Based on the data collection model and the load-balancing technology, a three-in-one data collection system is built that integrates a mobile-terminal data collection platform, a business processing platform, and a data processing platform. Load balancing is applied to data processing, which shortens the time servers take to process data, and the results of the design and implementation of the data collection system are presented.
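The abstract does not give the algorithm itself, so the following is only a minimal Python sketch of the general idea it describes: predicting the type of the next request with a first-order Markov chain, and sending each request to the node with the lowest estimated load. The class and function names, the equal CPU/MEM weights, and the per-type cost table are all hypothetical illustrations, not the thesis's actual design.

```python
from collections import defaultdict


class MarkovTypePredictor:
    """First-order Markov chain over request types (illustrative only)."""

    def __init__(self):
        # counts[prev][nxt] = how often type `nxt` followed type `prev`
        self.counts = defaultdict(lambda: defaultdict(int))
        self.last = None

    def observe(self, req_type):
        """Record one arriving request and update the transition counts."""
        if self.last is not None:
            self.counts[self.last][req_type] += 1
        self.last = req_type

    def predict_next(self):
        """Return the most frequent successor of the last type, or None."""
        row = self.counts.get(self.last)
        if not row:
            return None
        return max(row, key=row.get)


class Node:
    """A server node with reported CPU and memory utilisation in [0, 1]."""

    def __init__(self, name, cpu, mem):
        self.name = name
        self.cpu = cpu
        self.mem = mem
        self.pending_cost = 0.0  # assumed cost of requests already assigned


def estimated_load(node, w_cpu=0.5, w_mem=0.5):
    """Weighted CPU/MEM utilisation plus pending work (assumed weights)."""
    return w_cpu * node.cpu + w_mem * node.mem + node.pending_cost


def dispatch(nodes, req_type, cost_by_type, default_cost=0.1):
    """Assign the request to the node with the lowest estimated load."""
    target = min(nodes, key=estimated_load)
    target.pending_cost += cost_by_type.get(req_type, default_cost)
    return target
```

In this sketch, a scheduler would call `observe` on every arriving request, use `predict_next` to anticipate the cost of the next request, and call `dispatch` to place the current one. A real system would also have to decay `pending_cost` as requests complete and refresh `cpu` and `mem` from node reports.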
[Degree-granting institution]: Zhejiang Sci-Tech University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: F724.2

[References]