Research on Multi-Tenant Data Query Optimization for Minimizing SLA Penalty Cost
Published: 2018-04-24 04:16
Topics: multi-tenancy + data management; Source: Shandong University, 2016 doctoral dissertation
【Abstract】: Software as a Service (SaaS) is an important application-delivery model of cloud computing. It is widely adopted by service providers and has become an important channel through which small and medium-sized enterprises gain access to advanced software technology. Under the SaaS model, mature service operators generally adopt single-instance multi-tenancy: a single application instance is started to provide common services to many tenants, and such an application is called a multi-tenant application. Service providers deploy multi-tenant applications in the cloud, and tenants rent them on a pay-per-use basis. According to each tenant's requirements and ability to pay, the provider offers SaaS applications at different levels of service quality for different tenants to rent. To ensure stable service quality, a tenant signs a Service-Level Agreement (SLA) with the SaaS provider. Query response time is an important performance metric in the SLA: if the response time exceeds the deadline specified in the SLA, the tenant cannot obtain the query result in time, which degrades the SaaS experience. When the query response time violates the service-level objective, the provider must pay the tenant a penalty according to the signed SLA; the provider signs SLAs of different levels with each tenant according to the tenant's requirements and the rental fee paid. How to optimize query processing effectively, improve query efficiency, and satisfy the SLAs of different tenants so as to minimize the SLA penalty cost has therefore become a central concern for service providers. From a cost-benefit perspective, the provider wants to satisfy as many tenants' query SLAs as possible with limited resources, so the multi-tenant database must share query-processing resources among tenants and optimize resource utilization. In such a shared-resource architecture, queries from multiple tenants inevitably contend for resources, causing some of them to violate their SLAs. Minimizing the provider's SLA penalty cost therefore requires optimizing SaaS multi-tenant data queries in the cloud computing environment. The main problems and challenges are as follows.

(1) Multi-tenant data processing needs a well-designed cloud organization architecture. A multi-tenant database serves many tenants and a large volume of data, and tenants continually join and leave, so multi-tenant data processing has to rely on a cloud computing platform. The large number of nodes and the large amount of data require good methods for data organization, node organization, and data location, which in turn provide the foundation for optimizing the SLA penalty cost of queries. However, few existing studies present a clear and effective cloud organization architecture for multi-tenant data.

(2) Resource allocation at the granularity of a whole tenant is too coarse and leaves room for further optimization. Per-tenant allocation is easy to implement, and most existing work optimizes SLA penalties at tenant granularity. However, the queries of a single tenant also differ in penalty cost, access frequency, and resource consumption. Processing resources should therefore be allocated and scheduled per query, enabling finer-grained query optimization.

(3) Multi-tenant applications have many users and a high degree of query concurrency, which easily creates processing bottlenecks. Under high load in particular, load imbalance among the many cloud nodes causes some queries to miss their deadlines and increases the SLA penalty cost. A decentralized organization structure in the cloud is an effective way to avoid such bottlenecks, so query optimization aimed at reducing SLA penalty cost should be built on a decentralized cloud structure.

(4) When query-processing nodes run at full capacity, many queries tend to violate their SLAs. Once the processing nodes in the cloud have been configured, the arrival rate of multi-tenant data queries is not stable; at peak times, queries compete for the limited processing resources. Allocating resources by starting new processing nodes or migrating tenant data cannot resolve this contention quickly enough, so an emergency query-processing mechanism for peak periods is needed to minimize the penalty for violations.

Targeting minimization of the service provider's penalty cost in the cloud computing environment, and taking the isolation and customizability of tenant data into account, this dissertation studies the indexing, caching, and scheduling stages of multi-tenant query optimization. The main work and contributions are as follows.

(1) To address the need for a well-designed cloud organization architecture, a multi-tenant index mechanism supporting a P2P structure is established. The mechanism organizes the multi-tenant data, indexes, and nodes in the cloud, avoids the performance bottleneck of a centralized index, and provides a sound data-organization foundation for the subsequent SLA-based query optimization. The index satisfies tenants' isolation requirement: when the index is used to retrieve data, data belonging to other tenants is never returned. Index entries are stored in order, supporting the comparison and range queries that are common in SaaS applications. A tenant's index and data are stored together on as few nodes as possible, avoiding heavy data transfer during query processing. The mechanism is dynamically scalable and can exploit the elasticity of the cloud platform to provide index services for an unbounded number of tenants. Experiments show that once the numbers of tenants and nodes reach a certain scale, the mechanism's point-query and range-query times are at least 50% and 75% lower, respectively, than those of a centralized index, and the penalty cost is reduced by at least 20%.

(2) To address the coarse granularity of resource allocation, an SLA-aware multi-tenant data cache management mechanism is established. Under the P2P structure, it optimizes the caches of the multi-tenant database according to the characteristics and violation penalties of different tenants' queries, reducing the provider's penalty cost. A quantitative relationship between cached data and query penalty cost is established, providing the basis for selecting which data to cache. The mechanism generates cache contents for each node and can substantially reduce the overall penalty cost, and cross-node cache adjustments can be completed efficiently. Within the mechanism, any node can quickly dispatch a tenant query so that it is processed on the node with the shortest processing time. Experiments on a cloud platform show that its penalty cost is at least 30% lower than that of the baseline algorithm.

(3) To address the problem that many queries violate their SLAs when processing nodes run at full capacity, a decentralized multi-tenant query scheduling mechanism that minimizes the SLA penalty cost is established. By deciding the processing node and processing time of each query, it ensures that critical queries return before their deadlines when processing resources are scarce, thereby minimizing the penalty cost. Each tenant query is assigned a priority based on its violation penalty and the urgency of its deadline; higher-priority queries are processed first, minimizing the overall penalty cost. Because every node participates in scheduling under the P2P structure, scheduling bottlenecks are avoided. The data structure of the queue of queries waiting to be scheduled is improved so that lookup, insertion, and deletion remain fast even with a large number of pending queries, improving scheduling efficiency. Experiments show that once the number of tenant queries reaches a certain scale, the penalty cost of this scheduling mechanism is at least 50% lower than that of the baseline scheme. The mechanism reduces the scheduling time complexity from O(N) to O(log₂N); experimentally, the scheduling time of a tenant query is stable at about 2 ms and does not grow with the number of tenant queries.
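The abstract repeatedly refers to the SLA penalty cost that the provider minimizes, but does not spell out the cost model. As a minimal sketch only, assuming a deadline-based penalty of the kind described above (the symbols Q, r_q, d_q, and p_q are introduced here for illustration and are not quoted from the dissertation), the objective could be written as:

```latex
% One plausible formalization of the objective described in the abstract
% (notation introduced here for illustration, not taken from the thesis).
\min \; C_{\mathrm{SLA}} \;=\; \sum_{q \in Q} p_q \cdot \mathbb{1}\!\left[\, r_q > d_q \,\right]
```

Here Q is the set of tenant queries, r_q the response time of query q, d_q its SLA deadline, and p_q the penalty paid if the deadline is missed; the indicator term charges the penalty only for queries that violate their deadlines.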
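Contribution (2) ties the choice of cached data to the penalty cost it can avoid, but the abstract does not give the quantitative relationship. The greedy, benefit-per-size selection below is only one plausible reading of that idea; the scoring formula, class, and field names are illustrative assumptions, not the mechanism from the dissertation.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    """A data block that a node could keep in its cache (illustrative)."""
    block_id: str
    size: int               # bytes the block would occupy in the cache
    hit_rate: float         # how often tenant queries touch this block
    penalty_avoided: float  # expected SLA penalty saved per hit when served from cache

def select_cache_blocks(candidates, capacity):
    """SLA-aware greedy cache filling: prefer blocks whose expected
    penalty saving per byte is highest until capacity is exhausted.
    A standard knapsack-style heuristic, shown only to illustrate the
    kind of trade-off the abstract describes."""
    ranked = sorted(
        candidates,
        key=lambda c: (c.hit_rate * c.penalty_avoided) / c.size,
        reverse=True,
    )
    chosen, used = [], 0
    for c in ranked:
        if used + c.size <= capacity:
            chosen.append(c)
            used += c.size
    return chosen

# Usage sketch: a hot, high-penalty block is preferred over a large cold one.
blocks = [
    Candidate("tenantA:orders_idx", size=40, hit_rate=0.9, penalty_avoided=5.0),
    Candidate("tenantB:log_scan", size=400, hit_rate=0.1, penalty_avoided=1.0),
]
print([c.block_id for c in select_cache_blocks(blocks, capacity=100)])
# -> ['tenantA:orders_idx']
```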
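Contribution (3) orders waiting tenant queries by violation penalty and deadline urgency and replaces an O(N) scan of the wait queue with an O(log N) structure. The sketch below illustrates that general idea with a binary heap; the priority formula and all names are assumptions made here, not the dissertation's actual data structure.

```python
import heapq
import time
from dataclasses import dataclass, field

@dataclass(order=True)
class PendingQuery:
    """A tenant query waiting to be scheduled on this node."""
    priority: float                        # lower value = scheduled earlier
    tenant_id: str = field(compare=False)
    query_id: str = field(compare=False)
    deadline: float = field(compare=False)  # absolute deadline (epoch seconds)
    penalty: float = field(compare=False)   # SLA penalty if the deadline is missed

class PenaltyAwareQueue:
    """Wait queue ordered by penalty and deadline urgency.

    Push and pop are O(log N) heap operations, in contrast to rescanning
    the whole queue (O(N)) on every scheduling decision. The priority
    formula is an illustrative choice: queries with a large penalty and
    a near deadline come out first.
    """
    def __init__(self):
        self._heap = []

    def push(self, tenant_id, query_id, deadline, penalty, now=None):
        now = time.time() if now is None else now
        slack = max(deadline - now, 1e-6)   # time left before a violation occurs
        priority = -penalty / slack          # more urgent / costlier => smaller value
        heapq.heappush(self._heap,
                       PendingQuery(priority, tenant_id, query_id, deadline, penalty))

    def pop_next(self):
        """Return the query that should be processed next, or None if empty."""
        return heapq.heappop(self._heap) if self._heap else None

# Usage sketch: equal penalties, so the tighter deadline is scheduled first.
q = PenaltyAwareQueue()
now = time.time()
q.push("tenant_a", "q1", deadline=now + 5.0, penalty=10.0, now=now)
q.push("tenant_b", "q2", deadline=now + 1.0, penalty=10.0, now=now)
print(q.pop_next().query_id)  # -> "q2"
```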
【Degree-granting institution】: Shandong University
【Degree level】: Doctoral
【Year of award】: 2016
【Classification number】: TP393.09
Article ID: 1795097
Link: https://www.wllwen.com/guanlilunwen/ydhl/1795097.html