Design and Implementation of a Job Scheduling System for an E-commerce Data Warehouse

Published: 2018-11-18 18:31

[Abstract]: Data has become a core competitive asset for today's Internet enterprises, and an efficient job scheduling system is an important tool for managing massive offline data. Whoever can manage this data effectively and mine the valuable information in it holds the strategic high ground. ETL jobs are the core of a data warehouse's daily work, and a large number of jobs with complex relationships can only run efficiently and in order under the management of a job scheduling system. In the current information economy, where data is a productive force, the daily work of an e-commerce data warehouse is no longer simple data backup and log pulling; any data that can be correlated may yield new insight. The job scheduling system must therefore trigger jobs efficiently and stably while also respecting the dependencies among them, so that all jobs are ultimately triggered in order as job chains. These requirements are new challenges for the construction of job scheduling systems. With the arrival of the big data era, big data processing tools built on the Hadoop ecosystem have been widely accepted by the market, and the emergence of Hive meets the needs of this era. This system makes support for Hive data processing an important part of the data warehouse, takes full advantage of the stability and high scalability of Hadoop clusters, and uses a distributed cluster to meet e-commerce enterprises' demand for a stable, efficient, and economical data warehouse. As a result, the new job scheduling system supports conventional relational database processing and is also compatible with Hive data processing.
At present, data warehouse job scheduling systems at large enterprises in China and abroad are mostly built in-house. There are also some excellent open-source job scheduling systems (such as Oozie) and job scheduling frameworks (such as Quartz), but their usage scenarios and features do not match the needs of the enterprise at its current stage of development. By summarizing the scheduling requirements that arise in daily work, this thesis designs and develops a customized job scheduling engine for an e-commerce data warehouse that fits the enterprise's current stage of development. Data developers can conveniently deploy their jobs on any job machine, and the engine provides unified, efficient management features including periodic triggering, flexible addition of dependencies, load balancing, logging, and monitoring and alerting.
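The idea of triggering jobs in order as job chains while respecting their dependencies can be illustrated with a minimal sketch. It is not the engine described in the thesis: the job names are hypothetical, and the standard-library `graphlib` module (Python 3.9+) stands in for whatever dependency resolution the actual system uses. Each "wave" is a group of jobs whose upstream jobs have all finished, so the jobs within a wave could be dispatched to different job machines in parallel.

```python
from graphlib import TopologicalSorter


def build_waves(dependencies):
    """Group jobs into waves; every job in a wave has all upstream jobs done.

    `dependencies` maps a job name to the set of job names it depends on.
    graphlib raises CycleError if the dependencies contain a cycle.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        # Jobs whose upstream jobs have all been marked done.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        # A real scheduler would run the jobs here and mark each one done
        # as it finishes; this sketch marks the whole wave done at once.
        sorter.done(*ready)
    return waves


# Hypothetical ETL jobs (assumed names, not taken from the thesis).
deps = {
    "load_orders": set(),
    "load_users": set(),
    "join_orders_users": {"load_orders", "load_users"},
    "daily_report": {"join_orders_users"},
}
waves = build_waves(deps)
```

Here the two load jobs form the first wave, the join job the second, and the report job the third; a periodic trigger would rebuild and walk these waves once per scheduling cycle.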
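[Degree-granting institution]: Capital University of Economics and Business
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP311.13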



Article ID: 2340805


Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/2340805.html
