Design and Implementation of a Job Scheduling System for an E-Commerce Data Warehouse
[Abstract]: Data has become the core competitive asset of contemporary Internet enterprises, and the advantage goes to whoever can manage massive data effectively and mine valuable information from it. An efficient job scheduling system is an important tool for managing massive offline data. ETL jobs are the core of a data warehouse's daily work, and a large number of jobs with complex relationships can run efficiently and in order only under the management of a job scheduling system. In today's information economy, where data is a productive force, the daily work of an e-commerce data warehouse is no longer simple data backup and log pulling: associating any pieces of data may spark something new. The job scheduling system must therefore not only trigger jobs efficiently and stably but also take the dependencies among jobs into account, so that all jobs are ultimately triggered in order as job chains. These requirements pose new challenges for building a job scheduling system. With the arrival of the big data era, big data processing tools based on the Hadoop ecosystem have been widely accepted by the market, and Hive emerged to meet the needs of this era. This system makes support for Hive data processing an important part of the data warehouse, takes full advantage of the stability and high scalability of the Hadoop cluster, and adopts a distributed cluster to meet e-commerce enterprises' demand for a stable, efficient, and economical data warehouse. The new job scheduling system supports conventional relational database processing and is also compatible with Hive data processing. At present, most job scheduling systems used by enterprises in China and abroad are self-built, although there are some excellent open source job scheduling systems (such as Oozie) and job scheduling frameworks (such as Quartz).
However, in usage scenarios and functionality these do not fully match the enterprise's requirements at its current stage of development. By summarizing the scheduling requirements of daily work, this paper designs and develops a customized scheduling engine for the e-commerce data warehouse that suits the enterprise's current stage of development. Data developers can easily deploy their jobs on any job machine, and the system provides unified and efficient management functions, including schedule cycle adjustment, flexible addition of dependencies, load balancing, logging, monitoring, and alerting.
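The idea of triggering jobs in dependency order as a job chain can be illustrated with a minimal sketch. This is not the thesis's implementation, which the abstract does not describe. It assumes jobs are identified by name and that each job lists the jobs it depends on. The job names and the helper `build_job_chain` are hypothetical, and the ordering uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+).

```python
from graphlib import CycleError, TopologicalSorter


def build_job_chain(dependencies):
    """Return job names ordered so that each job follows the jobs it depends on.

    `dependencies` maps a job name to the set of job names it depends on
    (an assumed representation, chosen for illustration only).
    """
    sorter = TopologicalSorter(dependencies)
    try:
        return list(sorter.static_order())
    except CycleError as exc:
        # exc.args[1] holds the list of jobs that form the cycle.
        raise ValueError(f"circular job dependency: {exc.args[1]}") from exc


# Hypothetical ETL jobs: two loads, a join that needs both, then a report.
jobs = {
    "load_orders": set(),
    "load_users": set(),
    "join_orders_users": {"load_orders", "load_users"},
    "daily_report": {"join_orders_users"},
}

chain = build_job_chain(jobs)
print(chain)
```

A real scheduler would also need to handle schedule cycles, retries, and distribution across job machines; the sketch covers only the ordering step and the rejection of circular dependencies.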
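[Degree-granting institution]: Capital University of Economics and Business
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP311.13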
Article ID: 2340805
Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/2340805.html