Research and Implementation of a Hadoop-Based Telecom Big Data Collection Scheme
Published: 2019-02-09 12:15
[Abstract]: ETL (extract, transform, load) is a critical step in implementing a data warehouse, and designing an ETL process that handles big data effectively, so as to improve the collection efficiency of the operations platform, is of considerable practical significance. This paper first briefly introduces the main data collected by a telecom operator's big data platform. To improve the efficiency of collecting massive data volumes, it then proposes a hybrid architecture combining Hadoop and Oracle. Next, it proposes a dynamically triggered ETL scheduling process and algorithm which, compared with ETL processes started at fixed times, effectively shortens the excessively long waiting times of some processes and avoids congestion caused by resource contention. Finally, based on the system run logs of Hadoop and Oracle, it compares and analyzes the relationship between collection efficiency and data volume on the two platforms. Practice shows that the two components of the hybrid big data platform complement each other, effectively improving the timeliness of data collection and achieving good results in application.
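[Author affiliations]: China United Network Communications Co., Ltd., Shanghai Branch; School of Software Engineering, Tongji University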
[Classification number]: TP311.13
Article ID: 2418947
Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/2418947.html