Research on Cloud-Computing-Based BI Processing Technology

Published: 2018-06-11 15:55

Topics: cloud computing + Hadoop; Source: Jilin University, 2013 master's thesis

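【Abstract】: Business intelligence (BI) is a solution formed by combining multiple technologies, and it plays an important role in business informatization. In practice, however, it has limitations: high cost of use, limited hardware resources, weak security guarantees, and high risk of use. Its processing performance also cannot keep pace with continuously expanding functional requirements. Business intelligence technology can therefore develop further only by overcoming these limitations. By analyzing the characteristics of cloud computing, this paper combines cloud computing technology with BI processing technology to find a way past these limitations, and it constructs a new solution.

First, Hadoop is chosen as the cloud computing platform for system implementation and experiments. The traditional cloud computing architecture is then improved so that the framework is fully oriented toward serving BI; the improved framework is called BIHadoop. Compared with the two-layer structure of traditional cloud computing, the BIHadoop cluster uses four layers: a reverse proxy server sits at the top layer, and a super master node is added in the second layer. The super node links the layers above and below it. It directly controls the nodes of the two layers beneath it and also handles communication from the layer above, performing macro-level scheduling of application services, naming management, server-side load balancing, front-end load balancing, risk control management, and operations and maintenance monitoring. Specifically, the first layer is the proxy layer. The second layer is the control node layer, consisting of one super master node (the main control node). The third layer is the name node layer, similar to the first layer of the traditional cloud computing architecture. The fourth layer is the super data node layer, composed of many virtual machines. Whereas a traditional Hadoop cluster has only one name node, this cluster has multiple name nodes, and the overall performance of the system improves across the board. The nodes in the first two layers of the architecture can all be classified as control nodes. Each layer has a unified management protocol in the file system and manages the mapping between the special-format metadata held in that layer and the nodes of the layer below.

Next, the BI architecture is adapted to the Hadoop platform. The paper first proposes an improved, Hadoop-oriented model for data mining, which is the most important module of the BI system. The traditional data mining architecture is a serial structure designed for single-task processing, whereas a cloud computing platform processes tasks concurrently, so bottlenecks arise in data processing capacity and security. We therefore build a Hadoop-oriented data mining cloud model with three modules: HDFS data management, algorithm management, and resource monitoring. The core module, algorithm management, is a toolbox (data mining middleware) that integrates various MapReduce-based algorithms. It adapts data mining to the parallel computing and MapReduce programming model of the cloud platform and makes the combination of the data mining architecture with cloud computing more secure and stable. The BI architecture itself is then improved by dividing it by function into 8 main functional modules. The improved architecture provides interfaces for integrating cloud computing and offers better extensibility and maintainability. Once cloud computing is added, the result is a complete architecture with powerful internal functionality and a well-organized structure.

After the cloud-computing-based BI solution was designed, the system was implemented. Its performance was then tested by processing data with data mining algorithms. The main test indicators were load balancing, storage capacity, cost, scalability, security, and computing capacity, and all of them reached a good level.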
[Abstract]: Business intelligence (BI) is a solution formed by combining multiple technologies, and it plays an important role in business informatization. However, it also has limitations, including high cost of use, limited hardware resources, weak security guarantees, and high risk of use. In addition, its processing performance cannot keep up with continuously expanding functional requirements. Business intelligence technology can therefore develop further only by breaking through these limitations.

In this paper, we analyze the characteristics of cloud computing and combine cloud computing technology with BI processing technology. The goal is to find a way to overcome these limitations, and the result is a new solution.

First, Hadoop is selected as the cloud computing platform for system implementation and experiments. The traditional cloud computing architecture is then improved so that the framework is fully oriented toward serving BI; the improved framework is called BIHadoop. Compared with the traditional two-layer cloud computing structure, the BIHadoop cluster uses a four-layer structure. The first layer is a proxy layer built on a reverse proxy server;
the second layer is a control node layer formed by a super master node (the main control node), which links the layers above and below it: it directly controls the nodes of the two lower layers, serves communication from the layer above, and handles macro-level scheduling of application services, naming management, server-side and front-end load balancing, risk control management, and operations and maintenance monitoring;
the third layer is a name node layer, similar to the first layer of the conventional cloud computing architecture;
and the fourth layer is a super data node layer composed of many virtual machines. A traditional Hadoop cluster has only one name node, but this cluster has several, which improves the overall performance of the system. All nodes in the first two layers of the architecture can be classified as control nodes. Each layer has a unified management protocol in the file system and manages the mapping between the special-format metadata in that layer and the nodes of the next layer.
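The abstract does not specify how each layer implements its metadata-to-node mapping. The sketch below shows one way the four-layer lookup path could work. It assumes two things: the super master partitions the namespace across name nodes by the first path component, and each name node places a file on its own data-node VMs by hashing the path. All class and method names (`ReverseProxy`, `SuperMaster`, `NameNode`, `route`, `place`, `locate`, `stable_hash`) are hypothetical. They are not Hadoop APIs or the thesis's actual design.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


def stable_hash(key: str) -> int:
    # Built-in hash() is salted per process for str, so use SHA-256 for a stable value.
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


@dataclass
class NameNode:
    """Layer 3 (hypothetical): maps file paths to data-node VMs in layer 4."""
    name: str
    data_nodes: list[str]
    block_map: dict[str, list[str]] = field(default_factory=dict)

    def place(self, path: str, replicas: int = 2) -> list[str]:
        # Choose `replicas` distinct VMs, starting at a hash-derived offset.
        replicas = min(replicas, len(self.data_nodes))
        start = stable_hash(path) % len(self.data_nodes)
        chosen = [self.data_nodes[(start + i) % len(self.data_nodes)]
                  for i in range(replicas)]
        self.block_map[path] = chosen
        return chosen

    def locate(self, path: str) -> list[str]:
        return self.block_map[path]


@dataclass
class SuperMaster:
    """Layer 2 (hypothetical): maps top-level namespaces to name nodes in layer 3."""
    name_nodes: list[NameNode]

    def route(self, path: str) -> NameNode:
        # Partition by the first path component so each name node owns whole subtrees.
        top = path.strip("/").split("/")[0]
        return self.name_nodes[stable_hash(top) % len(self.name_nodes)]


class ReverseProxy:
    """Layer 1 (hypothetical): the single client-facing entry point."""

    def __init__(self, master: SuperMaster) -> None:
        self.master = master

    def write(self, path: str) -> tuple[str, list[str]]:
        node = self.master.route(path)
        return node.name, node.place(path)

    def read(self, path: str) -> tuple[str, list[str]]:
        node = self.master.route(path)
        return node.name, node.locate(path)
```

For example, suppose a proxy is built over `NameNode("nn0", vms[:3])` and `NameNode("nn1", vms[3:])`. Then `proxy.write("/sales/2013/q1.csv")` returns the owning name node and two VMs, and a later `proxy.read` of the same path resolves through the same two mappings. Partitioning by top-level directory is only one simple way to spread metadata over several name nodes, and the thesis's mapping may differ.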

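Next, the BI architecture is adapted to the Hadoop platform. Data mining is the most important module in BI, and its traditional architecture is a serial structure designed for single-task processing. A cloud platform, by contrast, processes work concurrently, and this mismatch creates bottlenecks in data processing capacity and security. This paper therefore builds a Hadoop-oriented data mining cloud model with three modules: HDFS data management, algorithm management, and resource monitoring. The core module, algorithm management, is a toolbox (data mining middleware) that integrates various MapReduce-based algorithms. It fits data mining to the parallel computing and MapReduce programming model of the cloud platform and makes it more secure and stable there. The overall BI architecture is then divided by function into 8 main functional modules. The improved architecture provides interfaces for integrating cloud computing and offers better extensibility and maintainability.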
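The abstract does not name the algorithms in the algorithm-management toolbox. To show what a MapReduce-based data mining algorithm looks like, the sketch below expresses k-means clustering as a mapper and a reducer and simulates the shuffle in-process. K-means is a swapped-in, commonly used example, not necessarily one the thesis used. `run_job` is a hypothetical stand-in for the framework and is not the Hadoop API. `TOOLBOX` is a hypothetical registry standing in for the middleware.

```python
from collections import defaultdict
from math import dist  # Python 3.8+


def kmeans_map(point, centroids):
    # Map: emit (index of the nearest centroid, (point, count=1)).
    idx = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return idx, (point, 1)


def kmeans_reduce(key, values):
    # Reduce: sum the points and counts for one centroid, then average them.
    dims = len(values[0][0])
    sums = [0.0] * dims
    count = 0
    for point, n in values:
        for d in range(dims):
            sums[d] += point[d]
        count += n
    return key, tuple(s / count for s in sums)


def run_job(records, mapper, reducer):
    # Stand-in for the framework: map every record, group by key (shuffle), reduce each group.
    groups = defaultdict(list)
    for rec in records:
        key, value = mapper(rec)
        groups[key].append(value)
    return dict(reducer(k, vs) for k, vs in sorted(groups.items()))


def kmeans(points, centroids, max_iter=10):
    # Each iteration is one MapReduce job; stop early once centroids stop moving.
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        new = run_job(points, lambda p: kmeans_map(p, centroids), kmeans_reduce)
        # A centroid that received no points keeps its previous position.
        updated = [new.get(i, c) for i, c in enumerate(centroids)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


# Hypothetical toolbox registry: algorithm name -> MapReduce-style implementation.
TOOLBOX = {"kmeans": kmeans}
```

For example, `TOOLBOX["kmeans"]([(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)], [(0.0, 0.0), (10.0, 10.0)])` returns `[(0.0, 0.5), (10.0, 10.5)]`. The mapper emits `(point, 1)` pairs rather than bare points so that a combiner could pre-aggregate partial sums and counts on each mapper before the shuffle. Because the reducer already sums counts, it would accept those pre-aggregated values unchanged.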

After the cloud-computing-based BI solution was designed, the system was implemented. Its performance was then tested by processing data with data mining algorithms. The main test indicators were load balancing, storage capacity, cost, scalability, security, and computing capacity, and all of them reached a good level.
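The abstract reports that load balancing reached a good level but does not say how it was measured. One common measure is the ratio of the most-loaded node's load to the mean load, where 1.0 means perfectly even. The sketch below assumes that metric and uses a greedy least-loaded dispatcher as a stand-in for the super master's server-side load balancing, then compares it with naive round-robin. The function names are hypothetical, and this is not the thesis's test procedure.

```python
import heapq
from statistics import mean


def least_loaded_assign(task_costs, nodes):
    """Greedy dispatch: send each task to the node with the least accumulated load."""
    heap = [(0.0, n) for n in nodes]
    heapq.heapify(heap)
    loads = {n: 0.0 for n in nodes}
    for cost in task_costs:
        load, node = heapq.heappop(heap)
        load += cost
        loads[node] = load
        heapq.heappush(heap, (load, node))
    return loads


def round_robin_assign(task_costs, nodes):
    """Baseline: cycle through nodes regardless of their current load."""
    loads = {n: 0.0 for n in nodes}
    for i, cost in enumerate(task_costs):
        loads[nodes[i % len(nodes)]] += cost
    return loads


def imbalance(loads):
    """Max-to-mean load ratio: 1.0 means perfectly balanced."""
    values = list(loads.values())
    return max(values) / mean(values)
```

Consider task costs `[8, 1, 1, 1, 8, 1, 1, 1]` on two nodes. Round-robin puts both heavy tasks on the same node, giving loads of 18 and 4 and an imbalance of about 1.64. Least-loaded dispatch ends at 11 and 11, giving an imbalance of 1.0. The heap keeps each dispatch decision at O(log n) in the number of nodes.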
【Degree-granting institution】: Jilin University
【Degree level】: Master's
【Year degree awarded】: 2013
【Classification number】: TP311.13; TP3
