Research and Implementation of a Large-Scale Multidimensional Network Analysis Model
Published: 2018-10-24 11:40
[Abstract]: With the development of information technology and the falling cost of storage devices, enterprises have built large numbers of databases to meet their business needs and have accumulated massive amounts of data. Using these data to guide and inform business decisions is a hard problem for enterprise decision analysts. On-Line Analytical Processing (OLAP) is widely recognized as an effective solution. It can analyze massive data quickly and efficiently across multiple dimensions and granularities and thereby support decision making. After more than twenty years of research and development, OLAP technology is relatively mature and standardized, and many commercial databases and data warehouse systems implement OLAP functionality. In recent years, emerging fields such as social networks, bioinformatics, and multi-source information fusion have developed rapidly. Real-world applications now produce large numbers of multidimensional heterogeneous networks, and the scale of these networks keeps growing. Traditional OLAP organizes data into fact tables and dimension tables, with no links between facts, so it cannot effectively analyze multidimensional networks. Graph OLAP has gradually emerged to address this problem. Compared with traditional OLAP, it improves the information model by replacing the data cube with a graph cube, and it supports multidimensional, multi-perspective analysis of network data. However, Graph OLAP research is still in its early stages. The analytical power of existing models is limited, and most models cannot analyze multidimensional heterogeneous networks or massive data effectively and efficiently. To address these shortcomings, this thesis proposes a new analysis model that supports multidimensional analysis of large-scale multidimensional heterogeneous networks.
The main research content of this thesis is as follows:
1. A new information model for multidimensional heterogeneous networks is designed. Binary-relation meta-paths and n-ary-relation meta-paths in heterogeneous networks are defined. The relationships among these meta-paths are studied as a new way to guide network aggregation.
2. The TSMH Graph Cube is designed. It extends the traditional graph cube into a two-stage cube consisting of an entity hypercube and a dimension cube. On top of this cube model, traditional operations are given new semantics, and additional Graph OLAP operations are proposed, making network analysis more diverse.
3. For the entity hypercube, a parallel aggregation algorithm and a materialization strategy are proposed. For the dimension cube, nodes and dimension attributes are encoded, and a node-encoding algorithm is designed. As a result, dimension OLAP operations on nodes no longer require joining the entity table with the dimension table, which greatly improves the efficiency of these operations.
4. To support massive data volumes, the model's Graph OLAP operation algorithms are implemented on a parallel computing framework. Experiments on large-scale real and synthetic data verify that the model can analyze large-scale multidimensional heterogeneous networks effectively and efficiently.
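The abstract says that meta-paths guide network aggregation, but it does not give the thesis's definitions of binary-relation or n-ary-relation meta-paths. The sketch below is therefore a hedged illustration of the standard meta-path counting idea, not the thesis's own method. It counts Author-Paper-Author path instances between two authors, which are the off-diagonal entries of the commuting matrix W_AP W_AP^T. It then rolls the resulting weighted co-author network up by a node dimension. The toy network `AUTHOR_PAPER`, the attribute `AUTHOR_ORG`, and the helper names `apa_path_counts` and `aggregate_by` are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy bibliographic network: Author -writes-> Paper edges.
AUTHOR_PAPER = {
    "a1": {"p1", "p2"},
    "a2": {"p1", "p2"},
    "a3": {"p2", "p3"},
    "a4": {"p3"},
}

# Hypothetical dimension attribute on authors (e.g. affiliation).
AUTHOR_ORG = {"a1": "org_x", "a2": "org_x", "a3": "org_y", "a4": "org_y"}


def apa_path_counts(author_paper):
    """Count Author-Paper-Author meta-path instances between distinct authors.

    The count for (u, v) equals the off-diagonal entry of the commuting
    matrix W_AP @ W_AP.T, i.e. the number of papers u and v share.
    """
    # Invert to Paper -> authors so each paper yields one path per author pair.
    paper_authors = {}
    for author, papers in author_paper.items():
        for paper in papers:
            paper_authors.setdefault(paper, set()).add(author)
    counts = Counter()
    for authors in paper_authors.values():
        # sorted() gives each unordered pair a single canonical (u, v) key.
        for u, v in combinations(sorted(authors), 2):
            counts[(u, v)] += 1
    return counts


def aggregate_by(counts, dim):
    """Roll the meta-path network up by a node dimension, summing edge weights.

    Pairs whose endpoints share a dimension value become intra-group weights.
    """
    agg = Counter()
    for (u, v), weight in counts.items():
        key = tuple(sorted((dim[u], dim[v])))
        agg[key] += weight
    return agg


co_author = apa_path_counts(AUTHOR_PAPER)
org_net = aggregate_by(co_author, AUTHOR_ORG)
print(dict(co_author))
print(dict(org_net))
```

In this toy data, a1 and a2 share two papers, so their A-P-A weight is 2. After aggregation, both of their pairs with a3 contribute to the org_x/org_y edge.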
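The abstract states that nodes and dimension attributes are encoded so that dimension OLAP operations avoid joining the entity table with the dimension table. It does not describe the encoding itself. The following is a minimal sketch of one way such an encoding can work, assuming a hypothetical fixed-width bit-field scheme for a three-level location hierarchy (country, province, city). Each level is assumed to have small integer ids. A node's code packs its full hierarchy path, so rolling up to a coarser level is a right shift and no dimension-table lookup is needed. The names `LEVELS`, `LEVEL_BITS`, `encode`, `roll_up`, and `NODE_CODE` are illustrative and are not taken from the thesis.

```python
from collections import Counter

# Hypothetical location hierarchy, coarsest level first, with a fixed bit
# width per level. Real widths would depend on each level's cardinality.
LEVELS = ["country", "province", "city"]
LEVEL_BITS = {"country": 4, "province": 6, "city": 8}


def encode(path):
    """Pack a (country_id, province_id, city_id) path into one integer code."""
    code = 0
    for level, value in zip(LEVELS, path):
        bits = LEVEL_BITS[level]
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{level} id {value} does not fit in {bits} bits")
        code = (code << bits) | value
    return code


def roll_up(code, level):
    """Return the code's prefix at `level` by dropping all finer-level bits."""
    finer = LEVELS[LEVELS.index(level) + 1:]
    return code >> sum(LEVEL_BITS[name] for name in finer)


# Each node stores its code once, computed from the dimension table up front.
NODE_CODE = {
    "n1": encode((1, 2, 3)),
    "n2": encode((1, 2, 7)),
    "n3": encode((1, 5, 1)),
    "n4": encode((2, 1, 1)),
}

# Roll-ups are now bit shifts on the codes; no lookup in the dimension table.
nodes_per_province = Counter(roll_up(c, "province") for c in NODE_CODE.values())
nodes_per_country = Counter(roll_up(c, "country") for c in NODE_CODE.values())
```

Under this assumed scheme, the dimension table is consulted only once, when codes are assigned. After that, roll-up and grouping operate on integers alone. The trade-off is that each level's bit width caps its cardinality and must be fixed in advance. A province-level key is the country-plus-province prefix, so province ids need only be unique within their country.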
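[Degree-Granting Institution]: Beijing University of Posts and Telecommunications
[Degree Level]: Master's
[Year of Degree]: 2016
[Classification Number]: TP311.13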
Article ID: 2291299
Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/2291299.html