Design and Research of a Network Security Data Visualization System
Published: 2018-07-20 12:51
[Abstract]: The exponential growth of digital information in modern society has brought data analysis into a golden age. People have long used data analysis to extract relevant information from a continuous stream of data. In network security, applying data analysis to security problems has become a new approach. The volume of collected security log data is enormous, and without analysis tools people cannot process or use it. In particular, analysts need to understand network communication patterns quickly, identify network anomalies, and discover network attacks. Network security visualization is a practical technique for these tasks: it turns large volumes of network data into easily understood images, uses human vision to grasp the patterns and structure of the data, and builds a bridge between security data and cognition. The popularity of visualization in network security is inevitable: the more data people must sift through, the more they want to turn it into images and to display images alongside text. Visualization has become an important analysis tool. It presents the patterns and regularities behind security data intuitively, helping people analyze the current state of the network, handle security events that have already been detected, and anticipate potential security events in time. Visual analysis tools also help us understand security data better; they save time by helping people cope with data overload, and they involve people in data collection and analysis while informing them.

Based on the network security visualization reference model and drawing on the idea of layered architecture, this thesis studies and designs Nets.vis, a web prototype system for network security data visualization. The system covers the whole process from data processing to view generation. The Nets.vis prototype framework is a layered, flexible, lightweight framework for network security data visualization. It uses a client-server structure: the client renders in the user's browser, while the server acquires, stores, and analyzes data and loads the visualization components. The Nets.vis prototype consists of the following seven layers:

(1) Data preprocessing layer. Cleans the source data, removing dirty, useless, and erroneous data to obtain clean, usable data.
(2) Data import layer. Imports data from the MySQL database into HDFS.
(3) Data storage layer. All experimental data of the Nets.vis prototype are stored in HDFS.
(4) Data management layer. The data warehouse of the whole prototype is managed by Hive; that is, all data are output from the data storage layer to the data management layer as Hive tables.
(5) Data service layer. Performs analysis and data mining on the warehouse data according to the analysis requirements.
(6) Data application layer. Data from the data service layer must be exported back to a relational database, because Hive's high execution latency makes it unsuitable for generating the final visualization results.
(7) Visualization layer. Users view the final visualization results in a browser.

The functional requirements of the Nets.vis system can be summarized as data preprocessing, data import, data analysis, and view generation. The research proceeds as follows.

First, a Hadoop system is deployed on a Linux server to store and manage large-scale data. The Hive data warehouse provided with the Hadoop system is used to store data, and Sqoop transfers data between the relational database MySQL and Hadoop. The server-side data import, storage, and analysis modules are all built on the Hadoop platform: Sqoop imports data from MySQL into the Hive data warehouse and then exports the analysis results back to MySQL. The client side uses Spring MVC to structure the web application and Bootstrap to refine the prototype's visual interface.

Second, because the Nets.vis prototype frequently performs queries and similar operations, the efficiency of the Hive data analysis module matters a great deal. This thesis uses sublinear-space algorithms to improve the efficiency of data extraction, transformation, loading, and querying. The Misra-Gries algorithm for finding frequent elements is used to identify the most frequently occurring items, for example frequently occurring IP addresses in the network. An algorithm for estimating the number of distinct elements is used to estimate how many different elements a data stream contains, for example the number of IP addresses that visit a given page. In addition, the data analysis module combines Canopy clustering with k-means clustering to analyze source IPs. When selecting attribute dimensions in the data analysis module, the thesis uses the Pearson product-moment correlation coefficient and the correlation matrix, both common in probability theory and statistics, to verify the correlation between dimensions.

Third, the main purpose of the Nets.vis visualization module is to let users filter data sets as they wish. Using the two visualization tools Echarts and D3, the thesis designs visualization components suited to the attributes of network security data: bubble chart, treemap, parallel coordinates plot, relationship graph, bar chart, line chart, and rectangular heat map. It also designs and implements an SVG-based rendering method for these components, which makes the visualization results richer and more intuitive, and uses the BIRCH algorithm to improve the layout of the bubble chart.

Finally, following the "overview first, details later" visualization guideline, the thesis selects some of the Nets.vis visualization components and uses the TCP flow log data provided by the Vis China 2015 challenge to verify the feasibility of the system. In step one, a bubble chart improved by hierarchical clustering, a bar chart, and a relationship graph are used to identify the servers and clients in the network and to uncover the network topology. In step two, the servers are classified by protocol features and by time-series features. In step three, network traffic features are mined, taking into account both the hierarchical and the temporal attributes of network traffic data: a line chart visualizes the overall temporal features of the data and reveals a "holiday pattern" and a "workday pattern" in the network. In step four, a treemap visualizes the local temporal features of the data and reveals the specific hosts that produce anomalies. The experiments show that visual analysis of the TCP flow data set with Nets.vis achieves network analysis from the whole to the parts: the system can identify network servers and clients, classify servers, recognize network traffic patterns, and detect network anomalies, which helps analysts manage the network and perceive the network security situation.
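The abstract names the Misra-Gries algorithm for finding frequently occurring IP addresses but gives no implementation. The following is a minimal Python sketch of the standard textbook algorithm, not the thesis's own code; the function name `misra_gries`, the parameter `k`, and the sample addresses are illustrative assumptions.

```python
def misra_gries(stream, k):
    """Return candidate frequent items using at most k - 1 counters.

    Any item that occurs more than len(stream) / k times is guaranteed
    to be among the returned keys. The stored counts are lower bounds
    on the true frequencies, not exact counts.
    """
    counters = {}
    for item in stream:
        if item in counters:
            # Item is already tracked: count one more occurrence.
            counters[item] += 1
        elif len(counters) < k - 1:
            # A free counter is available: start tracking this item.
            counters[item] = 1
        else:
            # No free counter: decrement all and drop those at zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Hypothetical source IPs taken from a flow log (made-up values).
sample_ips = [
    "10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3", "10.0.0.1",
    "10.0.0.4", "10.0.0.1", "10.0.0.5", "10.0.0.1", "10.0.0.1",
]
candidates = misra_gries(sample_ips, k=3)
```

Because the algorithm keeps only k - 1 counters regardless of stream length, its memory use is sublinear in the input size. The result is a candidate set: a second pass over the data is needed if exact counts are required.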
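[Degree-granting institution]: Lanzhou Jiaotong University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP393.08
Article ID: 2133572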
Article link: https://www.wllwen.com/guanlilunwen/ydhl/2133572.html