
Network Node Behavior Analysis Based on Hadoop

Published: 2018-06-21 20:42

  Topic: Hadoop + big data; Source: Beijing University of Posts and Telecommunications, master's thesis, 2015


[Abstract]: In recent years, with the rapid spread and application of Internet technology, the number of Internet users has kept rising, and 93.1% of enterprises in China now use computers in their offices; most enterprises have joined the information highway. As Internet penetration grows, network scale expands sharply and the number of network nodes increases rapidly with it. While the Internet promotes information exchange and offers many conveniences, it also brings new problems: for example, it has many security vulnerabilities and faces the threat of various network attacks. Studying and analyzing the behavior of network nodes is therefore of great significance.

As the number of network users keeps growing, network traffic increases dramatically, and the demands of storing and transmitting network data far exceed the processing capacity of traditional databases. Hadoop, an open-source Apache project, is a distributed software framework for processing massive data efficiently, and it makes distributed storage and computation of big data easy to implement.

This thesis first introduces the background and significance of network node behavior analysis, then describes Hadoop technology and the network behavior monitoring and analysis system in detail. Next, based on the communication and traffic characteristics of network sessions, it proposes a new way of reassembling network sessions, the compound session, which reflects the session characteristics and packet features of the session process in more detail. Compound sessions are collected and preprocessed to provide the data basis for the experiments and analysis in this thesis. Using compound sessions as experimental data, the thesis analyzes the traffic and the number of visiting users of network nodes and reveals how node traffic and user visits are distributed. To address two shortcomings of the original K-means algorithm, its sensitivity to the initial cluster centers and an evaluation function that considers only within-cluster differences, the thesis proposes an optimized method for selecting initial cluster centers and a balanced evaluation function as improvements. Experiments show that the improved algorithm effectively eliminates the instability of clustering results and improves clustering accuracy; K-means is then implemented in distributed form on the Hadoop platform to complete the cluster analysis of network nodes. Finally, an ARIMA model is used to predict parameters such as node traffic and the number of visiting users, with good predictive results. To detect abnormal network nodes, the thesis addresses the shortcomings of earlier anomaly detection algorithms and proposes a new anomaly detection algorithm based on distance and threshold decisions. The algorithm is fast and efficient and updates in real time; it detects abnormal network nodes well and is practical in engineering use.
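The abstract mentions a distributed K-means implementation on Hadoop but gives no details. As a rough illustration only, the sketch below expresses one iteration of standard K-means as a map step and a reduce step in plain Python. It does not reproduce the thesis's optimized initial-center selection or its balanced evaluation function, which the abstract does not specify. The function names (`kmeans_map`, `kmeans_reduce`, `kmeans_iteration`, `kmeans`) and the in-memory "shuffle" are assumptions made for this example.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def kmeans_map(point, centers):
    """Map step: emit (index of the nearest center, (point, 1))."""
    nearest = min(range(len(centers)), key=lambda i: dist(point, centers[i]))
    return nearest, (point, 1)


def kmeans_reduce(values):
    """Reduce step: average all points assigned to one center."""
    total = None
    count = 0
    for point, n in values:
        if total is None:
            total = list(point)
        else:
            total = [a + b for a, b in zip(total, point)]
        count += n
    return tuple(x / count for x in total)


def kmeans_iteration(points, centers):
    """Run one map/shuffle/reduce round and return the updated centers."""
    groups = defaultdict(list)  # stands in for Hadoop's shuffle phase
    for p in points:
        key, value = kmeans_map(p, centers)
        groups[key].append(value)
    # A center with no assigned points is kept unchanged (hypothetical choice).
    return [
        kmeans_reduce(groups[i]) if i in groups else centers[i]
        for i in range(len(centers))
    ]


def kmeans(points, centers, max_iter=100, tol=1e-9):
    """Repeat iterations until no center moves by more than tol."""
    for _ in range(max_iter):
        new_centers = kmeans_iteration(points, centers)
        shift = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers


if __name__ == "__main__":
    sample = [(0, 0), (0, 2), (10, 10), (10, 12)]
    print(kmeans(sample, [(0, 0), (10, 10)]))
```

In an actual Hadoop job, the map and reduce functions would run as separate mapper and reducer tasks (for instance through Hadoop Streaming), with the current centers distributed to every mapper at the start of each iteration; the loop in `kmeans` would then correspond to a driver that launches one job per iteration.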
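[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2015
[Classification number]: TP393.06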




Article ID: 2049942


Article link: https://www.wllwen.com/guanlilunwen/ydhl/2049942.html

