Multi-Scale Analysis of Domain Name Access Sources Based on Log Data
Published: 2018-06-18 08:18
Topics: cn domain name + log data; Source: Nanjing Normal University, master's thesis, 2013
[Abstract]: As a global information network, the Internet profoundly affects how people work and live. When users retrieve information from the Internet, servers record their access behavior and generate Internet log data, which contains a large amount of spatial information. Current research on Internet log data mainly applies data mining, machine learning, and related techniques to analyze and monitor user access behavior and system security, while research at the spatial level remains comparatively scarce. Combining the study of Internet log data with geographic space, and using the techniques and methods of geographic information systems (GIS) to mine the data spatially, can effectively reveal its implicit spatial patterns. This gives the analysis of Internet log data a broader perspective and also promotes research in information geography, which takes information flows as its object of study, so the work has both theoretical significance and practical value.

This thesis uses 24 consecutive hours of cn domain name server log data obtained from the China Internet Network Information Center (CNNIC) as its base data. Based on the characteristics of these logs, it implements massive data processing, geocoding, and spatialization of the data, and carries out spatial analysis at three scales: global, regional, and local. The aims are, in theory, to extend the research scope of information geography; in technique, to build a technical system for processing network log data and expressing it spatially; and in practice, to provide a reference for the construction of network infrastructure. The main research contents and conclusions are as follows:

(1) The theoretical foundations of information geography, the Internet DNS service, and attribute data spatialization are reviewed. The thesis summarizes the scope and research directions of information geography; outlines the concept, architecture, and working principle of the Internet DNS service, noting that domain name server log data is both structured and massive in volume; and describes the meaning and basic methods of attribute data spatialization. On this basis, it constructs a theoretical framework for analyzing the spatial characteristics of network access to top-level domain names.

(2) Given the structured and massive nature of cn domain name server log data, and the shortcomings of traditional GIS in handling massive data, a massive data processing framework is built to process the log data efficiently. After extracting the geographic information contained in the logs and analyzing the factors that influence the spatial distribution of access volume, a spatialization method for cn domain name server log data is designed and implemented.

(3) Spatial analysis of the cn domain name server log data is carried out at the global, regional, and local scales, revealing the spatial distribution pattern of access volume for network services under the cn domain. The thesis explores the causes of this pattern and the problems it reflects. These results are intended to provide reference and guidance for the future development of the Chinese-language Internet in China and worldwide.
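The abstract does not describe the thesis's actual processing framework or spatialization method, so the following is only a minimal sketch of the general idea it names: stream log records, geocode each source address, and accumulate access counts at three spatial scales. Everything specific here is an assumption made for illustration: the log line layout (`<time> <source_ip> <queried_name>`), the `GEO_TABLE` lookup, the names `geocode` and `count_by_scale`, and the mapping of the global, regional, and local scales to country, region, and city. The prefixes are reserved documentation ranges and the place labels are placeholders, not real geolocation data.

```python
import ipaddress
from collections import Counter

# Hypothetical prefix-to-location table (assumption, not from the thesis).
# The prefixes are RFC 5737 documentation ranges; the labels are placeholders.
GEO_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"), ("CountryA", "RegionA1", "CityX")),
    (ipaddress.ip_network("198.51.100.0/24"), ("CountryA", "RegionA2", "CityY")),
    (ipaddress.ip_network("203.0.113.0/24"), ("CountryB", "RegionB1", "CityZ")),
]


def geocode(ip_text):
    """Return (country, region, city) for an IP string, or None if unmatched.

    Raises ValueError if ip_text is not a valid IP address.
    """
    addr = ipaddress.ip_address(ip_text)
    for network, place in GEO_TABLE:
        if addr in network:
            return place
    return None


def count_by_scale(lines):
    """Count queries per country, per region, and per city in one pass."""
    counts = {"global": Counter(), "regional": Counter(), "local": Counter()}
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip records that do not fit the assumed layout
        try:
            place = geocode(fields[1])  # second field is the source IP
        except ValueError:
            continue  # skip records whose source field is not an IP address
        if place is None:
            continue  # skip addresses the table cannot locate
        country, region, city = place
        counts["global"][country] += 1
        counts["regional"][(country, region)] += 1
        counts["local"][(country, region, city)] += 1
    return counts


# Made-up sample records in the assumed layout.
SAMPLE_LOG = [
    "00:00:01 192.0.2.10 example.cn",
    "00:00:02 192.0.2.77 www.example.cn",
    "00:00:03 198.51.100.5 example.cn",
    "00:00:04 203.0.113.9 mail.example.cn",
    "00:00:05 not-an-ip example.cn",
    "malformed line",
]

if __name__ == "__main__":
    for scale, counter in count_by_scale(SAMPLE_LOG).items():
        print(scale, dict(counter))
```

Because `count_by_scale` consumes any iterable of lines, it can be fed an open file object so that a large log is processed one record at a time rather than loaded into memory. A linear scan of the prefix table is only workable for a toy table; a realistic volume of prefixes would call for an indexed lookup structure.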
[Degree-granting institution]: Nanjing Normal University
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP311.13; P208
Article ID: 2034808
Article link: https://www.wllwen.com/kejilunwen/dizhicehuilunwen/2034808.html