Research and Application of Web Visualization Technology in Data Mining
Published: 2018-08-03 08:09
[Abstract]: In recent years, with the rapid development of computer hardware and software and the spread of the Internet, the volume of data that people generate through networks and mobile devices has grown explosively; we live, in effect, in a boundless ocean of data. Quickly analyzing such massive data and extracting useful information features from it has therefore become especially important, and the rapid development of Internet technology offers an effective way to address the problem. Data mining arose in response: it extracts latent, valuable information features from large data sets. Visualization is the process of converting those information features into a visual form. Combining the theory and practice of data mining with visualization technology has produced another important research direction, visual data mining. By using visualization technology together with the characteristics of human vision, the information produced by data mining is presented to users in an intuitive form, which makes mining results more valuable and easier to understand. Most data produced in the networked information age comes from network access and user behavior and is stored in the large resource databases of online information platforms. Such data often has variable fields and diverse formats (text, images, audio, video, and so on), and analyzing and applying these database log files and data files is key to converting e-commerce data flows into information flows.
First, the thesis introduces the background of big data, on that basis presents Web visualization and data mining technology, and reviews the research status and significance of visualization technology in data mining, laying the foundation for studying its application. Second, it examines the technologies related to Web visualization and data mining in depth, covering the basic visualization pipeline, commonly used Web front-end visualization libraries, the multidimensional scaling algorithm, and the Hadoop distributed processing system, which together provide the technical support for the overall design. Finally, it analyzes the application of visualization technology in data mining through concrete examples. For traditional data with multidimensional attribute variables, the multidimensional scaling algorithm (MDS, multi-dimensional scaling) reduces the variables to a low-dimensional space for positioning and analysis, uncovering the information features of the data and presenting the results visually. For the log data of a commercial forum, the thesis uses the Hadoop massive data processing system, combining the Hadoop distributed file system (HDFS) with the Map/Reduce distributed computing model. Starting from analysis perspectives such as the forum's page views, registered users, unique IPs, and bounces, it builds a visual data model and designs an overall technical scheme that runs from back-end statistics and mining to front-end visual display.
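As a rough illustration of the dimensionality-reduction step described in the abstract, the sketch below implements classical (Torgerson) MDS in plain Python. The abstract does not say which MDS variant the thesis uses, so this is one common choice, not the author's method. The function names, the power-iteration eigen-solver, and the sample rows are all assumptions made for the example.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two attribute vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def classical_mds(dist, dims=2, iters=200):
    """Embed n items in `dims` dimensions from a symmetric n x n distance matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double centering turns squared distances into an inner-product matrix B.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]

    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Arbitrary fixed start vector, normalized to unit length.
        vec = [math.sin(i + k + 1.0) for i in range(n)]
        unit = math.sqrt(sum(x * x for x in vec))
        vec = [x / unit for x in vec]
        # Power iteration: converges to the dominant eigenvector of B.
        for _ in range(iters):
            nxt = [sum(b[i][j] * vec[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm < 1e-12:  # nothing left to explain
                break
            vec = [x / norm for x in nxt]
        bv = [sum(b[i][j] * vec[j] for j in range(n)) for i in range(n)]
        eigval = sum(v * w for v, w in zip(vec, bv))  # Rayleigh quotient
        if eigval <= 1e-12:
            break  # remaining axes stay at zero
        for i in range(n):
            coords[i][k] = vec[i] * math.sqrt(eigval)
        # Deflation removes the axis just found so the next pass finds the next one.
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigval * vec[i] * vec[j]
    return coords

# Made-up records with three attributes each (hypothetical data).
# The rows happen to lie in a plane, so a 2-D layout can match their distances.
rows = [(1.0, 2.0, 2.0), (4.0, 2.0, 2.0), (1.0, 6.0, 2.0), (4.0, 6.0, 2.0)]
dist = [[euclidean(p, q) for q in rows] for p in rows]
embedding = classical_mds(dist)
for row, xy in zip(rows, embedding):
    print(row, "->", [round(c, 3) for c in xy])
```

The resulting 2-D coordinates could then be handed to a front-end charting library as a scatter plot. Power iteration is used here only to keep the sketch dependency-free; it assumes the centered matrix has no large negative eigenvalues, which holds for Euclidean distances. A numerical library's eigen-solver would be the usual choice for real data.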
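The forum-log statistics named in the abstract (page views, registered users, unique IPs, bounces) fit the map, shuffle, and reduce pattern. The sketch below imitates that pattern in a single Python process. It is not Hadoop code and not the thesis's implementation. The log format, the `mod=register` marker for sign-ups, and the rule that an IP with exactly one request counts as a bounce are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical log format: "<ip> <timestamp> <url>"; real forum logs will differ.
SAMPLE_LOG = [
    "10.0.0.1 2000-01-01T10:00:00 /forum.php",
    "10.0.0.1 2000-01-01T10:00:42 /thread-1.html",
    "10.0.0.2 2000-01-01T10:01:11 /member.php?mod=register",
    "10.0.0.3 2000-01-01T10:02:45 /forum.php",
]

def map_record(line):
    """Map step: turn one log line into (key, 1) pairs."""
    ip, _timestamp, url = line.split(" ", 2)
    yield ("pv", "all"), 1            # every request is one page view
    yield ("ip", ip), 1               # per-IP request counter
    if "mod=register" in url:         # assumed marker of a sign-up request
        yield ("register", "all"), 1

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_counts(groups):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}

def summarize(lines):
    """Run map, shuffle, and reduce, then derive the four headline metrics."""
    pairs = (pair for line in lines for pair in map_record(line))
    counts = reduce_counts(shuffle(pairs))
    ip_hits = [n for (kind, _), n in counts.items() if kind == "ip"]
    return {
        "page_views": counts.get(("pv", "all"), 0),
        "registrations": counts.get(("register", "all"), 0),
        "unique_ips": len(ip_hits),
        # Simplification: an IP seen exactly once is treated as a bounce.
        "bounces": sum(1 for n in ip_hits if n == 1),
    }

print(summarize(SAMPLE_LOG))
```

On a real cluster the map and reduce functions would run as distributed tasks over files stored in HDFS, and the summary dictionary would be written to a store that the front-end charts read from.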
[Degree-granting institution]: Xiangtan University
[Degree level]: Master's
[Year conferred]: 2017
[Classification number]: TP311.13
Article ID: 2161134
Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/2161134.html