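Design and Implementation of a Microblog Data Analysis System
Published: 2018-06-20 17:47
Topics: microblog + Weibo crawler; Source: Beijing University of Posts and Telecommunications, master's thesis, 2014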
[Abstract]: Microblogging (Weibo, for short) is an Internet-based application whose user base has grown continuously and explosively. By actively posting and reposting, microblog users can spread information very widely in a very short time. The rapid growth of microblogging has produced a large volume of data with considerable hidden value, but the techniques for acquiring and analyzing that data, and for presenting the analysis results, are still immature: the data cannot be acquired and analyzed effectively, and results are presented in only a limited range of forms. This thesis first analyzes microblogging and its characteristics, focusing on the Sina Weibo platform. Second, it studies methods for acquiring microblog data and designs and implements a crawler for Sina Weibo based on simulated login. Finally, it studies analysis methods and result presentation for microblog data, designing effective analysis methods and an intuitive, attractive, interactive way of presenting the results. The specific work is as follows:
1) The concept, characteristics, and main applications of microblogging are studied. Sina Weibo is the focus of the thesis, and its characteristics are analyzed in detail.
2) Methods for acquiring microblog data are studied. The approach based on the Weibo API is analyzed and its limitations are identified. Traditional web crawlers and their methods are also introduced.
3) A microblog data acquisition system for Sina Weibo is designed, covering requirements analysis, database design, and crawler design. The crawler design includes simulated login, web page data extraction, and acquisition methods for different types of microblog data.
4) A microblog message analysis system is designed, covering requirements analysis, database design, and the design of the data analysis methods. The analysis methods designed in this thesis include keyword extraction from microblog messages; message propagation analysis, comprising audience analysis and key retweeter discovery; and a method for detecting paid-poster ("water army") accounts.
5) A microblog data analysis and display platform with a B/S (browser/server) architecture is designed. The platform combines HTML5 with JSP and presents the analysis results as web pages.
The microblog data analysis system designed in this thesis can acquire microblog data effectively, analyze microblog message data, and present the results in an attractive and novel way on the display platform, which offers good user interactivity.
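The abstract does not say how key retweeters are identified, so the following is only an illustrative sketch of one common heuristic, not the thesis's method: rank each user in a repost cascade by the number of reposts that descend from their own repost. The function names (`downstream_counts`, `key_retweeters`), the `(user, reposted_from)` input format, and the assumption that each user appears at most once in the cascade are all hypothetical choices made for this example.

```python
from collections import defaultdict


def downstream_counts(reposts, root):
    """Count, for every user in a repost cascade, the reposts that descend from them.

    `reposts` is a list of (user, reposted_from) pairs and `root` is the
    original poster. Assumes the cascade is a tree: each user reposts once.
    """
    children = defaultdict(list)
    for user, parent in reposts:
        children[parent].append(user)

    counts = {}
    # Iterative post-order traversal so deep cascades do not hit the
    # interpreter's recursion limit.
    stack = [(root, False)]
    while stack:
        node, expanded = stack.pop()
        if expanded:
            # All children are already counted; add each child plus its subtree.
            counts[node] = sum(1 + counts[child] for child in children[node])
        else:
            stack.append((node, True))
            stack.extend((child, False) for child in children[node])
    return counts


def key_retweeters(reposts, root, top_n=3):
    """Return the top_n reposting users ranked by downstream repost count."""
    counts = downstream_counts(reposts, root)
    # Exclude the original poster; break ties by user name for stable output.
    ranked = sorted((u for u in counts if u != root),
                    key=lambda u: (-counts[u], u))
    return ranked[:top_n]


if __name__ == "__main__":
    # Hypothetical cascade: "a" posts; "b" and "c" repost "a"; and so on.
    cascade = [("b", "a"), ("c", "a"), ("d", "b"), ("e", "b"), ("f", "d")]
    print(key_retweeters(cascade, "a", top_n=2))
```

In this toy cascade, user "b" has three downstream reposts and "d" has one, so they rank first and second. A real system would likely combine such a count with other signals (follower counts, timing), which the abstract does not detail.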
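[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TP393.092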
Article ID: 2045159
Article link: https://www.wllwen.com/guanlilunwen/ydhl/2045159.html