Research on the Collection, Analysis, and Visualization of Weibo Content
Published: 2018-07-22 10:35
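【Abstract】: Social media such as Weibo and WeChat have grown rapidly, and new smart devices keep appearing. These technologies have changed how people live, and they have also produced large volumes of multidimensional, unstructured data. Many researchers regard these data as a treasure of our era, and data-science research is attracting more and more attention. This thesis presents research on Sina Weibo data in three parts. The first part covers Weibo data collection. The second part covers sentiment new-word discovery based on users' Weibo posts. The third part covers visualization of propagation networks based on Weibo repost data.

(1) For Sina Weibo data collection, the thesis first compares two different ways of simulating a Sina Weibo login and obtaining verification, and discusses the strengths and weaknesses of each. Once verification has been obtained, it describes how four types of Sina Weibo data are collected: user profile information, users' posts, users' following lists, and the reposts and comments on individual posts. These data form the corpus for the later research.

(2) Users' Sina Weibo posts are colloquial and informal, so they often contain many out-of-vocabulary sentiment words. The thesis therefore studies word-level sentiment orientation in users' posts. It first identifies new words in the Weibo corpus with a statistics-based method. It then trains word vectors for the corpus vocabulary with a neural network to capture the latent relations between words. Finally, it proposes a word-vector-based method for discovering sentiment new words. The experimental results indicate that the method has some practical value.

(3) For Sina Weibo repost data, the thesis analyzes how a single post spreads, using web-based visualization. It first builds a propagation network from the repost data. Using the reposters' profile information, it then describes the visualization in three aspects: node filtering, hierarchical information display, and interactive features. This visual-analysis approach makes it possible to find the key nodes in a post's propagation simply and quickly, and to judge how far the message spreads.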
[Abstract]: Social media such as Weibo and WeChat have developed rapidly, and smart devices keep emerging. These new technologies have changed people's lifestyles and have also brought large, multidimensional, unstructured data. Most researchers regard these data as a treasure of our era, and research on data science is becoming increasingly popular. This thesis discusses research on Sina Weibo data in three parts. The first part is the acquisition of Weibo data. The second part is the discovery of sentiment neologisms from users' Weibo posts. The third part is the visualization of propagation networks from Weibo repost data.

(1) Two different methods of simulating Sina Weibo login verification are compared, and the advantages and disadvantages of each are discussed. After verification is obtained, the collection process for four types of Sina Weibo data is described: user profile information, users' posts, users' following lists, and the reposts and comments on a single post. These data provide the corpus for the subsequent research.

(2) Users' Sina Weibo posts are colloquial and informal, so they often contain many out-of-vocabulary words. Based on users' Weibo data, this thesis studies word-level sentiment orientation. First, a statistics-based method identifies new words in the Weibo corpus. Then a neural network is used to train word vectors that capture the relations between words. Finally, a word-vector-based method for discovering sentiment neologisms is proposed. The experimental results show that the method has some practical value.

(3) For Sina Weibo repost data, this thesis presents a web-based visual analysis of how a single post spreads. First, the propagation network is constructed from the repost data. Then, using the reposters' profile information, the visualization is described in three aspects: node selection, hierarchical information display, and interactive function design. Through visual analysis, the key nodes in a post's propagation can be found simply and quickly, and the scope of the message's influence can be determined.
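The abstract says only that new words are first identified with a statistics-based method. It does not name the statistics. The sketch below assumes the two measures most often used for Chinese new-word detection:

- **Internal cohesion:** pointwise mutual information, taking the weakest split point of the n-gram.
- **Boundary freedom:** the entropy of the characters seen immediately to its left and right.

A candidate passes only if it is frequent enough, cohesive, free on both sides, and absent from an existing lexicon. The function names, thresholds, and toy posts are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter, defaultdict


def entropy(counter):
    """Shannon entropy (natural log) of a neighbour-character distribution."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log(c / total) for c in counter.values())


def find_new_words(texts, lexicon, max_len=4, min_freq=3, min_pmi=1.0, min_entropy=1.0):
    """Return candidate new words (not in `lexicon`), most frequent first.

    Hypothetical thresholds: a candidate needs frequency >= min_freq,
    cohesion (min PMI over split points) >= min_pmi, and left and right
    neighbour entropy >= min_entropy.
    """
    freq = Counter()
    left = defaultdict(Counter)   # gram -> counts of the character just before it
    right = defaultdict(Counter)  # gram -> counts of the character just after it
    total_chars = 0
    for text in texts:
        total_chars += len(text)
        for i in range(len(text)):
            for n in range(1, max_len + 1):
                if i + n > len(text):
                    break
                gram = text[i:i + n]
                freq[gram] += 1
                # Occurrences at a text boundary contribute no neighbour (a simplification).
                if i > 0:
                    left[gram][text[i - 1]] += 1
                if i + n < len(text):
                    right[gram][text[i + n]] += 1

    results = []
    for gram, f in freq.items():
        if len(gram) < 2 or f < min_freq or gram in lexicon:
            continue
        # Cohesion: the weakest split decides; p(x) is approximated as count / total_chars.
        pmi = min(
            math.log(total_chars * f / (freq[gram[:k]] * freq[gram[k:]]))
            for k in range(1, len(gram))
        )
        boundary = min(entropy(left[gram]), entropy(right[gram]))
        if pmi >= min_pmi and boundary >= min_entropy:
            results.append((gram, f))
    results.sort(key=lambda item: -item[1])
    return [gram for gram, _ in results]


if __name__ == "__main__":
    posts = ["今天蓝瘦了", "我好蓝瘦啊", "真的蓝瘦吗", "他说蓝瘦呢"]  # toy data
    print(find_new_words(posts, lexicon={"今天"}))  # ['蓝瘦']
```

In the toy posts, only one bigram passes all three filters. It recurs with four different left neighbours and four different right neighbours, so it has frequency 4, PMI ln 5 ≈ 1.61, and both entropies ln 4 ≈ 1.39. On real Weibo text, the thresholds would have to be tuned.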
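The abstract states two things: word vectors are trained with a neural network, and those vectors are then used to discover sentiment new words. It does not give the scoring rule. The sketch below makes two assumptions. First, the vectors are already trained; word2vec is one possibility, not something the abstract confirms. Second, it uses a simple seed-word rule chosen only for illustration: a new word's orientation is its mean cosine similarity to positive seed words minus its mean similarity to negative seed words. The seed lists, the `margin` parameter, and the 2-dimensional toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def orientation(word, vectors, pos_seeds, neg_seeds):
    """Mean similarity to positive seeds minus mean similarity to negative seeds."""
    pos = [cosine(vectors[word], vectors[s]) for s in pos_seeds if s in vectors]
    neg = [cosine(vectors[word], vectors[s]) for s in neg_seeds if s in vectors]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative seed with a vector")
    return sum(pos) / len(pos) - sum(neg) / len(neg)


def classify_new_words(new_words, vectors, pos_seeds, neg_seeds, margin=0.1):
    """Label each new word that has a vector as positive, negative, or neutral."""
    labels = {}
    for word in new_words:
        if word not in vectors:  # words without a trained vector are skipped
            continue
        score = orientation(word, vectors, pos_seeds, neg_seeds)
        if score > margin:
            labels[word] = "positive"
        elif score < -margin:
            labels[word] = "negative"
        else:
            labels[word] = "neutral"
    return labels


if __name__ == "__main__":
    # Hypothetical 2-D vectors; real embeddings would come from a trained model.
    toy_vectors = {
        "开心": [1.0, 0.1], "喜欢": [0.9, 0.2],    # positive seeds
        "难过": [-1.0, 0.1], "讨厌": [-0.9, 0.2],  # negative seeds
        "蓝瘦": [-0.8, 0.3], "给力": [0.8, 0.3], "吃饭": [0.0, 1.0],
    }
    print(classify_new_words(["蓝瘦", "给力", "吃饭"], toy_vectors,
                             ["开心", "喜欢"], ["难过", "讨厌"]))
    # {'蓝瘦': 'negative', '给力': 'positive', '吃饭': 'neutral'}
```

Real word vectors have hundreds of dimensions. The 2-D toy vectors only make the cosine arithmetic easy to check by hand.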
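For part (3), the abstract describes building a propagation network from repost data, then filtering nodes and displaying them by level. The sketch below assumes each repost record has already been reduced to a hypothetical `(user, parent)` pair, where the parent is the account the user reposted from. From these pairs it does three things:

- Builds the repost tree.
- Assigns every node a layer for hierarchical display.
- Ranks reposters by how many downstream reposts they triggered, which is one simple way to surface key nodes.

The thesis may use different criteria for node selection.

```python
from collections import defaultdict, deque


def build_children(root, reposts):
    """Map each account to the users who reposted directly from it.

    `reposts` holds hypothetical (user, parent) pairs. A user listed more than
    once keeps only the last parent seen, and the original author (`root`)
    is never treated as a reposter, so the nodes reachable from `root` form a tree.
    """
    parent_of = {user: parent for user, parent in reposts if user != root}
    children = defaultdict(list)
    for user, parent in parent_of.items():
        children[parent].append(user)
    return children


def layer_of(root, children):
    """Breadth-first depth: 0 = original post, 1 = direct reposts, and so on."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in depth:
                depth[child] = depth[node] + 1
                queue.append(child)
    return depth


def downstream_counts(root, children):
    """Number of reposts descending from each node (subtree size minus itself)."""
    order, stack = [], [root]
    while stack:  # iterative pre-order traversal avoids deep recursion on long chains
        node = stack.pop()
        order.append(node)
        stack.extend(children.get(node, []))
    size = {}
    for node in reversed(order):  # every child is sized before its parent
        size[node] = 1 + sum(size[c] for c in children.get(node, []))
    return {node: s - 1 for node, s in size.items()}


def key_nodes(root, children, top_k=3):
    """Reposters ranked by downstream reach; ties broken by name for stable output."""
    counts = downstream_counts(root, children)
    ranked = sorted((n for n in counts if n != root), key=lambda n: (-counts[n], n))
    return ranked[:top_k]


if __name__ == "__main__":
    reposts = [("A", "author"), ("B", "author"), ("C", "A"),
               ("D", "A"), ("E", "C"), ("F", "B")]  # toy data
    children = build_children("author", reposts)
    print(layer_of("author", children))
    print(key_nodes("author", children, top_k=2))  # ['A', 'B']
```

The layer map could drive a hierarchical (tree or radial) layout. The downstream counts could be used to hide reposters with no further spread, one possible form of node filtering.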
【Degree-granting institution】: Dalian University of Technology
【Degree level】: Master's
【Year degree awarded】: 2015
【Classification number】: TP393.092; TP391.1
Document No.: 2137157
Link: https://www.wllwen.com/guanlilunwen/ydhl/2137157.html