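Design and Implementation of the Public Opinion System of Shandong Academy of Sciences
Published: 2018-06-07 08:13
Topics: online public opinion + web crawler; Source: University of Jinan, 2014 master's thesis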
[Abstract]: With the rapid development of modern technology, the Internet has reached the general public. More and more people go online to look up the people and events that interest them and to publish their own views on them. Online public opinion has therefore become an important new form of public opinion, and building systems that track it has become an essential task. To that end, the public opinion system of Shandong Academy of Sciences crawls and processes online content about the Academy, so that unfavorable factors can be avoided where possible, the Academy gains room to develop, and its departments can better understand the current situation regarding the Academy.
The main work of this thesis covers the construction of the system framework for Shandong Academy of Sciences, the setup of its runtime and development environments, the working principle of data crawling with Nutch, data collection techniques, and the preprocessing of text content:
(1) Describe the background and significance of research on online public opinion, its importance to social development, and why public opinion systems are now indispensable, and analyze the current state of research on online public opinion.
(2) Study the Nutch collection technology for web data mining and the working principle of web crawlers.
(3) Build the system framework. First, the runtime environment is set up by importing the compiled Nutch files into the Cygwin emulation environment. Second, the development environment is set up in Eclipse by importing and compiling the Nutch source code and configuring its files.
(4) Process the data of the public opinion system: analyze the logical structure of the collected web pages, then apply information purification, Chinese word segmentation, text clustering, and related steps to the crawled data.
Finally, an overall analysis of the public opinion system of Shandong Academy of Sciences determines the system architecture, and the system is implemented. The system analyzes the acquired content, and its key functions are realized through web page purification, Chinese word segmentation, text clustering, and other processing techniques.
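The abstract names web page purification and Chinese word segmentation as preprocessing steps but does not say how the thesis implements them. The following is only a minimal sketch under stated assumptions: purification is approximated by stripping tags, scripts, and styles with Python's standard html.parser, and segmentation uses forward maximum matching, a simple dictionary-based method that stands in for whatever segmenter the thesis actually used. The names TextExtractor, purify, segment, and VOCABULARY, the toy dictionary, and the sample page are all hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and skips <script> and <style> content."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def purify(html):
    """Return the visible text of an HTML page as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


def segment(text, vocabulary, max_len=4):
    """Forward maximum matching: take the longest dictionary word at each position.

    Characters that start no dictionary word are emitted on their own.
    """
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocabulary:
                words.append(candidate)
                i += size
                break
    return words


# Toy dictionary invented for this example; a real system needs a full lexicon.
VOCABULARY = {"山东省", "科学院", "舆情", "系统", "网络"}

# Hypothetical crawled page, not taken from the thesis.
page = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><p>山东省科学院舆情系统</p><script>var x = 1;</script></body></html>"
)
clean_text = purify(page)
tokens = segment(clean_text, VOCABULARY)
print(clean_text)
print(tokens)
```

The token lists produced this way could then feed a text clustering step, which is left out here because the abstract does not identify the clustering algorithm.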
[Degree-granting institution]: University of Jinan
[Degree level]: Master's
[Year degree conferred]: 2014
[Classification number]: TP311.52;TP393.09
Article ID: 1990490
Article link: https://www.wllwen.com/guanlilunwen/ydhl/1990490.html