
Design and Implementation of an Internet Public Opinion Monitoring and Analysis System

Published: 2018-04-10 02:35

  Topic: data collection. Focus: Heritrix. Source: Beijing Jiaotong University, master's thesis, 2017


[Abstract]: China's Internet industry has grown rapidly over the past decade or so. Because the Internet spreads information quickly, reaches a large audience, covers a wide range of content, and has a strong capacity for social mobilization, it has become a new gathering place for public opinion. Some lawbreakers, however, use this channel to spread false information and reactionary statements to the public, and this has already caused serious harm. Problems of this kind arise from the shift in information dissemination from traditional media to new media. Strengthening the supervision of online public opinion, and the government's ability to guide it correctly, is therefore vital to China's healthy development under the new circumstances. This thesis describes the design and implementation of an Internet public opinion monitoring and analysis system. Through the system, users can obtain multi-dimensional public opinion analysis results for a user-defined sensitive event, such as the share of each sentiment polarity and the event's trend over time, and can also set up alerts for the event and generate reports. Monitoring public opinion as comprehensively as possible requires large-scale data collection. The system's data sources are publicly available text from Internet media such as news websites, mobile news clients, and forums. The data collection module is an extension built on the Heritrix crawler framework. The crawler module can collect information from nearly a thousand domestic and foreign sites and produces files in a standard format for the data analysis programs to use. The large volume of detail data is stored in HBase, a non-relational database. A high-performance system must return the data users want accurately and in as little time as possible, which depends on an efficient search engine. The thesis therefore also describes how the Solr search engine is used in the system for text search and for statistics over massive data. Solr is an efficient data retrieval tool and carries a very important part of the workload in the overall system. Building on research into domestic and international work on data collection and search engines, drawing on the features of mature text sentiment analysis products, and applying the basic ideas of modern software engineering management, the thesis distills various user stories into a core business processing model and a general solution that can be extended to similar products. The system has been launched successfully and is in commercial operation. It provides government departments at all levels with a convenient and efficient tool for monitoring Internet public opinion, has helped combat cybercrime that disrupts social stability, has promoted the spread of positive information, and has contributed to cleaning up the online environment and curbing adverse events.
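The two analysis outputs named in the abstract, the share of each sentiment polarity and the event's trend over time, can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not say how these figures are computed, and the record layout, the labels, and the function names `polarity_share` and `daily_trend` are all assumptions made for illustration.

```python
from collections import Counter
from datetime import date

# Hypothetical records: (publication date, sentiment label) for each collected post.
# The labels and layout are illustrative assumptions, not taken from the thesis.
posts = [
    (date(2017, 3, 1), "negative"),
    (date(2017, 3, 1), "neutral"),
    (date(2017, 3, 2), "negative"),
    (date(2017, 3, 2), "positive"),
]

def polarity_share(records):
    """Return each sentiment label's fraction of all records."""
    counts = Counter(label for _, label in records)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}

def daily_trend(records):
    """Return (day, post count) pairs in chronological order."""
    counts = Counter(day for day, _ in records)
    return sorted(counts.items())
```

Calling `polarity_share(posts)` gives one fraction per label, and `daily_trend(posts)` gives the number of posts per day, which is the shape of data a polarity pie chart and a trend line would need.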
[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification numbers]: TP274; TP391.3


Article ID: 1729304



Article link: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/1729304.html

