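Research on Key Technologies of an Internet Public Opinion Information Collection and Analysis System
Published: 2018-04-14 20:04
Topics: public opinion + web crawler; Source: Tianjin University, master's thesis, 2012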
[Abstract]: As the Internet environment grows increasingly complex, online public opinion has come to exert a major influence on social stability and on the many people who go online. Online public opinion arises across a wide scope and spreads quickly, and its flashpoints are hard to detect and control, which makes collecting and analyzing public opinion information on the Internet very important.
This thesis first analyzes in depth the requirements of an Internet public opinion information collection system. It then combines network topology, keyword-based web page content filtering, and breadth-first search to design and implement a vertical search engine crawler for public opinion collection. Word segmentation and topic word extraction are used to identify hot public opinion topics, so that sudden public opinion events and sensitive topics involving content security are discovered promptly and trigger early warnings, with sudden public opinion in the local region identified automatically by machine. The thesis also designs and implements an algorithm for a semi-automatic public opinion report generation system: retrieved results are sorted by indicators including keyword frequency and weight, page category, page content warning, and page popularity, and a public opinion briefing is then generated semi-automatically.
The system effectively collects public opinion information from news sites, forums, blogs, and post bars (Tieba), performs statistical analysis and topic analysis on the collected results, and outputs public opinion reports semi-automatically.
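The crawling approach named in the abstract (breadth-first search combined with keyword-based content filtering and ranking by weighted keyword frequency) can be illustrated with a minimal sketch. This is not the thesis's implementation: the names `crawl`, `keyword_score`, `toy_fetch`, and `TOY_WEB`, the threshold, and the choice to follow links from every page are all assumptions made for illustration. Pages are served from an in-memory dictionary rather than the network, and plain substring counting stands in for the word segmentation step a Chinese-language system would need.

```python
from collections import deque


def keyword_score(text, keywords):
    """Weighted count of keyword occurrences (substring matches) in text."""
    return sum(weight * text.count(word) for word, weight in keywords.items())


def crawl(seed, fetch, keywords, threshold=1.0, max_pages=100):
    """Breadth-first crawl from `seed`, keeping pages that pass the keyword filter.

    `fetch(url)` is a hypothetical callable returning (page_text, outgoing_links).
    Returns (url, score) pairs sorted by descending score.
    """
    queue = deque([seed])
    seen = {seed}          # URLs already queued, to avoid revisiting
    hits = []
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()              # FIFO order gives breadth-first traversal
        text, links = fetch(url)
        fetched += 1
        score = keyword_score(text, keywords)
        if score >= threshold:             # keyword-based content filter
            hits.append((url, score))
        for link in links:                 # assumption: expand every page, relevant or not
            if link not in seen:
                seen.add(link)
                queue.append(link)
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits


# Toy link graph standing in for real web pages (illustrative data only).
TOY_WEB = {
    "a": ("flood warning issued for the city", ["b", "c"]),
    "b": ("football results", ["d"]),
    "c": ("flood flood rescue update", ["a", "d"]),
    "d": ("city council notes", []),
}
KEYWORDS = {"flood": 2.0, "warning": 1.0}


def toy_fetch(url):
    return TOY_WEB[url]


if __name__ == "__main__":
    for url, score in crawl("a", toy_fetch, KEYWORDS):
        print(url, score)
```

A production crawler would replace the `seen` set with a more memory-efficient structure for large-scale deduplication and would add the page category, warning, and popularity indicators to the ranking; those parts are omitted here because the abstract does not specify them.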
[Degree-granting institution]: Tianjin University
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP393.09
Article ID: 1750812
Article link: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/1750812.html