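Research and Implementation of a Topical News Analysis System
Published: 2018-06-26 07:43
Topics: natural language processing + sentiment analysis; Source: Beijing University of Posts and Telecommunications (北京邮电大学), 2016 master's thesis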
[Abstract]: News analysis is an interdisciplinary frontier field with broad application prospects. Research on it advances natural language processing technology and also benefits applications such as information retrieval, decision support, and text mining. Internet media publish a large volume of news articles. Studying them makes it possible to understand the stance that the articles in a topical report take toward the topic, how much attention the public pays to the topic, how strongly people in different regions engage with it, the different angles from which the topic is discussed, and the relationships among the news items on the topic. All of these offer an important window for following public opinion in a timely way and provide a reference for policy-making bodies, which gives the work considerable practical value and social benefit.

This thesis uses natural language processing and data mining techniques to extract and rank key content: persons, places, organizations, and keywords. It uses sentiment analysis techniques to perform topic-level sentiment analysis, analysis of positive and negative news, time-based popularity analysis of topical news, and analysis of the distribution of article counts along a timeline. It also builds a conceptual model and a presentation model for visualizing the information in a topical news corpus, and visualizes the key content.

The main research work consists of four parts. First, the research and implementation of news sentiment analysis. Sentiment analysis is applied here at the level of news articles; analyzing the sentiment of news text reveals the attitudes of the media and of domain experts toward an event or a national policy, which helps the general public form their own judgments. The news sentiment analysis algorithm combines natural language processing and data mining methods. Second, the research and implementation of entity extraction. The thesis mainly applies the CRF (conditional random field) algorithm to entity extraction: a trained model automatically recognizes three entity types (person names, place names, and organization names), while time and date entities are recognized by template matching. Third, the research and implementation of keyword extraction. The LDA algorithm is applied to keyword extraction for topical news, and, taking Chinese semantic expression into account, a compound-word generation algorithm is proposed, mainly to address the limitations of existing word segmentation systems. Finally, building on the entity extraction, keyword extraction, and sentiment analysis algorithms and combining them with multidimensional data display techniques, a topical news analysis system is designed and implemented. The system can show the most frequently occurring entities under a topic together with the news articles related to those entities, show the keywords that best express the main idea of a topic, present the sentiment analysis results for a topic's news as a bar chart, and show news hot spots along a timeline.

The thesis makes the following innovations. The sentiment analysis algorithm treats news headlines and news body text differently: headline sentiment analysis adds a neutral-sentiment recognition step, and body-text sentiment analysis adds subjective-sentence recognition and subject-word recognition.
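The abstract states that time and date entities are recognized by template matching rather than by the trained CRF model. The thesis's actual templates are not given here, so the following is only a minimal sketch of the general idea: the regular expressions, the `TEMPLATE` name, and the `extract_time_entities` helper are all illustrative assumptions, not the author's implementation.

```python
import re

# Hypothetical templates: the thesis does not list its actual patterns.
# Alternatives are ordered from most to least specific so that a full date
# such as 2016年3月5日 is matched as one entity and not as a partial 3月5日.
TEMPLATE = re.compile(
    r"(?P<DATE>\d{4}年\d{1,2}月(?:\d{1,2}[日号])?"   # 2016年3月5日, 2016年3月
    r"|\d{4}[-/]\d{1,2}[-/]\d{1,2}"                  # 2018-06-26, 2018/6/26
    r"|\d{1,2}月\d{1,2}[日号])"                      # 3月5日
    r"|(?P<TIME>(?:上午|下午|晚上|凌晨)?\d{1,2}[点时](?:\d{1,2}分)?"  # 上午9点30分
    r"|\d{1,2}:\d{2})"                               # 07:43
)


def extract_time_entities(text):
    """Return (label, surface string) pairs for every template match, in order."""
    # finditer yields non-overlapping matches from left to right;
    # lastgroup is the name of the top-level group that matched (DATE or TIME).
    return [(m.lastgroup, m.group()) for m in TEMPLATE.finditer(text)]


if __name__ == "__main__":
    sample = "会议于2016年3月5日上午9点30分在北京召开,报道发布于2018-06-26 07:43。"
    for label, surface in extract_time_entities(sample):
        print(label, surface)
```

A rule-based approach of this kind fits dates and times well because their surface forms are highly regular, whereas person, place, and organization names vary too much for a fixed set of patterns.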
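[Degree-granting institution]: Beijing University of Posts and Telecommunications (北京邮电大学)
[Degree level]: Master's
[Year degree conferred]: 2016
[Classification number]: TP391.1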
Article ID: 2069700
Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/2069700.html