
Research on Methods for Extracting and Analyzing Opinion Sentences in Chinese-Vietnamese Bilingual News

Published: 2018-04-26 19:26

  Topics: cross-language + opinion analysis; Source: Kunming University of Science and Technology, 2017 master's thesis


[Abstract]: Vietnam is one of China's important neighbors and has close political, economic, military, and cultural ties with China, so analyzing and tracking public-opinion trends in the two countries' news is important. However, the Internet holds a massive volume of news text, and analyzing and summarizing it manually is time-consuming and laborious. Methods that automatically analyze Chinese-Vietnamese bilingual news text are therefore of great significance and value. News text consists mainly of two parts: descriptions of objective facts that have already occurred, and subjective judgments of those facts. Words representative of the objective facts are called news elements, such as person names, place names, and organization names; words representative of subjective judgments are called sentiment words, such as "significance", "influence", and "praise". On this basis, this thesis incorporates news-element association and sentiment association into a graph model, studies a graph-based method for extracting opinion sentences, builds on it to study a method for generating difference summaries, and further analyzes the sentiment orientation of opinion sentences. The main work is as follows:

(1) Opinion sentence extraction for Chinese-Vietnamese bilingual news text based on element association and sentiment association. News in any language contains news elements and sentiment words; exploiting this property, an opinion sentence extraction method based on element association and sentiment association is proposed. First, sentences are analyzed for association according to the elements and sentiment information they contain, and a sentence association graph is built. Then, edge weights in the graph are computed from the element association strength and the sentiment association strength, and the graph model is solved to extract opinion sentences.

(2) Difference opinion summary extraction for Chinese-Vietnamese bilingual news based on graph-model ranking. For the same event, Vietnamese and Chinese news do not express entirely the same opinions, and differing opinions are more valuable than shared ones. To extract the differing opinions expressed in Chinese and Vietnamese news, a difference-opinion generation method based on an undirected graph model is studied, building on the first piece of work. The method uses machine translation as a bridge between the languages. First, the similarity between Chinese and Vietnamese news sentences is computed, and the bilingual sentences are filtered by similarity. Then, an undirected graph is built with the filtered sentences as nodes; edges between nodes in the same language are weighted by similarity, and edges between nodes in different languages are weighted by degree of difference. Finally, node weights are computed from the edge weights with a random walk algorithm, and the highest-weighted sentences are extracted as difference summary sentences.

(3) Sentiment orientation classification of opinion sentences in Chinese-Vietnamese bilingual news based on a convolutional neural network. To further analyze the sentiment orientation of opinion sentences, a cross-lingual sentiment orientation classification method based on a convolutional neural network is studied. Unlike traditional methods, it requires neither a sentiment lexicon nor complex feature extraction. To apply the convolutional neural network to Chinese-Vietnamese bilingual news sentences, large amounts of unlabeled Chinese and Vietnamese text are first collected to train Chinese word vectors and Vietnamese word vectors separately. Then, Chinese sentences are machine-translated into Vietnamese for processing, which addresses the scarcity of Vietnamese corpora. Finally, the Chinese and Vietnamese word vectors of each sentence are fed into the convolutional neural network as separate channels for training, yielding cross-lingual sentiment orientation classification.
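
The graph construction and random-walk ranking described in part (2) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the abstract does not specify the similarity measure, the difference measure, or the random-walk variant, so the token-overlap (Jaccard) similarity, the `1 - similarity` difference, the PageRank-style damping factor, and all function names here are hypothetical. The similarity-based filtering step is omitted, and the toy sentences are assumed to be already tokenized in a common language (for example, after machine translation).

```python
# Sketch only: ranks bilingual sentences with a PageRank-style random walk.
# The Jaccard similarity, the "1 - similarity" difference measure, and the
# damping factor are illustrative assumptions, not details from the thesis.

def jaccard(a, b):
    """Token-overlap similarity between two token lists (assumed measure)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def build_weights(sentences):
    """Build a symmetric edge-weight matrix for an undirected sentence graph.

    sentences: list of (language, tokens) pairs whose tokens are comparable,
    e.g. because one side has been machine-translated.
    """
    n = len(sentences)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim = jaccard(sentences[i][1], sentences[j][1])
            same_lang = sentences[i][0] == sentences[j][0]
            # Same language: weight by similarity.
            # Different languages: weight by difference.
            w[i][j] = w[j][i] = sim if same_lang else 1.0 - sim
    return w


def random_walk_scores(w, damping=0.85, iters=200, tol=1e-9):
    """Power iteration for the stationary scores of a damped random walk."""
    n = len(w)
    out = [sum(row) for row in w]  # total edge weight attached to each node
    scores = [1.0 / n] * n
    for _ in range(iters):
        # Nodes without edges spread their score uniformly over all nodes.
        dangling = sum(scores[j] for j in range(n) if out[j] == 0)
        new = []
        for i in range(n):
            flow = sum(scores[j] * w[j][i] / out[j]
                       for j in range(n) if out[j] > 0)
            new.append((1 - damping) / n + damping * (flow + dangling / n))
        delta = sum(abs(a - b) for a, b in zip(new, scores))
        scores = new
        if delta < tol:
            break
    return scores


# Toy input: two "Chinese" and two "Vietnamese" sentences (made-up tokens).
sentences = [
    ("zh", ["talks", "trade", "progress"]),
    ("zh", ["talks", "trade", "praised"]),
    ("vi", ["talks", "border", "concern"]),
    ("vi", ["talks", "trade", "progress"]),
]
weights = build_weights(sentences)
scores = random_walk_scores(weights)
ranking = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
```

On this toy input, the Vietnamese sentence that repeats a Chinese sentence word for word receives the lowest score, while the two sentences that diverge most from the other language rank highest, which matches the intent of favoring differing opinions over shared ones.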
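[Degree-granting institution]: Kunming University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.1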



