Research on False Information Detection Based on Representation Learning

Published: 2018-07-12 10:39

  Topics: false information detection + contradiction detection; Source: Harbin Institute of Technology, 2017 doctoral dissertation


[Abstract]: Detecting false information in text is an active research problem in natural language processing. Its goal is to identify and filter out untrue or incorrect information in online text. The value of this research lies in identifying false information and unreliable information sources, so that people are not misled when forming judgments about things or when shopping.

Online text can be divided into objective information and subjective information. Objective information is an objective description of things. Its true content is unique, and any information that contradicts the true information is false. Subjective information expresses people's subjective feelings or experiences. Its true content is not unique, and what separates genuine subjective information from false subjective information is whether it comes from a real user experience. Detection techniques therefore need to be studied separately for the two types, according to their different characteristics. When the research scenario also includes external data such as information sources and user feedback, detecting false information with the help of that external data is also a valuable research direction.

A core question in false information detection is how to represent the text content and the external data effectively. Machine learning methods are the most widely used approach to the task, and feature representation largely determines the performance of a machine learning algorithm. Most false information is fabricated by people who imitate real information in both content and writing style, which makes it highly deceptive. Feature design depends on expert experience, and this deceptiveness makes feature design difficult. Representation learning can automatically learn latent regularities from data and can process and abstract information, so it brings new opportunities to false information detection. This work studies representation learning methods for online text and external data in the context of false information detection, with the aim of improving detection performance. The research covers the following four aspects.

1. A method for contradiction detection based on dependency parsing and contradiction-specific word embedding learning. Because false objective information semantically contradicts the true information, the thesis proposes a false information detection method based on contradiction detection. Understanding and detecting contradictory words is a difficult part of handling semantic contradiction, and lexical resources such as WordNet cannot identify the contradictory semantic relation within a pair of contradictory words. The thesis learns word embeddings specific to the contradiction detection task and applies them in a neural network model, which effectively improves contradiction detection performance. Contradictions between sentences can then be identified and false information discovered.

2. A representation learning method based on sentence weights. Without supporting evidence, the falseness of subjective information is hard to judge. However, lies still differ from true information to some degree in wording and phrasing, and they show latent regularities and characteristics. The thesis uses document-level semantic representation learning to uncover these regularities in the data. Because each sentence in a piece of information matters differently for judging its falseness, the thesis proposes a detection method that learns document representations in combination with sentence weights. Document representation learning replaces traditional feature engineering, and combining sentence weight computation with document representation learning effectively improves the detection performance of the system.

3. A representation learning method that incorporates information source reliability. Much of the information people encounter in daily life, such as flight boarding times, is reported differently by different sources, and the reports conflict with one another. Because such information is short and plain, true and false information differ little in textual features such as language and grammar, so false information is hard to detect from text alone. The thesis treats the reliability of the information source as external knowledge about the information to be checked, and uses a memory network model to compute source reliability and information credibility jointly and iteratively in order to predict whether the information is false.

4. A representation learning method that incorporates user feedback. On social media, when users see a message (such as a microblog post or a Tweet), they express support, opposition, or doubt in their reposts of it. As a form of user feedback, reposts carry users' judgments about the falseness of the forwarded message and reflect the wisdom of the crowd. The thesis uses an attention-based representation learning method to learn representations of the source post and of the user feedback and to compose them semantically. The attention mechanism automatically assigns weights to the user feedback, which effectively improves detection performance on this problem.

Overall, the thesis exploits the generality of representation learning for semantic representation and studies in depth how it applies to false information detection in four settings: objective information, subjective information, information combined with its sources, and information combined with user feedback. It is hoped that this work can serve as a reference for researchers in false information detection and natural language processing.
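The attention-based weighting of user feedback described in the abstract can be illustrated with a small sketch. The abstract does not say how attention scores are computed, so the dot-product scoring below is an assumption, the vectors are made-up toy values rather than learned representations, and the names `softmax`, `dot`, and `attend` are hypothetical helpers, not the thesis's model.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attend(query, items):
    """Weight each item by its (assumed) dot-product score against the query
    and return the weights together with the weighted sum of the items."""
    weights = softmax([dot(query, item) for item in items])
    dim = len(items[0])
    pooled = [sum(w * item[d] for w, item in zip(weights, items))
              for d in range(dim)]
    return weights, pooled


# Toy vectors standing in for a source post and three reposts (assumed values).
source_post = [1.0, 0.0]
replies = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]

weights, pooled = attend(source_post, replies)
print(weights)  # the first repost aligns with the source and gets the largest weight
print(pooled)   # a single vector summarizing the feedback
```

The same pooling pattern would apply to the sentence-weight idea in item 2, with sentence vectors in place of repost vectors. In a trained model the scores would come from learned parameters rather than a fixed dot product.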
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Doctoral
[Year degree granted]: 2017
[Classification number]: TP391.1

Article ID: 2116885


Article link: https://www.wllwen.com/shoufeilunwen/xxkjbs/2116885.html

