Research on Opinion Mining with an LDA Maximum Entropy Model Based on Cloud Model Theory

Published: 2018-04-23 18:55

Topics: opinion mining + LDA model; Source: Central China Normal University, master's thesis, 2016

[Abstract]: With the rapid development of the Internet, society has entered the era of "Internet Plus" and big data. The unprecedented growth of e-commerce and online social media has given people more opportunities to post comments online, and online reviews contain a great deal of valuable information that can serve businesses, users, and government departments. However, faced with the massive volume of online review data generated by user behavior and accumulated by Internet companies in their daily operations, manual methods alone cannot extract the key information effectively. Against this background, opinion mining offers a new way to address the problem, and fine-grained opinion mining has attracted wide attention from researchers in China and abroad because it gives users more specific and useful information.
The three main tasks of opinion mining are sentiment classification, opinion extraction, and opinion analysis. The many methods proposed for these tasks can be broadly grouped into rule-based methods and methods based on statistical machine learning. Early opinion mining mostly relied on rule-based methods, which require domain feature words and word-extraction rules to be defined by hand. Although these methods met the needs of opinion mining to some extent, their heavy workload and poor adaptability make them unsuitable for massive data. Statistical machine learning methods, represented by topic models, have won recognition from researchers for their domain adaptability and unsupervised nature, and have gradually replaced rule-based methods.
LDA is an unsupervised statistical topic model. It represents a document as a collection of words, like a bag of words, in which the words are treated as mutually independent. It then uses methods such as variational inference, the EM algorithm, or Gibbs sampling to approximately infer the document-topic and topic-word distributions, from which the opinions in the text are obtained. Within a document, however, the position of words and the semantic relations between them strongly affect how opinions are expressed, so the traditional LDA model must be improved before it can be used for fine-grained opinion mining that yields more accurate and valuable opinion information. In addition, natural language is polysemous and its terms are correlated, which makes the mining and modeling of opinions in reviews uncertain: on the one hand, qualitative concept descriptions have fuzzy boundaries; on the other, the quantitative representation of the model is random. Current LDA models applied to opinion mining consider only the quantitative randomness and ignore the fuzziness of qualitative concepts.
To address these problems, this thesis introduces the maximum entropy model and cloud model theory to improve the standard LDA model for fine-grained opinion mining of online reviews. First, a maximum entropy model is added to LDA to make full use of word position and semantic information, so that the words in a document are divided into finer categories. Second, cloud model theory is applied to the maximum entropy LDA model: the cloud model is used to model the qualitative-quantitative conversion of uncertain sentiment in a document. On this basis a sentiment correction algorithm is proposed, which computes the similarity between global sentiment and topic sentiment using closeness degree and the cloud expectation curve and then corrects the sentiment deviation, effectively improving the accuracy of opinion mining. Finally, simulation experiments are carried out and their results are visualized; the results confirm the effectiveness of the proposed approach.
The thesis has five chapters. Chapter 1 presents the research background and significance and reviews the current state of opinion mining research in China and abroad. Chapter 2 introduces the theoretical foundations of opinion mining and lists the mathematical and modeling theory used in the thesis. Chapter 3 describes the LDA maximum entropy model based on cloud model theory in detail, gives the modeling ideas and theoretical derivations, and on that basis proposes the sentiment correction algorithm. Chapter 4 describes the simulation experiments and analyzes the results, visualizes them, and verifies the effectiveness of the approach. Chapter 5 summarizes the work and discusses future directions.
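
The abstract names the cloud expectation curve and closeness degree as the basis for comparing global sentiment with topic sentiment, but it does not give the formulas. The following is a minimal sketch, assuming the standard normal cloud model with expectation Ex, entropy En, and hyper-entropy He, and assuming a min/max fuzzy closeness measure between two expectation curves. The function names, the grid-based approximation, and the example parameter values are all illustrative assumptions, not the thesis's actual algorithm.

```python
import math
import random


def forward_cloud(ex, en, he, n, seed=0):
    """Forward normal cloud generator: return n drops (x, membership).

    Each drop first perturbs the entropy (randomness of the concept's
    spread), then samples a value x and its degree of membership.
    """
    rng = random.Random(seed)
    drops = []
    for _ in range(n):
        en_i = abs(rng.gauss(en, he))   # per-drop entropy En' ~ N(En, He^2)
        if en_i == 0.0:
            en_i = 1e-12                # guard against division by zero
        x = rng.gauss(ex, en_i)         # drop position x ~ N(Ex, En'^2)
        mu = math.exp(-((x - ex) ** 2) / (2.0 * en_i ** 2))
        drops.append((x, mu))
    return drops


def expectation_curve(x, ex, en):
    """Expectation curve of a normal cloud: membership without hyper-entropy."""
    return math.exp(-((x - ex) ** 2) / (2.0 * en ** 2))


def closeness(cloud_a, cloud_b, steps=2000):
    """Assumed min/max closeness of two expectation curves, in [0, 1].

    cloud_a and cloud_b are (Ex, En) pairs. The curves are compared on a
    shared grid covering three entropies on either side of both centers.
    """
    (ex_a, en_a), (ex_b, en_b) = cloud_a, cloud_b
    lo = min(ex_a - 3 * en_a, ex_b - 3 * en_b)
    hi = max(ex_a + 3 * en_a, ex_b + 3 * en_b)
    overlap = union = 0.0
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        ya = expectation_curve(x, ex_a, en_a)
        yb = expectation_curve(x, ex_b, en_b)
        overlap += min(ya, yb)
        union += max(ya, yb)
    return overlap / union


# Hypothetical sentiment clouds on a [0, 1] polarity scale.
global_sentiment = (0.70, 0.10)
topic_sentiment = (0.55, 0.12)
score = closeness(global_sentiment, topic_sentiment)
```

Under these assumptions, a score near 1 means the topic-level sentiment cloud closely matches the global one, and a score near 0 means the two barely overlap. How the thesis turns such a similarity into a correction of the sentiment deviation is not stated in the abstract, so that step is left out here.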

[Degree-granting institution]: Central China Normal University
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification number]: TP391.1

[Related literature]