Research on a Topic Classification Model for Online Public Opinion Based on Bayesian Theory

Published: 2018-09-02 06:40

[Abstract]: As Internet access has spread, the number of Internet users has kept growing. Many people follow public opinion online: they browse the topics that interest them, post comments, and vent their feelings. Online public opinion information is voluminous and disorganized, however, so users browse it somewhat blindly. Major portal sites and forums currently organize online public opinion into topics, but these schemes remain abstract to some degree. Classifying online public opinion by topic therefore not only makes it easier for users to browse public opinion news but also supports effective early warning, helping the relevant departments guide online public opinion correctly.

Many methods exist for Chinese text classification; three common ones are naive Bayes, K-nearest neighbors, and support vector machines. In applying naive Bayes, which is structurally simple and efficient, to topic classification of online public opinion, this thesis finds that its conditional independence assumption limits its range of application and lowers classification accuracy. Moreover, when public opinion information arrives incrementally, the method must relearn in order to revise its prior information, and every round of learning involves all texts, so it lacks flexibility. To address these problems, this thesis optimizes the naive Bayes classifier with an incremental learning mechanism and dynamic reduction and, combined with text mining techniques, proposes an optimized topic classification model for online public opinion. The research focuses on the following aspects:
1. Collecting online public opinion text. Information is gathered with web crawler technology, then parsed and extracted with an HTML interpreter and web page purification techniques. An optimized feature weighting method is used to represent the text, improving the accuracy of the representation.
2. Optimizing the naive Bayes classifier with an incremental learning mechanism and (F-λ) generalized dynamic reduction to improve its classification accuracy. (F-λ) generalized dynamic reduction introduces a dynamic reduction precision coefficient λ, which reduces the number of texts involved in attribute reduction, relaxes the conditional independence assumption, lowers computational complexity, and improves classification accuracy. With incremental learning, naive Bayes no longer has to relearn all texts to revise its prior information when classifying incrementally arriving public opinion. During incremental learning, a class confidence measure is introduced to keep noisy classifications from entering the original training set and degrading the classifier's accuracy.
3. Validating the approach experimentally. Data experiments compare the four algorithms discussed in the thesis (the classifier with neither incremental learning nor dynamic reduction, the incremental classifier, the dynamic reduction classifier, and the classifier with both) to test the effectiveness of the proposed optimized algorithm, and simulation experiments examine the feasibility of the topic classification algorithm.
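[Degree-granting institution]: Jiangsu University of Science and Technology
[Degree level]: Master's
[Year degree conferred]: 2014
[Classification number]: TP393.09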
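
The incremental learning step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not define its class confidence measure, so the sketch assumes the largest posterior probability plays that role and uses a hypothetical acceptance threshold of 0.9. The names `IncrementalNaiveBayes`, `update`, `posteriors`, and `learn_unlabelled` are illustrative, documents are assumed to be already segmented into tokens, Laplace smoothing is an added assumption, and the (F-λ) dynamic reduction step is left out entirely.

```python
import math
from collections import Counter, defaultdict


class IncrementalNaiveBayes:
    """Multinomial naive Bayes that keeps only counts, so it can be updated in place."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # Laplace smoothing constant (assumption)
        self.class_docs = Counter()              # number of documents seen per class
        self.term_counts = defaultdict(Counter)  # class -> term -> occurrence count
        self.class_terms = Counter()             # total term occurrences per class
        self.vocab = set()

    def update(self, tokens, label):
        """Fold one labelled document into the counts; earlier texts are not revisited."""
        self.class_docs[label] += 1
        for term in tokens:
            self.term_counts[label][term] += 1
            self.class_terms[label] += 1
            self.vocab.add(term)

    def posteriors(self, tokens):
        """Return P(class | tokens) for every class seen so far."""
        total_docs = sum(self.class_docs.values())
        vocab_size = len(self.vocab)
        log_scores = {}
        for label, n_docs in self.class_docs.items():
            score = math.log(n_docs / total_docs)  # log prior
            denom = self.class_terms[label] + self.alpha * vocab_size
            for term in tokens:
                count = self.term_counts[label][term]
                score += math.log((count + self.alpha) / denom)
            log_scores[label] = score
        # Normalise in log space to avoid underflow on long documents.
        top = max(log_scores.values())
        weights = {c: math.exp(s - top) for c, s in log_scores.items()}
        norm = sum(weights.values())
        return {c: w / norm for c, w in weights.items()}

    def learn_unlabelled(self, tokens, threshold=0.9):
        """Classify a new document and absorb it only if the prediction is confident.

        The largest posterior stands in for the class confidence (assumption);
        low-confidence documents are classified but kept out of the counts.
        """
        post = self.posteriors(tokens)
        label = max(post, key=post.get)
        accepted = post[label] >= threshold
        if accepted:
            self.update(tokens, label)
        return label, post[label], accepted


if __name__ == "__main__":
    nb = IncrementalNaiveBayes()
    nb.update(["match", "goal", "team"], "sports")
    nb.update(["bank", "rate", "market"], "finance")
    print(nb.learn_unlabelled(["goal", "team", "match"]))
```

Because the model stores only per-class counts, absorbing a new document costs time proportional to that document's length, which is the flexibility the abstract attributes to incremental learning. The threshold trades coverage against the risk of reinforcing the classifier's own mistakes.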

[References]