Research on Sentiment Classification Algorithms Based on SVM and Deep Learning

Published: 2018-05-04 22:43

  Topic: sentiment analysis + SVM; Source: Chongqing University of Posts and Telecommunications, master's thesis, 2016


[Abstract]: The rapid development of the Internet has profoundly changed traditional ways of life and the structure of the commercial economy. From e-commerce and social apps to ride-hailing software, the mark of the Internet is everywhere. People use social tools such as WeChat and Weibo to contact others, present themselves, and post comments. This produces a large volume of opinion-bearing data of great value and has made large-scale text data processing a very active field. Sentiment classification of text is one of the main research topics in this field.

This thesis takes text sentiment classification as its main research object. In current Chinese sentiment analysis, machine-learning approaches usually rely on statistical knowledge for feature extraction, which leaves two problems: they analyze complex sentence patterns poorly, and they cannot capture the deeper semantics of the text. The thesis studies both problems.

To address the weak analysis of complex sentence patterns, the thesis constructs feature extraction rules for various complex sentence patterns and proposes a text sentiment analysis method based on SVM (Support Vector Machine) and complex sentence patterns. The experiments start from a combination of sentiment-word, part-of-speech, and negation-word features, add conditional-sentence and adversative-sentence features in turn, and run repeated trials with different classifiers and kernels. The best classification result is 90.12%. The experiments also show that this kind of method depends heavily on manually designed, task-specific features, adapts poorly to new domains, and struggles to cover all the information.
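The abstract does not list the thesis's actual extraction rules, so the following is only a minimal sketch of what rule-based features for negation, adversative, and conditional patterns might look like. The mini-lexicons, the function names, and the choice to score the clause after the last adversative marker separately are all assumptions made for illustration, and the input is assumed to be already word-segmented.

```python
# Hypothetical mini-lexicons for illustration; a real system would load full resources.
POSITIVE = {"好", "喜欢", "满意"}
NEGATIVE = {"差", "失望", "贵"}
NEGATIONS = {"不", "没", "没有"}
ADVERSATIVES = {"但是", "可是", "然而", "不过"}
CONDITIONALS = {"如果", "要是", "假如"}


def clause_polarity(tokens):
    """Sum sentiment-word polarities; a pending negation flips the next sentiment word."""
    score, flip = 0, False
    for tok in tokens:
        if tok in NEGATIONS:
            flip = not flip
        elif tok in POSITIVE or tok in NEGATIVE:
            value = 1 if tok in POSITIVE else -1
            score += -value if flip else value
            flip = False
    return score


def sentence_features(tokens):
    """Return a small feature dict for one pre-segmented sentence."""
    # Position of the last adversative marker, or None if the sentence has none.
    adv_idx = max((i for i, t in enumerate(tokens) if t in ADVERSATIVES), default=None)
    after = tokens[adv_idx + 1:] if adv_idx is not None else []
    return {
        "polarity": clause_polarity(tokens),
        "has_negation": int(any(t in NEGATIONS for t in tokens)),
        "has_adversative": int(adv_idx is not None),
        "polarity_after_adversative": clause_polarity(after),
        "has_conditional": int(any(t in CONDITIONALS for t in tokens)),
    }


# "The screen is good, but the battery is not good."
features = sentence_features(["屏幕", "好", "但是", "电池", "不", "好"])
```

For this sentence the overall polarity cancels out to 0, while the clause after the adversative marker scores -1. That is the kind of signal a plain bag of sentiment words misses and a sentence-pattern feature can pass on to the classifier.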
To address both the failure to capture deeper text semantics and the problems found in the work above, the thesis introduces Word2vec, a deep-learning-based tool that trains low-dimensional word vectors carrying deep semantic information. Word vectors trained with Word2vec serve as features and are fused with term-frequency weight features from TF-IDF (Term Frequency-Inverse Document Frequency); with an SVM classifier this achieves good results. Tuning the penalty coefficient C further gives the best accuracy, 94.37%, at C = 10. The thesis also proposes a method that fuses word vectors with hash-mapping features, which likewise achieves good classification performance.

In summary, adding complex-sentence-pattern features to the traditional statistical features improves accuracy by 7.16% over the statistical feature combination alone. Introducing deep learning, with word vectors as features fused with statistical features, improves sentiment classification substantially: accuracy rises a further 4.25% (from 90.12% to 94.37%), and every evaluation metric for positive reviews improves markedly. Based on this research, the thesis designs and implements a text sentiment analysis system whose main functions include data preprocessing, word segmentation, sentiment classification, and result display.
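The abstract does not say exactly how the word vectors are fused with the TF-IDF and hash-mapping features. The sketch below assumes one plausible reading: each document vector is a TF-IDF-weighted average of its word vectors, and the hash-mapping part is a simple hashing-trick count vector concatenated onto it. The toy two-dimensional vectors stand in for a trained Word2vec model, and every name here is hypothetical.

```python
import math
import zlib
from collections import Counter

# Toy 2-dimensional "word vectors"; in practice these would come from a trained Word2vec model.
WORD_VECTORS = {
    "屏幕": [0.2, 0.1],
    "好": [0.9, 0.3],
    "电池": [0.1, 0.2],
    "差": [-0.8, 0.4],
}


def inverse_document_frequency(corpus):
    """idf(w) = log(N / df(w)), one common TF-IDF variant."""
    n_docs = len(corpus)
    doc_freq = Counter(word for doc in corpus for word in set(doc))
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}


def tfidf_weighted_vector(doc, idf, dim=2):
    """Average the word vectors of `doc`, weighting each word by its TF-IDF score."""
    counts = Counter(w for w in doc if w in WORD_VECTORS)
    total = [0.0] * dim
    weight_sum = 0.0
    for word, count in counts.items():
        weight = (count / len(doc)) * idf.get(word, 0.0)
        weight_sum += weight
        for i, value in enumerate(WORD_VECTORS[word]):
            total[i] += weight * value
    if weight_sum == 0.0:
        return total  # no informative in-vocabulary word: zero vector
    return [value / weight_sum for value in total]


def hashed_counts(doc, n_buckets=8):
    """Hashing trick: map each token to a bucket with a stable hash and count hits."""
    buckets = [0] * n_buckets
    for word in doc:
        buckets[zlib.crc32(word.encode("utf-8")) % n_buckets] += 1
    return buckets


def fused_features(doc, idf):
    """Concatenate the weighted word-vector part with the hashed-count part."""
    return tfidf_weighted_vector(doc, idf) + hashed_counts(doc)


corpus = [["屏幕", "好"], ["电池", "差"], ["屏幕", "差"]]
idf = inverse_document_frequency(corpus)
features = fused_features(corpus[0], idf)  # 2 vector dimensions + 8 hash buckets
```

In the first document, "好" appears in fewer documents than "屏幕", so it receives the larger weight and pulls the averaged vector toward its own. Feature vectors of this kind could then be passed to an SVM classifier, for example scikit-learn's `SVC` with `C=10`, the penalty value the abstract reports as best.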
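[Degree-granting institution]: Chongqing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP391.1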
