Research on Key Problems in Text Orientation Analysis for Reviews

Published: 2018-06-14 12:54

Topics: text orientation + feature clustering; Source: master's thesis, Beijing University of Chemical Technology, 2016

[Abstract]: E-commerce is now widespread in China, and large shopping sites such as Taobao and JingDong hold most of the market. Faced with a large volume of product reviews, enterprises seeking commercial returns and consumers seeking to make better purchase decisions both need to understand users' attitudes and opinions toward products. Labeling text sentiment by hand is laborious and time-consuming, so computers are needed to analyze the sentiment orientation of text automatically; this technique is called text orientation analysis. The technique has already produced many research results. This thesis focuses on key problems in existing text orientation analysis methods. For machine-learning-based text orientation analysis, it concentrates on the low accuracy that arises when the training and test texts do not come from the same domain. For the feature dimensionality reduction step of text classification, it proposes a feature clustering algorithm based on a general-domain framework. The weighted-SimRank cross-domain text orientation method fails to take near-synonyms into account in its co-occurrence weighting when it aligns the features of the two domains; to address this, the thesis applies the feature clustering based on the general-domain framework to that method. Experiments show that, while maintaining accuracy, this saves memory and alleviates the data sparsity problem. For semantics-based text orientation analysis, the thesis concentrates on computing the orientation of words. Common word orientation calculation methods depend too heavily on knowledge bases and cannot mine semantic relations accurately, so the thesis proposes a word-vector-based method for computing the orientation of domain sentiment words, that is, words that appear frequently in a specific domain and carry clear sentiment. The method relies on Google's word2vec tool, which learns word vectors with a neural network. It uses the cosine distance between vectors as the measure of how close two words are, measures how close a word is to the reference words, and from this judges the word's orientation. Experiments show that the method adapts to the domain and achieves high accuracy.
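The abstract gives only the outline of the word-vector method, so the following Python sketch is an illustration under stated assumptions rather than the thesis's own formula. It assumes that a word's orientation can be scored as its mean cosine similarity to positive reference words minus its mean cosine similarity to negative reference words. The names `orientation_score` and `classify`, the reference word lists, and the three-dimensional toy vectors are all hypothetical; in practice the vectors would come from a word2vec model trained on reviews from the target domain.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_u * norm_v)


def orientation_score(word, vectors, positive_refs, negative_refs):
    """Assumed scoring rule: mean similarity to the positive reference
    words minus mean similarity to the negative reference words."""
    w = vectors[word]
    pos = sum(cosine_similarity(w, vectors[r]) for r in positive_refs)
    neg = sum(cosine_similarity(w, vectors[r]) for r in negative_refs)
    return pos / len(positive_refs) - neg / len(negative_refs)


def classify(word, vectors, positive_refs, negative_refs, threshold=0.0):
    """Map the score to a label; the threshold is a hypothetical parameter."""
    score = orientation_score(word, vectors, positive_refs, negative_refs)
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


# Toy vectors standing in for learned word2vec embeddings (illustrative only).
TOY_VECTORS = {
    "good": [0.9, 0.1, 0.0],
    "excellent": [0.8, 0.2, 0.1],
    "bad": [-0.9, 0.1, 0.0],
    "terrible": [-0.8, 0.2, 0.1],
    "durable": [0.7, 0.3, 0.2],
    "flimsy": [-0.6, 0.4, 0.1],
}
POSITIVE_REFS = ["good", "excellent"]
NEGATIVE_REFS = ["bad", "terrible"]

if __name__ == "__main__":
    for candidate in ("durable", "flimsy"):
        label = classify(candidate, TOY_VECTORS, POSITIVE_REFS, NEGATIVE_REFS)
        print(candidate, label)
```

Because the reference words are supplied by the caller and the vectors are learned from the domain corpus, the same code can be pointed at a different product domain by swapping in vectors trained on that domain, which is one way to read the abstract's claim of domain adaptability.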
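[Degree-granting institution]: Beijing University of Chemical Technology
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP391.1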
