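Multi-label Sentiment Classification of Weibo Posts Based on the CNN Feature Space
Published: 2018-04-29 20:45
Topics: sentiment classification + multi-label classification; Source: 《工程科学与技术》 (Advanced Engineering Sciences), 2017, No. 03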
[Abstract]: For the multi-label classification problem in Weibo (microblog) sentiment evaluation tasks, traditional text feature representations based on the vector space model struggle to provide effective semantic features. Word vector representations learned with deep learning capture the syntactic and semantic relationships between words well, and sentence feature vectors can be built from them effectively according to the principle of semantic composition. The authors propose a multi-label sentiment classification system for Weibo sentences. The system first learns word vectors for Chinese words from a large-scale unlabeled Weibo text corpus. It then trains a convolutional neural network (CNN) with supervised multi-class sentiment classification and uses the learned CNN to compose the word vectors of a Weibo sentence into a sentence vector. Finally, these sentence vectors serve as features for training multi-label classifiers, which perform the multi-label sentiment classification of Weibo posts. On the public Weibo sentiment evaluation dataset of the 2013 NLPCC (Natural Language Processing and Chinese Computing) conference, the system's best classification performance exceeds the best evaluation result by 19.16% on the loose metric and 17.75% on the strict metric. Compared with the method that composes sentence vectors with a Recursive Neural Tensor Network model, which had achieved the best classification performance reported in the literature to date, the system improves the two metrics by 3.66% and 2.89%, respectively. Comparing different feature representations across several multi-label classifiers shows that sentence vectors based on the CNN feature space discriminate sentiment semantics best, and an analysis of the CNN's iterative training process reveals pattern-recognition regularities in semantic composition. Future work includes introducing more suitable deep learning models and exploring the semantic composition of word vectors in greater depth.
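The pipeline described in the abstract (word vectors, CNN composition into a sentence vector, multi-label classification) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no architecture details, so the single convolution window, the tanh activation, max-over-time pooling, and the binary-relevance logistic scorers are all assumptions. The weights are random placeholders rather than trained parameters, and the function names and emotion labels are hypothetical.

```python
import math
import random


def conv_max_pool(word_vectors, filters, biases, window):
    """Compose a sentence vector from word vectors (assumed non-empty).

    Each filter slides over every window of `window` consecutive words,
    applies tanh, and keeps its maximum activation (max-over-time pooling).
    """
    dim = len(word_vectors[0])
    # Pad short sentences with zero vectors so at least one window exists.
    padding = [[0.0] * dim] * max(0, window - len(word_vectors))
    padded = list(word_vectors) + padding
    sentence_vector = []
    for weights, bias in zip(filters, biases):
        activations = []
        for start in range(len(padded) - window + 1):
            # Concatenate the word vectors inside the current window.
            patch = [x for vec in padded[start:start + window] for x in vec]
            z = sum(w * x for w, x in zip(weights, patch)) + bias
            activations.append(math.tanh(z))
        sentence_vector.append(max(activations))
    return sentence_vector


def predict_labels(sentence_vector, label_weights, label_biases, threshold=0.5):
    """Binary relevance (an assumed choice): one logistic scorer per label."""
    predicted = []
    for label, weights in label_weights.items():
        z = sum(w * x for w, x in zip(weights, sentence_vector)) + label_biases[label]
        probability = 1.0 / (1.0 + math.exp(-z))
        if probability >= threshold:
            predicted.append(label)
    return predicted


# Toy usage with random placeholder parameters (nothing here is trained).
random.seed(0)
dim, window, n_filters = 4, 2, 3
sentence = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(5)]
filters = [[random.uniform(-0.5, 0.5) for _ in range(dim * window)]
           for _ in range(n_filters)]
biases = [0.0] * n_filters
vec = conv_max_pool(sentence, filters, biases, window)

# Hypothetical emotion labels, one weight vector per label.
label_weights = {name: [random.uniform(-1, 1) for _ in range(n_filters)]
                 for name in ("happiness", "sadness", "anger")}
label_biases = {name: 0.0 for name in label_weights}

print(vec)
print(predict_labels(vec, label_weights, label_biases))
```

Because each label is scored independently, a sentence can receive zero, one, or several labels, which is what distinguishes the multi-label setting from ordinary single-label sentiment classification.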
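[Author affiliations]: School of Computer Science, Wuhan University; State Key Laboratory of Software Engineering, Wuhan University
[Funding]: National Natural Science Foundation of China (61303115; 61373039; 61472290); Specialized Research Fund for the Doctoral Program of Higher Education (2013014111002512)
[Classification codes]: TP183; TP391.1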