Research on the Application of Deep Learning Algorithms in Tibetan Sentiment Analysis
Published: 2018-06-22 05:48
Topics: deep learning + sentiment analysis; Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), 2017, Issue 07
[Abstract]: Earlier algorithms for Tibetan sentiment analysis ignore important information such as Tibetan sentence structure and word order, which results in low accuracy. To address this problem, the recursive autoencoder algorithm from the field of deep learning is introduced into Tibetan sentiment analysis in order to extract semantic and sentiment information at a deeper level. After Tibetan text is segmented into words and each word is represented by a word vector, a Tibetan sentence becomes a matrix composed of word vectors. An unsupervised recursive autoencoder then encodes this matrix as a single vector; the resulting optimal Tibetan sentence vector fuses important information such as semantics and word order. Finally, the sentence vectors and their corresponding sentiment labels are used to train an output-layer classifier in a supervised manner to predict the sentiment polarity of Tibetan sentences. In the experimental section, the effects of vector dimension, the reconstruction error coefficient, and corpus size on accuracy are examined, and the relationship between corpus size and model training time is analyzed. The results indicate that the number of sentences in the dataset can be reduced appropriately when the model must be trained quickly. Experiments show that, under the best parameter combination, the accuracy of the proposed algorithm is about 8.6% higher than that of the semantic space model, one of the better-performing traditional machine learning algorithms.
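The pipeline described in the abstract (word vectors, recursive encoding of the sentence matrix into one sentence vector, then an output-layer classifier) can be sketched in plain Python. This is a minimal, untrained forward pass under stated assumptions, not the authors' implementation. The tanh encoder, linear decoder, squared reconstruction error, greedy merging of the adjacent pair with the lowest reconstruction error, and softmax output layer follow the common recursive-autoencoder formulation. The names `RecursiveAutoencoder` and `weighted_loss`, the toy tokens `w1`..`w3`, the random embeddings and weights, and the reading of the "reconstruction error coefficient" as a weight `alpha` between reconstruction error and cross-entropy are all hypothetical, since the abstract does not specify them.

```python
import math
import random


def affine(W, b, x):
    """Return W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small uniform random weights (untrained, for illustration only)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


class RecursiveAutoencoder:
    """Hypothetical greedy recursive autoencoder; weights are random, not trained."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        self.W1 = rand_matrix(dim, 2 * dim, rng)  # encoder: [c1; c2] (2d) -> parent (d)
        self.b1 = [0.0] * dim
        self.W2 = rand_matrix(2 * dim, dim, rng)  # decoder: parent (d) -> [c1'; c2'] (2d)
        self.b2 = [0.0] * (2 * dim)

    def merge(self, c1, c2):
        """Encode two adjacent children; return (parent, reconstruction error)."""
        children = c1 + c2  # ordered concatenation, so word order matters
        parent = [math.tanh(v) for v in affine(self.W1, self.b1, children)]
        recon = affine(self.W2, self.b2, parent)
        err = 0.5 * sum((a - r) ** 2 for a, r in zip(children, recon))
        return parent, err

    def encode(self, vectors):
        """Greedily merge the adjacent pair with the lowest reconstruction error
        until one sentence vector remains; also return the summed error."""
        if not vectors:
            raise ValueError("sentence must contain at least one word vector")
        nodes = list(vectors)
        total_err = 0.0
        while len(nodes) > 1:
            candidates = [self.merge(nodes[i], nodes[i + 1]) for i in range(len(nodes) - 1)]
            best = min(range(len(candidates)), key=lambda i: candidates[i][1])
            parent, err = candidates[best]
            nodes[best:best + 2] = [parent]  # replace the merged pair by its parent
            total_err += err
        return nodes[0], total_err


def weighted_loss(rae, W_label, b_label, vectors, label, alpha):
    """Assumed objective: alpha * reconstruction error + (1 - alpha) * cross-entropy."""
    sentence_vec, rec_err = rae.encode(vectors)
    probs = softmax(affine(W_label, b_label, sentence_vec))  # output-layer classifier
    return alpha * rec_err + (1 - alpha) * -math.log(probs[label]), probs


# Toy usage: three hypothetical tokens standing in for segmented Tibetan words.
dim = 4
rng = random.Random(1)
embeddings = {t: [rng.uniform(-1, 1) for _ in range(dim)] for t in ["w1", "w2", "w3"]}
sentence = [embeddings[t] for t in ["w1", "w2", "w3"]]  # sentence as a matrix of word vectors
rae = RecursiveAutoencoder(dim)
sentence_vec, rec_err = rae.encode(sentence)
W_label = rand_matrix(2, dim, rng)  # 2 classes, e.g. negative / positive (assumed)
loss, probs = weighted_loss(rae, W_label, [0.0, 0.0], sentence, label=1, alpha=0.2)
```

Because each merge concatenates an ordered pair of adjacent children, the final sentence vector depends on word order, unlike a bag-of-words representation. Training (gradient updates of the encoder, decoder, and classifier weights) is omitted to keep the sketch short.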
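[Affiliations]: Tibetan Information Technology Research Center, Tibet University; School of Information Science and Technology, Southwest Jiaotong University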
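[Funding]: National Natural Science Foundation of China (61540060); National Soft Science Research Program (2013GXS4D150); research project of the Science and Technology Department of the Tibet Autonomous Region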
[CLC number]: TP18; TP391.1
Article ID: 2051836
Article link: https://www.wllwen.com/kejilunwen/zidonghuakongzhilunwen/2051836.html