
Localized Bidirectional Long Short-Term Memory for Text Classification

Posted: 2018-11-03 18:45
[Abstract]: In recent years, deep learning has been applied ever more widely in natural language processing, and models such as recurrent neural networks (RNNs) have been proposed to build text representations and solve tasks such as text classification. Long short-term memory (LSTM) is an RNN with a special neuron structure. The input to an LSTM is the word sequence of a sentence; the model scans this sequence and finally produces a representation of the whole sentence. The common practice, however, is to feed the classifier only the representation the LSTM holds after scanning the entire sentence, ignoring the intermediate representations generated during the scan. This practice cannot efficiently extract certain local text features, which are often very important in determining a document's category. To solve this problem, this paper proposes localized bidirectional LSTM models, namely MaxBiLSTM and ConvBiLSTM. MaxBiLSTM applies max pooling directly to the intermediate representations of a bidirectional LSTM. ConvBiLSTM first applies a convolution to those intermediate representations and then applies max pooling. Experiments were carried out on two public text classification datasets. The results show that the localized bidirectional LSTMs, especially ConvBiLSTM, clearly outperform LSTM and achieve the best results reported to date.
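The abstract compares two ways of feeding a classifier. The common way passes only the end-of-scan LSTM representation. The proposed way pools over every intermediate BiLSTM state. The pure-Python sketch below shows the three read-outs on one toy sentence. It is an illustration under stated assumptions, not the authors' code. The weights are random and untrained. The LSTM cell is the standard one, without biases or peepholes. The convolution uses a ReLU and a filter width of 2. All function names (`lstm_scan`, `bilstm_states`, `max_pool`, `conv1d_relu`) and all dimensions are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply a matrix stored as a list of rows by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def init_lstm(input_dim, hidden_dim, rng):
    # One weight matrix per gate (input i, forget f, output o, candidate g),
    # each applied to the concatenation [x_t; h_{t-1}]. Biases omitted (assumption).
    def mat():
        return [[rng.uniform(-0.5, 0.5) for _ in range(input_dim + hidden_dim)]
                for _ in range(hidden_dim)]
    return {gate: mat() for gate in ("i", "f", "o", "g")}


def lstm_scan(params, xs, hidden_dim):
    # Scan the word vectors left to right and keep the hidden state of EVERY step,
    # not just the final one.
    h, c, states = [0.0] * hidden_dim, [0.0] * hidden_dim, []
    for x in xs:
        z = list(x) + h
        i = [sigmoid(a) for a in matvec(params["i"], z)]
        f = [sigmoid(a) for a in matvec(params["f"], z)]
        o = [sigmoid(a) for a in matvec(params["o"], z)]
        g = [math.tanh(a) for a in matvec(params["g"], z)]
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        states.append(h)
    return states


def bilstm_states(fwd, bwd, xs, hidden_dim):
    # Intermediate BiLSTM representation at position t: forward state after reading
    # words 0..t concatenated with backward state after reading words T-1..t.
    forward = lstm_scan(fwd, xs, hidden_dim)
    backward = lstm_scan(bwd, xs[::-1], hidden_dim)[::-1]
    return [hf + hb for hf, hb in zip(forward, backward)]


def max_pool(states):
    # Element-wise max over the time axis.
    return [max(column) for column in zip(*states)]


def conv1d_relu(states, filters, width):
    # "Valid" 1-D convolution over time: each filter sees `width` consecutive
    # BiLSTM states (flattened). The ReLU activation is an assumption.
    out = []
    for t in range(len(states) - width + 1):
        window = [v for s in states[t:t + width] for v in s]
        out.append([max(0.0, sum(w * v for w, v in zip(f, window))) for f in filters])
    return out


rng = random.Random(0)
input_dim, hidden_dim, width, n_filters, n_words = 4, 3, 2, 5, 6
sentence = [[rng.gauss(0.0, 1.0) for _ in range(input_dim)] for _ in range(n_words)]
fwd = init_lstm(input_dim, hidden_dim, rng)
bwd = init_lstm(input_dim, hidden_dim, rng)
filters = [[rng.uniform(-0.5, 0.5) for _ in range(width * 2 * hidden_dim)]
           for _ in range(n_filters)]

H = bilstm_states(fwd, bwd, sentence, hidden_dim)  # n_words vectors of size 2*hidden_dim

# Baseline: only the end-of-scan states (forward at the last word, backward at the first).
final_only = H[-1][:hidden_dim] + H[0][hidden_dim:]
# MaxBiLSTM-style: max pooling directly over all intermediate states.
max_bilstm = max_pool(H)
# ConvBiLSTM-style: convolution over the intermediate states, then max pooling.
conv_bilstm = max_pool(conv1d_relu(H, filters, width))

print(len(final_only), len(max_bilstm), len(conv_bilstm))  # prints 6 6 5
```

In a trained model, one of these vectors would go to a classifier such as softmax. Max pooling lets a strong local feature at any position reach the classifier. By contrast, `final_only` depends on what survives to the end of each scan. The convolution also combines neighbouring states before pooling.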
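[Author affiliations]: Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences
[Funding]: National Basic Research Program of China (973 Program) (2014CB340401, 2012CB316303); National Natural Science Foundation of China (6122010, 61472401, 61433014, 61425016, 61203298); Youth Innovation Promotion Association, Chinese Academy of Sciences (2014310, 2016102)
[Chinese Library Classification]: TP18; TP391.1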





