An Adaptive-Weight Multi-gram Sentence Modeling System Based on Convolutional Neural Networks

Published: 2018-02-10 02:15

Keywords: deep learning; natural language processing; adaptive weight; multi-gram　Source: Computer Science (《计算机科学》), 2017, Issue 01　Paper type: journal article

【Abstract】: With the volume of information growing explosively, natural language processing (NLP) is receiving increasingly wide attention. Traditional NLP systems rely too heavily on expensive hand-annotated features and on syntactic information produced by language analysis tools, so errors in the syntactic information introduced during preprocessing propagate into system training and prediction. Deep learning has therefore attracted researchers' attention, because it enables end-to-end prediction and depends as little as possible on external information. To capture sentence information better, the deep learning frameworks popular in NLP adopt a multi-gram strategy. However, the distribution of information differs across tasks and datasets, and this strategy does not take into account how importance is distributed over the different n-grams. To address this problem, a deep-learning-based strategy that adaptively learns multi-gram weights is proposed, assigning each n-gram feature a weight according to its contribution. A new method for combining multi-gram feature vectors is also proposed, which greatly reduces system complexity. The model is applied to two classification tasks, sentiment polarity classification of movie reviews and relation classification, and the experimental results show that the adaptive multi-gram weight strategy greatly improves the model's classification performance.
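The adaptive weighting idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's architecture, so everything below is an assumption made for illustration: the function names (`ngram_features`, `combine`, `encode`), the tanh activation, max-pooling over time, the softmax normalization of one scalar score per n-gram size, and the choice of a weighted sum as the combination step. In a real model the scores in `gram_scores` would be learned by backpropagation together with the filters; here they are fixed by hand.

```python
import math
import random


def ngram_features(sentence, filters, n):
    """Convolve every n-word window with each filter, then max-pool over time."""
    dim = len(sentence[0])
    # Pad with zero vectors so a sentence shorter than n still yields one window.
    padded = sentence + [[0.0] * dim] * max(0, n - len(sentence))
    windows = [
        [x for word in padded[i:i + n] for x in word]  # flattened n-word window
        for i in range(len(padded) - n + 1)
    ]
    feats = []
    for w in filters:
        activations = [math.tanh(sum(a * b for a, b in zip(w, win))) for win in windows]
        feats.append(max(activations))  # max-pooling over window positions
    return feats


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine(features_by_n, gram_scores):
    """Weighted sum of equally sized per-n feature vectors (assumed combination rule)."""
    weights = softmax(gram_scores)
    size = len(features_by_n[0])
    return [sum(w * f[j] for w, f in zip(weights, features_by_n)) for j in range(size)]


def encode(sentence, filters_by_n, gram_scores):
    """Sentence vector: per-n convolutional features merged with adaptive weights."""
    feats = [ngram_features(sentence, filters_by_n[n], n) for n in sorted(filters_by_n)]
    return combine(feats, gram_scores)


# Toy usage with random embeddings and filters (hypothetical sizes).
random.seed(0)
EMB, K = 4, 3  # embedding size and number of filters per n-gram size
sentence = [[random.uniform(-1, 1) for _ in range(EMB)] for _ in range(5)]
filters_by_n = {
    n: [[random.uniform(-1, 1) for _ in range(n * EMB)] for _ in range(K)]
    for n in (1, 2, 3)
}
gram_scores = [0.0, 0.0, 0.0]  # one score per n; equal scores give equal weights
vec = encode(sentence, filters_by_n, gram_scores)
```

Under these assumptions, a weighted sum keeps the sentence vector at K dimensions, whereas concatenating the three feature vectors would give 3K. That is one plausible reading of the reduced complexity the abstract mentions, not a confirmed description of the authors' method.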
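【Author Affiliations】: School of Computer Science and Technology, Shandong University of Finance and Economics; School of Information and Communication Engineering, Beijing University of Posts and Telecommunications; School of Computer Science and Technology, Shandong University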
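【Funding】: Supported by the Key Program of the National Natural Science Foundation of China, "Machine-Learning-Based Multimodal Medical Image Information Processing and Analysis" (U1201258), and the Shandong Provincial Natural Science Foundation for Distinguished Young Scholars, "Research on Machine-Learning-Based Biometric Recognition" (JQ201316).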
【CLC Number】: TP391.1; TP183

【Related Literature】