A Deep Learning Text Classification Model Using Pinyin Features

Published: 2018-04-30 08:39

Topics: text classification + intent understanding; Reference: 高技术通讯 (Chinese High Technology Letters), 2017, Issue 07

【Abstract】: For text instructions obtained through speech recognition in human-robot voice interaction, this paper proposes a deep learning text classification model that uses the initials and finals (声母 and 韵母) of Chinese pinyin as features. First, taking voice-navigation control of a driverless vehicle as the human-machine interaction scenario, the structure of its text instructions is analyzed, and a single-intent corpus and a complex-intent corpus are built separately. Second, building on the use of characters as text classification features, and taking into account the differences between Chinese pinyin and English words, a feature representation for Chinese text classification based on pinyin initial and final characters is proposed. Next, gated recurrent units (GRU) replace conventional recurrent neural network units, overcoming the latter's difficulty in capturing features over long time spans; to extract higher-order features, shorten the feature sequence, and speed up model convergence, a deep learning text classification model combining a convolutional neural network (CNN) with a GRU recurrent neural network is built. Finally, to evaluate the model on both long- and short-sequence tasks, ten-fold cross-validation is performed on each of the two corpora, and the model is compared with other classification methods. The results show that the model significantly improves classification accuracy.
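To make the pinyin feature representation concrete, here is a minimal Python sketch; it is not the authors' code. It assumes a tiny hand-written lookup table (`PINYIN_TABLE`, hypothetical) that maps a few navigation-command characters to toneless pinyin, whereas a real system would need a full character-to-pinyin dictionary. Each syllable is split into its initial and final, and the text becomes a flat token sequence. Emitting initials and finals as whole units is also an assumption, since the abstract does not say at which granularity the "initial/final characters" are fed to the model.

```python
# Hypothetical lookup table: a handful of characters from navigation commands,
# mapped to toneless pinyin. A real system needs a complete dictionary.
PINYIN_TABLE = {
    "向": "xiang", "左": "zuo", "右": "you", "转": "zhuan",
    "前": "qian", "进": "jin", "停": "ting", "车": "che",
}

# Pinyin initials. Two-letter initials come first so that "zh", "ch" and
# "sh" win over "z", "c" and "s". For simplicity, "y" and "w" are treated
# as initials here, although some analyses treat them as spelling variants.
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w"]


def split_syllable(syllable):
    """Split a toneless pinyin syllable into (initial, final)."""
    for initial in INITIALS:
        if syllable.startswith(initial) and len(syllable) > len(initial):
            return initial, syllable[len(initial):]
    return "", syllable  # zero-initial syllable, e.g. "an"


def pinyin_features(text):
    """Turn Chinese text into a flat sequence of initial/final tokens.

    Characters missing from the lookup table (digits, Latin letters,
    punctuation, unknown characters) are skipped.
    """
    tokens = []
    for ch in text:
        syllable = PINYIN_TABLE.get(ch)
        if syllable is None:
            continue
        initial, final = split_syllable(syllable)
        if initial:
            tokens.append(initial)
        tokens.append(final)
    return tokens


print(pinyin_features("向左转"))  # ['x', 'iang', 'z', 'uo', 'zh', 'uan']
```

Extending `tokens` with the letters of each initial and final, instead of appending them as units, would give a letter-level variant closer to character features for English text.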
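The CNN + GRU pipeline can be sketched in plain Python as an untrained forward pass. This is an illustration under stated assumptions, not the paper's architecture: the abstract gives no layer counts, filter widths, pooling scheme, or hidden sizes, so every size and every helper name here (`conv1d`, `max_pool`, `gru_step`, `init_params`, `cnn_gru_forward`) is hypothetical, and the weights are random. The convolution and pooling shorten the sequence before it reaches the GRU, which is the role the abstract assigns to the CNN. The GRU cell uses the standard gates: update gate z, reset gate r, candidate n = tanh(W_n x + U_n (r ⊙ h) + b_n), and new state (1 − z) ⊙ h + z ⊙ n. Library implementations differ in small details, such as where the reset gate is applied and which way round z interpolates.

```python
import math
import random


def matvec(W, x):
    """Matrix-vector product for lists of lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def conv1d(seq, filters, width, stride=1):
    """Valid 1-D convolution with ReLU over a sequence of feature vectors.

    Each filter is a list of `width` weight vectors; the output holds one
    vector per window position, with one value per filter.
    """
    out = []
    for start in range(0, len(seq) - width + 1, stride):
        window = seq[start:start + width]
        out.append([max(0.0, sum(sum(w * x for w, x in zip(f[k], window[k]))
                                 for k in range(width)))
                    for f in filters])
    return out


def max_pool(seq, size):
    """Non-overlapping max pooling along the time axis (shortens the sequence)."""
    return [[max(ch) for ch in zip(*seq[i:i + size])]
            for i in range(0, len(seq) - size + 1, size)]


def gru_step(x, h, p):
    """One GRU update: update gate z, reset gate r, candidate state n."""
    z = [sigmoid(a + b + c) for a, b, c in zip(matvec(p["Wz"], x), matvec(p["Uz"], h), p["bz"])]
    r = [sigmoid(a + b + c) for a, b, c in zip(matvec(p["Wr"], x), matvec(p["Ur"], h), p["br"])]
    rh = [ri * hi for ri, hi in zip(r, h)]
    n = [math.tanh(a + b + c) for a, b, c in zip(matvec(p["Wn"], x), matvec(p["Un"], rh), p["bn"])]
    # Interpolate between the previous state and the candidate state.
    return [(1.0 - zi) * hi + zi * ni for zi, hi, ni in zip(z, h, n)]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def init_params(in_dim, n_filters, width, hidden, n_classes, seed=0):
    """Random, untrained weights with hypothetical sizes."""
    rng = random.Random(seed)
    p = {"conv": [rand_matrix(width, in_dim, rng) for _ in range(n_filters)],
         "Wo": rand_matrix(n_classes, hidden, rng), "bo": [0.0] * n_classes}
    for g in ("z", "r", "n"):
        p["W" + g] = rand_matrix(hidden, n_filters, rng)
        p["U" + g] = rand_matrix(hidden, hidden, rng)
        p["b" + g] = [0.0] * hidden
    return p


def cnn_gru_forward(seq, p, width=3, pool=2):
    """CNN -> max pooling -> GRU -> linear layer; returns class scores (logits)."""
    feats = max_pool(conv1d(seq, p["conv"], width), pool)
    h = [0.0] * len(p["bz"])
    for x in feats:
        h = gru_step(x, h, p)
    return [a + b for a, b in zip(matvec(p["Wo"], h), p["bo"])]


# Ten one-hot token vectors over a 5-symbol vocabulary (toy input).
seq = [[1.0 if j == i % 5 else 0.0 for j in range(5)] for i in range(10)]
params = init_params(in_dim=5, n_filters=4, width=3, hidden=6, n_classes=3)
conv_out = conv1d(seq, params["conv"], 3)
print(len(conv_out), len(max_pool(conv_out, 2)))  # 8 4
print(cnn_gru_forward(seq, params))
```

In this toy run the width-3 convolution turns 10 time steps into 8 and pooling by 2 leaves 4, so the GRU processes 4 steps instead of 10.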
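The ten-fold evaluation protocol can be sketched as follows. Plain (non-stratified) k-fold splitting with a single seeded shuffle is an assumption, since the abstract does not say whether folds were stratified by intent class. `train_and_score` is a hypothetical callback standing in for "train a fresh model on these indices and return its accuracy on the held-out ones".

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Indices are shuffled once; the k folds differ in size by at most one,
    and every sample lands in exactly one test fold.
    """
    if not 1 < k <= n_samples:
        raise ValueError("need 1 < k <= n_samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size


def cross_validate(n_samples, train_and_score, k=10, seed=0):
    """Mean score over k folds; `train_and_score(train_idx, test_idx)` is a
    hypothetical callback that trains a fresh model and returns its accuracy."""
    scores = [train_and_score(tr, te) for tr, te in k_fold_indices(n_samples, k, seed)]
    return sum(scores) / len(scores)


folds = list(k_fold_indices(25))
print([len(test) for _, test in folds])  # [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]
```

Training a fresh model inside each fold, rather than reusing one, keeps every reported accuracy independent of its own test fold.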
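【Author affiliations】: Key Laboratory of System Control and Information Processing, Ministry of Education, Department of Automation, Shanghai Jiao Tong University; School of Humanities, Shanghai Jiao Tong University; Antai College of Economics and Management, Shanghai Jiao Tong University
【Funding】: National Natural Science Foundation of China (91646205)
【Classification numbers (CLC)】: TP18; TP391.1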
