Research on Prosodic Structure Prediction Based on Deep Neural Networks

Published: 2018-08-19 09:07

[Abstract]: Prosody prediction for Chinese plays an important role in the naturalness of synthesized speech. This thesis focuses on improving the prediction accuracy of the prosodic hierarchy. Earlier statistical models for prosodic structure prediction typically used part-of-speech (POS) tags as input features. POS tags carry only shallow information and cannot exploit word semantics. Because such features encode no relationship between one word and another, they tend to produce a "lexical gap", in which even synonyms show no corresponding similarity. The model input therefore needs a representation that reflects the relatedness of words. As for the choice of model, hidden Markov models and decision trees have been applied successfully to prosodic structure prediction, but they suffer from problems such as a narrow range of applicability and overfitting. As the data grow more complex, a model with stronger modeling capacity is needed, and deep neural networks model complex data well. This thesis therefore studies a deep neural network model for prosodic structure prediction that takes word vectors as input features. On the one hand, word vectors are trained and used to build prosodic word vectors, and a composite vector serves as the model input; on the other hand, the conventional neural network is modified so that its hidden layer better captures the interactions between words.
The main work is as follows: (1) The Gensim word-vector training module is configured and used to train word vectors; the trained word vectors are then used to learn prosodic word vectors, so that vectors at different levels capture the prosodic-hierarchy information in the context. (2) A neural network model is trained on data annotated with prosodic levels, using the dictionary word vector, the prosodic word vector, the prosodic-level vector of the preceding word, and the word-length vector of the current word as input features; feeding this composite feature to the input layer improves the model's predictive ability. (3) The hidden layer is modified by adding a tensor matrix, which captures the relations between words and between different prosodic levels; the model's prosodic structure prediction ability is then evaluated with respect to window length, spatial dimension, number of hidden-layer units, and input features. The experiments show that, compared with a single word vector as the input feature, the composite input feature combining multiple vectors lowers the prosodic-word error rate by 3.2 percentage points (from 15.3% to 12.1%) and the prosodic-phrase error rate by 5 percentage points (from 40.3% to 35.3%). Adding the tensor matrix to the hidden layer lowers the prosodic-word error rate by a further 0.5 percentage points (from 12.1% to 11.6%). These results indicate that the composite input feature effectively reduces the prosody prediction error rate, and that a hidden layer with a tensor matrix captures information between prosodic levels better than an ordinary hidden layer.
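The composite input of item (2) and the tensor hidden layer of item (3) can be illustrated with a minimal NumPy sketch. The abstract does not give the layer's formula, the vector sizes, or the label set, so everything below is an assumption made for illustration: the hidden layer is written in the common neural-tensor form h_k = tanh(x^T T_k x + W_k x + b_k), all dimensions and parameter names (`T`, `W`, `b`, `U`) are hypothetical, and random vectors stand in for the Gensim-trained word vectors and the learned prosodic word vectors. The sketch also uses a single word as input rather than the context window that the thesis varies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the abstract does not state the actual dimensions.
WORD_DIM = 8        # dictionary word vector
PROSODIC_DIM = 8    # prosodic word vector
NUM_LEVELS = 3      # prosodic level of the preceding word, one-hot (assumed label set)
MAX_WORD_LEN = 4    # word length in characters, one-hot, clipped at this value
HIDDEN_UNITS = 5    # one tensor slice per hidden unit
NUM_CLASSES = 3     # output boundary classes (assumed)


def one_hot(index, size):
    """Return a vector of zeros with a single 1.0 at `index`."""
    vec = np.zeros(size)
    vec[index] = 1.0
    return vec


def composite_input(word_vec, prosodic_vec, prev_level, word_len):
    """Concatenate the four feature vectors named in the abstract."""
    length_index = min(word_len, MAX_WORD_LEN) - 1  # assumes word_len >= 1
    return np.concatenate([
        word_vec,
        prosodic_vec,
        one_hot(prev_level, NUM_LEVELS),
        one_hot(length_index, MAX_WORD_LEN),
    ])


def tensor_hidden_layer(x, T, W, b):
    """Assumed form: h_k = tanh(x^T T[k] x + W[k] . x + b[k])."""
    bilinear = np.einsum("i,kij,j->k", x, T, x)  # pairwise feature interactions
    return np.tanh(bilinear + W @ x + b)


def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(z - z.max())
    return e / e.sum()


input_dim = WORD_DIM + PROSODIC_DIM + NUM_LEVELS + MAX_WORD_LEN

# Randomly initialised parameters; a real model would learn these from annotated data.
T = rng.normal(scale=0.1, size=(HIDDEN_UNITS, input_dim, input_dim))
W = rng.normal(scale=0.1, size=(HIDDEN_UNITS, input_dim))
b = np.zeros(HIDDEN_UNITS)
U = rng.normal(scale=0.1, size=(NUM_CLASSES, HIDDEN_UNITS))

# Random stand-ins for the trained word vector and prosodic word vector.
x = composite_input(
    word_vec=rng.normal(size=WORD_DIM),
    prosodic_vec=rng.normal(size=PROSODIC_DIM),
    prev_level=1,
    word_len=2,
)
h = tensor_hidden_layer(x, T, W, b)
probs = softmax(U @ h)
print(x.shape, h.shape, probs)
```

Setting `T` to zeros reduces the layer to an ordinary tanh hidden layer, which corresponds to the baseline the abstract compares against. The bilinear term is what lets each hidden unit respond to products of input features, for example a word-vector component together with the preceding word's prosodic level.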
[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP391.1
