
Research on Prosodic Structure Prediction Based on Deep Neural Networks

Posted: 2018-08-19 09:07
[Abstract]: Chinese prosody prediction plays an important role in the naturalness of synthesized speech, and this thesis focuses on how to improve the prediction accuracy of the prosodic hierarchical structure. Earlier statistical prosodic structure prediction models usually select POS (part-of-speech) tags as input features. POS tags are shallow information that cannot exploit the semantics of words, and because the selected features encode no relationship between words, they often produce a "lexical gap": even synonyms show no corresponding relatedness. A representation that reflects the relatedness of words is therefore needed as the model's input feature. As for model choice, hidden Markov models and decision trees have been used successfully for prosodic structure prediction, but they suffer from a narrow range of applicability and from overfitting. As data becomes more complex, a method with stronger modeling capacity is required, and deep neural networks model complex data well. This thesis therefore studies a deep neural network model for prosodic structure prediction that takes word vectors as input features. On the one hand, it trains word vectors, builds prosodic word vectors from them, and uses the compound vector as the model input; on the other hand, it improves the conventional neural network so that the hidden layer better captures the interactions between words.
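The compound input in this approach can be pictured as several per-word vectors concatenated into one feature vector. The plain-Python sketch below is a hypothetical illustration, not the thesis's code. It assumes three things: the prosodic word vector is the mean of the word vectors inside the prosodic word; the previous word's prosodic level and the current word's length are one-hot encoded; and there are three levels (0 = no boundary, 1 = prosodic word boundary, 2 = prosodic phrase boundary). The names `compound_feature`, `window_features`, `NUM_LEVELS`, and `MAX_WORD_LEN` are invented for illustration, and the toy 2-dimensional embeddings stand in for vectors that would be trained with Gensim.

```python
from typing import Dict, List, Sequence

# Hypothetical settings, not taken from the thesis.
NUM_LEVELS = 3      # assumed labels: 0 = no boundary, 1 = prosodic word, 2 = prosodic phrase
MAX_WORD_LEN = 4    # word lengths (in characters) above this share the last bucket


def one_hot(index: int, size: int) -> List[float]:
    """One-hot vector of length `size` with 1.0 at position `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of equally sized vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def compound_feature(word: str, prosodic_word: Sequence[str], prev_level: int,
                     emb: Dict[str, List[float]]) -> List[float]:
    """Concatenate four feature groups for one word:
    dictionary word vector + prosodic word vector (assumed: mean of the member
    word vectors) + one-hot level of the previous word + one-hot word length.
    At training time prev_level would come from the annotation; at inference
    time it would be the model's own prediction for the previous word.
    """
    word_vec = list(emb[word])
    pw_vec = mean_vector([emb[w] for w in prosodic_word])
    level_vec = one_hot(prev_level, NUM_LEVELS)
    length_vec = one_hot(min(len(word), MAX_WORD_LEN) - 1, MAX_WORD_LEN)
    return word_vec + pw_vec + level_vec + length_vec


def window_features(features: List[List[float]], i: int, half_window: int) -> List[float]:
    """Concatenate the features of words i-half_window .. i+half_window,
    zero-padding positions that fall outside the sentence."""
    dim = len(features[0])
    out: List[float] = []
    for j in range(i - half_window, i + half_window + 1):
        out.extend(features[j] if 0 <= j < len(features) else [0.0] * dim)
    return out


if __name__ == "__main__":
    # Toy embeddings; real ones would come from a trained word2vec model.
    emb = {"我们": [0.1, 0.3], "喜欢": [0.5, -0.1], "音乐": [0.2, 0.4]}
    # Assumed segmentation: "我们喜欢" is one prosodic word, "音乐" another.
    feats = [
        compound_feature("我们", ["我们", "喜欢"], 0, emb),
        compound_feature("喜欢", ["我们", "喜欢"], 0, emb),
        compound_feature("音乐", ["音乐"], 1, emb),
    ]
    print(len(feats[0]), len(window_features(feats, 0, 1)))  # prints: 11 33
```

The window helper mirrors the "window length" hyperparameter mentioned in the experiments: widening `half_window` lets the network see more neighbouring words, at the cost of a proportionally longer input vector.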
The main work includes:
(1) Configuring the Gensim word-vector training module, training word vectors with it, using the trained word vectors to learn prosodic word vectors, and capturing prosodic hierarchy information in the context through vectors at different levels;
(2) Training the neural network on prosodic-hierarchy annotated data with four input features, namely the dictionary word vector, the prosodic word vector, the prosodic-level vector of the preceding word, and the word-length vector of the current word, so that the compound input features at the input layer improve the model's predictive ability;
(3) Improving the hidden layer by adding a tensor matrix that captures the relationships between words and between different prosodic levels, and evaluating the model's prosodic structure prediction with respect to window length, vector-space dimension, number of hidden units, and input features.
Experimental results show that, compared with a single word vector as the input feature, the compound input combining multiple vectors lowers the prosodic word error rate by 3.2 percentage points (from 15.3% to 12.1%) and the prosodic phrase error rate by 5.0 percentage points (from 40.3% to 35.3%); after the tensor matrix is added to the hidden layer, the prosodic word error rate drops by a further 0.5 percentage points (from 12.1% to 11.6%). These results indicate that compound input features effectively reduce the prosody prediction error rate, and that a hidden layer with a tensor matrix captures information between prosodic levels better than an ordinary hidden layer.
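Item (3) adds a tensor matrix to the hidden layer, but the abstract does not give the formula. The sketch below therefore assumes the widely used neural tensor layer, in which each hidden unit `i` computes `h_i = tanh(x^T T[i] x + W[i]·x + b[i])`. The bilinear term `x^T T[i] x` lets pairs of input components (for instance, features of neighbouring words or their prosodic-level vectors) interact multiplicatively, which an ordinary affine hidden layer cannot do directly. This is an illustrative plain-Python assumption, not the thesis's implementation; the names `bilinear`, `tensor_hidden_layer`, `T`, `W`, and `b` are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def bilinear(x: Vector, slice_: Matrix) -> float:
    """Compute the scalar x^T * slice_ * x for a square matrix slice_."""
    n = len(x)
    return sum(x[r] * slice_[r][c] * x[c] for r in range(n) for c in range(n))


def tensor_hidden_layer(x: Vector, T: List[Matrix], W: Matrix, b: Vector) -> Vector:
    """Neural-tensor-style hidden layer (an assumed formulation).

    For each hidden unit i:
        h_i = tanh(x^T T[i] x + W[i] . x + b[i])
    T[i] is one n x n tensor slice per hidden unit; W and b form an
    ordinary affine hidden layer.
    """
    hidden = []
    for i in range(len(b)):
        linear = sum(w * xv for w, xv in zip(W[i], x))
        hidden.append(math.tanh(bilinear(x, T[i]) + linear + b[i]))
    return hidden


if __name__ == "__main__":
    # Toy 2-dim input (e.g. one feature from each of two adjacent words), 2 hidden units.
    x = [1.0, 2.0]
    T = [[[0.0, 1.0], [0.0, 0.0]],   # slice 0 responds to the product x[0] * x[1]
         [[0.0, 0.0], [0.0, 0.0]]]   # slice 1 has no interaction term
    W = [[0.0, 0.0], [0.5, 0.0]]
    b = [0.0, 0.0]
    print(tensor_hidden_layer(x, T, W, b))
```

Setting every slice of `T` to zero reduces this layer to an ordinary `tanh(W x + b)` hidden layer, which corresponds to the baseline the abstract compares against. The price of the interaction term is one n×n matrix per hidden unit, so the parameter count grows quadratically with the input size.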
[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification number]: TP391.1

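Related master's theses (top 1):

1 Wang Qi; Research on Prosodic Structure Prediction Based on Deep Neural Networks [D]; Beijing Jiaotong University; 2016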




Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/2191200.html

