Research on Prosodic Structure Prediction Based on Deep Neural Networks
[Abstract]: Prosody prediction for Chinese plays an important role in the naturalness of synthesized speech. This thesis focuses on improving the prediction accuracy of the prosodic hierarchy. Earlier statistical models for prosodic structure prediction typically rely on shallow input features such as part of speech (POS). Such features cannot exploit the semantic information of words and carry no notion of relatedness between words. The result is a lexical gap: even synonyms show no corresponding similarity. The model therefore needs an input representation that reflects the relatedness of words. On the modeling side, hidden Markov models and decision trees have been applied successfully to prosodic structure prediction, but they suffer from problems such as a narrow scope of application and over-fitting. As data grow more complex, a more powerful modeling method is needed, and deep neural networks are well suited to modeling complex data. This thesis therefore studies a prosodic structure prediction model based on a deep neural network that takes word vectors as input features. On the one hand, we train word vectors, construct prosodic word vectors from them, and use the composite vector as the model input; on the other hand, we improve the conventional neural network model so that its hidden layer better captures the interactions between words.
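The lexical gap described above can be illustrated with a minimal sketch. It assumes that the shallow features behave like discrete one-hot symbols, and it uses hand-picked toy vectors in place of trained word vectors; the words, the vector values, and the `cosine` helper are all hypothetical and are not taken from the thesis.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Discrete representation: every word occupies its own axis,
# so any two distinct words are orthogonal (the "lexical gap").
one_hot = {
    "happy": [1.0, 0.0, 0.0],
    "glad":  [0.0, 1.0, 0.0],
    "table": [0.0, 0.0, 1.0],
}

# Hypothetical dense word vectors (toy values, not trained):
# the two synonyms are placed close together on purpose.
dense = {
    "happy": [0.9, 0.1],
    "glad":  [0.8, 0.2],
    "table": [-0.1, 0.9],
}

sim_one_hot = cosine(one_hot["happy"], one_hot["glad"])
sim_dense_synonyms = cosine(dense["happy"], dense["glad"])
sim_dense_unrelated = cosine(dense["happy"], dense["table"])

print("one-hot, synonyms:  ", sim_one_hot)
print("dense, synonyms:    ", sim_dense_synonyms)
print("dense, unrelated:   ", sim_dense_unrelated)
```

With the discrete representation the two synonyms are exactly as dissimilar as any other word pair, whereas a dense representation can place them close together. This is the property the thesis seeks from word vectors.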
The main work includes: (1) configuring the Gensim word vector training module, training word vectors with Gensim, using the trained word vectors to learn prosodic word vectors, and capturing prosodic-level information in context through vectors at different levels; (2) training the neural network model on data annotated with prosodic levels, using as per-step input features the dictionary word vector, the prosodic word vector, the prosodic-level vector of the preceding word, and the word-length vector of the current word, so that the composite feature at the input layer improves the model's predictive ability; (3) improving the hidden layer of the model by adding a tensor to it, so that the tensor captures the relationships between words and between different prosodic levels. The predictive ability for prosodic structure is evaluated with respect to window length, vector space dimension, number of hidden-layer units, input features, and other factors. The experimental results show that, compared with using a single word vector as input, the prosodic word error rate drops by 3.2 percentage points (from 15.3% to 12.1%) and the prosodic phrase error rate drops by 5 percentage points (from 40.3% to 35.3%). After the tensor is added to the hidden layer, the prosodic word error rate drops by a further 0.5 percentage points (from 12.1% to 11.6%). These results show that the composite input feature effectively reduces prosodic prediction error and that a hidden layer with a tensor captures the information between prosodic levels better than an ordinary hidden layer.
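The composite input and the modified hidden layer can be sketched as follows. The abstract does not give the layer's equations, so this sketch assumes the commonly used neural tensor form, in which each hidden unit adds a bilinear term to the ordinary linear term: h[k] = tanh(xᵀ T[k] x + W[k]·x + b[k]). The function names, dimensions, and numeric values are hypothetical and chosen only for illustration.

```python
import math

def concat_features(word_vec, prosodic_word_vec, prev_level_vec, length_vec):
    """Join the four per-word feature vectors into one composite input."""
    return (list(word_vec) + list(prosodic_word_vec)
            + list(prev_level_vec) + list(length_vec))

def tensor_hidden_layer(x, tensor, weights, bias):
    """Assumed form: h[k] = tanh(x^T tensor[k] x + weights[k] . x + bias[k])."""
    n = len(x)
    hidden = []
    for slice_k, w_k, b_k in zip(tensor, weights, bias):
        # Bilinear term: pairwise interactions between input components.
        bilinear = sum(x[i] * slice_k[i][j] * x[j]
                       for i in range(n) for j in range(n))
        # Linear term: the ordinary hidden-layer contribution.
        linear = sum(w * xi for w, xi in zip(w_k, x))
        hidden.append(math.tanh(bilinear + linear + b_k))
    return hidden

# Toy composite input (all values hypothetical):
# 2-dim word vector, 1-dim prosodic word vector,
# 2-dim previous-level vector, 1-dim word-length vector.
x = concat_features([0.2, -0.1], [0.5], [0.0, 1.0], [1.0])
n = len(x)

# One hidden unit whose tensor slice multiplies component 0 (word vector)
# with component 2 (prosodic word vector); linear weights and bias are zero.
slice0 = [[0.0] * n for _ in range(n)]
slice0[0][2] = 1.0
h = tensor_hidden_layer(x, [slice0], [[0.0] * n], [0.0])
print(h)
```

Setting every tensor slice to zero reduces this layer to an ordinary tanh hidden layer, so the bilinear term is the only difference between the two variants being compared.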
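【Degree-granting institution】: Beijing Jiaotong University
【Degree level】: Master's
【Year degree conferred】: 2016
【Classification number】: TP391.1
【Similar literature】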
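Related master's theses (top 1)
1 王琦 (Wang Qi); Research on Prosodic Structure Prediction Based on Deep Neural Networks [D]; Beijing Jiaotong University; 2016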
Article ID: 2191200
Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/2191200.html