Research on Acoustic Modeling in Spontaneous Speech Recognition

Published: 2018-01-15 22:35

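  Title keywords: Research on Acoustic Modeling in Spontaneous Speech Recognition. Source: Beijing Institute of Technology, 2014 doctoral dissertation. Document type: degree thesis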


  Related topics: continuous speech recognition; acoustic model; speaker adaptation; discriminative training; discriminative linear transform


[Abstract]: Acoustic modeling is one of the key problems in speech recognition, and its accuracy directly affects the performance of a speech recognition system. How to build more accurate acoustic models has long been a focus of research. With the main goals of improving the accuracy of acoustic model parameters and the performance of continuous speech recognition systems, this thesis studies two topics: the estimation of triphone model parameters before state clustering in acoustic model training, and acoustic model adaptation.
First, to improve the accuracy of decision-tree state clustering in Mandarin continuous speech recognition, the optimization of the triphone models used before state clustering is studied. The construction of the decision tree depends closely on the accuracy of the parameters of the triphone models it uses. The training corpus contains a large number of sparse triphones, so training the pre-clustering triphone models suffers from data sparsity. To address this, the maximum a posteriori (MAP) criterion is used to estimate the parameters of the pre-clustering triphone models. MAP estimation also places high demands on the initial model parameters, and sets of tonal initial/final triphones that differ only in tone are more similar to each other than sets of tonal initial/final triphones that share only the center phone. The model parameters of toneless initial/final triphones are therefore used to initialize the models of tonal initial/final triphones, which improves the accuracy of their initial parameters. These strategies improve the recognition performance of the system.
Second, discriminative MAP adaptation is studied. The minimum phone error MAP (MPE-MAP) algorithm incorporates prior information into discriminative training and thereby achieves discriminative adaptation of the acoustic model. The accuracy of the hyperparameters of the prior distribution strongly affects the performance of MPE-MAP. To address this, the hyperparameters are estimated with maximum mutual information MAP (MMI-MAP) adaptation and with H-criterion MAP (H-MAP) adaptation, where the H-criterion combines the maximum mutual information criterion with the maximum likelihood criterion. This yields the proposed MPE-MMI-MAP and MPE-H-MAP algorithms. Both algorithms optimize the adapted model by improving the accuracy of the hyperparameters, which improves adaptation performance.
Third, discriminative linear transform adaptation is studied. I-smoothing is very important for discriminative linear transform adaptation; it is implemented by adding the log prior distribution of the transformation matrix to the discriminative objective function. This thesis instead uses a prior distribution over the means to implement smoothing in discriminative linear transforms, and proposes a smoothing method based on the mean prior. If the hyperparameters of the mean prior are defined with maximum likelihood (ML) statistics, the result is the same as I-smoothing. Because adaptation scenarios provide very little data, ML-estimated parameters are not accurate enough, so the thesis proposes defining the hyperparameters of the prior with MAP-estimated statistics, which improves discriminative linear transforms when adaptation data are scarce. In addition, to combine discriminative training with MAP estimation, a new objective function is designed for estimating the linear transform parameters, giving a discriminative maximum a posteriori linear regression adaptation algorithm. Experimental results show that this algorithm improves adaptation performance with small amounts of adaptation data and still maintains the performance of discriminative linear transforms with large amounts of adaptation data.
Finally, linear projection (LP) adaptation is studied. An LP function applies linear transformations to multiple initial models to obtain the adapted model, and can be seen as an extension of the linear regression (LR) function. The thesis proposes an LP adaptation method based on transformation matrices, which uses speaker adaptation (SA) models as the initial models and represents speaker-specific information with transformation matrices. The initial models are selected by maximum likelihood, so that the models carrying the most important information are chosen. This reduces the number of parameters to be estimated and yields a fast adaptation algorithm.
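The abstract does not give the thesis's update formulas, so the following is only a minimal sketch of the standard MAP update for a single Gaussian mean, included to illustrate why MAP estimation helps with sparse triphones. The function name `map_mean`, the prior weight `tau`, and the toy data are all assumptions made for illustration; the prior mean stands in for a better-trained model such as the toneless triphone model mentioned above.

```python
from typing import List, Sequence


def map_mean(prior_mean: Sequence[float],
             frames: Sequence[Sequence[float]],
             occupancies: Sequence[float],
             tau: float = 10.0) -> List[float]:
    """Standard MAP update of one Gaussian mean (illustrative sketch).

    mu_map = (tau * mu_prior + sum_t gamma_t * x_t) / (tau + sum_t gamma_t)

    prior_mean  : mean of the prior model (hypothetical: a toneless triphone)
    frames      : observation vectors x_t aligned to this Gaussian
    occupancies : occupation probabilities gamma_t for each frame
    tau         : prior weight (assumed value; larger trusts the prior more)
    """
    dim = len(prior_mean)
    occ_total = sum(occupancies)
    weighted_sum = [0.0] * dim
    for x, gamma in zip(frames, occupancies):
        for d in range(dim):
            weighted_sum[d] += gamma * x[d]
    return [(tau * prior_mean[d] + weighted_sum[d]) / (tau + occ_total)
            for d in range(dim)]


# Toy data: every frame is [1.0, 2.0]; the prior mean is the origin.
toneless_mean = [0.0, 0.0]
sparse = map_mean(toneless_mean, [[1.0, 2.0]] * 2, [1.0] * 2)
rich = map_mean(toneless_mean, [[1.0, 2.0]] * 1000, [1.0] * 1000)
```

With only two frames the estimate stays close to the prior mean, while with a thousand frames it moves close to the sample mean, which is the ML estimate. This is the behavior that makes a well-chosen prior valuable when a triphone has few training examples.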

[Degree-granting institution]: Beijing Institute of Technology
[Degree level]: Doctoral
[Year degree granted]: 2014
[Classification number]: TN912.34


