Construction and Control of a Three-Dimensional Geometric Articulatory Model

Published: 2019-05-22 08:32

[Abstract]: Articulatory speech synthesis models simulate the articulatory movements and aerodynamic processes of speech production. We attempt to build a more accurate articulatory movement model that approximates the morphological characteristics of the articulators, and thereby obtain a better articulatory synthesis system. There are currently two mainstream modeling strategies: physiological models and geometric models. This thesis builds a three-dimensional geometric articulatory model from a Chinese database. Compared with neurophysiological models, this geometric model ignores the influence of complex muscle forces. The reduced computation therefore improves real-time performance, which makes the geometric articulatory model suitable for applications with strict real-time requirements. This thesis proposes a new method for building a 3D geometric articulatory model from MRI (magnetic resonance imaging) and CBCT (cone-beam computed tomography) data. Because MRI can image the contours of the vocal tract articulators fairly clearly and does relatively little harm to the human body, it is increasingly used in speech synthesis research. Because bony structures cannot be imaged directly and clearly in MRI, we collected CBCT data to supply the bone structure information and fill in the upper and lower jaws. A database of articulator data acquired by MRI is highly advantageous for building vocal tract models and for analyzing how the shapes of the articulators change across different sounds. Using it to build an accurate 3D vocal tract model, and further to visualize the vocal tract during articulation, is of great significance for applications such as pronunciation teaching and the analysis of speech production mechanisms.
This thesis studies 104 sets of articulation data from one subject in a Chinese MRI database. The method proceeds in four steps: database and preprocessing; data annotation and 3D mesh modeling; data analysis and validation; and collision detection and response. The results of linear component analysis show that each articulator can be described well with no more than three parameters, and that the cumulative contribution rate of the control parameter set exceeds 80%. The root-mean-square errors of the articulators reconstructed from this analysis are all below 1.0 mm. The contribution of this thesis is a novel method for modeling the 3D vocal tract articulators that takes the physiological boundary points of the articulators into account. The modeling process has two main improvements: fusing data from different slices to improve the accuracy of articulator contour annotation, and building the 3D mesh of each articulator according to its anatomical structure. This both ensures the completeness of the articulators and preserves the correspondence of the physiological landmark points on them. Finally, this thesis builds a 3D geometric articulatory model based on Chinese articulation data, which provides a theoretical basis for applications such as Chinese speech and language teaching, the wider promotion of Mandarin (Putonghua), and the correction of pathological speech.
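The linear component analysis reported in the abstract (a few parameters per articulator, a cumulative contribution rate above 80%, and a reconstruction RMSE below 1.0 mm) can be illustrated with a minimal sketch. It assumes that the analysis behaves like a standard principal component analysis on flattened mesh vertex coordinates, which the abstract does not confirm. The function names, the synthetic data, and the mesh size of 50 vertices are all hypothetical and are not taken from the thesis.

```python
import numpy as np


def fit_linear_components(shapes, n_components=3):
    """Fit a PCA-style linear model to articulator shapes.

    shapes: array of shape (n_samples, 3 * n_vertices), one flattened
    mesh per articulation, coordinates in mm (assumed layout).
    """
    mean = shapes.mean(axis=0)
    centered = shapes - mean
    # Rows of vt are orthonormal principal directions, ordered by variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    variance = singular_values ** 2
    # Cumulative contribution rate of the first k components.
    cumulative = np.cumsum(variance) / variance.sum()
    return mean, vt[:n_components], cumulative


def reconstruct(shapes, mean, components):
    """Project shapes onto the components and map back to coordinates."""
    params = (shapes - mean) @ components.T  # control parameters per sample
    return mean + params @ components


def rmse_mm(shapes, rebuilt):
    """Root-mean-square of the per-vertex Euclidean error."""
    diff = (shapes - rebuilt).reshape(shapes.shape[0], -1, 3)
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=2))))


def make_synthetic_shapes(n_samples=104, n_vertices=50, seed=0):
    """Synthetic stand-in data: three latent factors plus small noise."""
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=(n_samples, 3)) * np.array([5.0, 3.0, 2.0])
    basis = rng.normal(size=(3, 3 * n_vertices))
    noise = 0.05 * rng.normal(size=(n_samples, 3 * n_vertices))
    return 10.0 + latent @ basis + noise


if __name__ == "__main__":
    shapes = make_synthetic_shapes()
    mean, components, cumulative = fit_linear_components(shapes, 3)
    rebuilt = reconstruct(shapes, mean, components)
    print("cumulative contribution (3 components):", cumulative[2])
    print("reconstruction RMSE in mm:", rmse_mm(shapes, rebuilt))
```

The printed numbers describe only the synthetic data; they say nothing about the thesis results. On real data, the same two quantities would be computed separately for each articulator mesh.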
[Degree-granting institution]: Tianjin University
[Degree level]: Master's
[Year degree conferred]: 2016
[Classification number]: TP391.41;TN912.3
