Design and Implementation of a Semi-Automatic Speech Annotation System

Published: 2018-06-19 06:50

Topics: DAEM algorithm + STRAIGHT algorithm; source: Northwest Normal University, 2015 master's thesis

[Abstract]: With the rapid development of information technology, expectations for the quality of speech synthesis and speech recognition keep rising. More and more laboratory results are being applied in everyday life, and speech products of all kinds continue to appear. Building a large-scale corpus is an indispensable task in designing a good speech system, and how accurately the corpus is annotated determines its quality, so corpus annotation plays a key role in speech research. Large-scale manual annotation is time-consuming, labor-intensive, and costly, and because the human ear is insensitive to the boundaries of individual syllables within a word or sentence, manually annotated data can contain large errors. This thesis designs a semi-automatic annotation system for speech corpora. The system automatically computes the boundaries and the fundamental frequency (F0) envelope of the speech, and the automatic results are then corrected by hand to obtain accurate annotations of both. The main work and contributions of the thesis are as follows:
1. An automatic annotation algorithm for speech unit boundaries is implemented. For recorded speech files without time labels, a forced alignment algorithm based on the hidden Markov model (HMM) aligns the time boundaries automatically. The Deterministic Annealing Expectation Maximization (DAEM) algorithm is introduced into the re-estimation step of HMM training, which improves the accuracy of the forced alignment of speech unit boundaries.
2. An automatic annotation algorithm for the speech fundamental frequency is implemented. On the basis of the duration boundary annotation, the STRAIGHT (Speech Transformation and Representation based on Adaptive Interpolation of weighted spectrogram) algorithm extracts the F0 of the speech, and the extracted F0 data are smoothed. Because the distance between two adjacent peak points equals the fundamental period, the positions of the peak marks can be derived, and the F0 envelope curve formed by the peak points directly reveals missed or misplaced peak marks. Manual correction then yields more accurate annotation data, which is what makes the system semi-automatic.
3. A semi-automatic speech annotation system is designed and implemented. The system uses a graphical user interface that draws the boundary of each speech unit on the speech waveform and converts the F0 obtained from the STRAIGHT algorithm into peak-point marks on the waveform. On this basis, functions for manually editing speech unit boundaries and peak-point marks are designed and implemented, giving more accurate boundary and F0 envelope annotations and completing a visual semi-automatic speech annotation system.
4. An experimental phonetic analysis of the Lanzhou dialect is carried out. Using the semi-automatic annotation system, the boundaries and F0 of single characters in the Lanzhou dialect are annotated and analyzed, verifying phonetic conclusions about Lanzhou dialect single characters.
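The peak-marking step in the second contribution can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the F0 contour is assumed to be already available (for example from a STRAIGHT-style extractor), and the function names, the median smoother, the local-maximum search window (`tolerance`), and the `max_jump` threshold are all hypothetical choices made for illustration. The sketch smooths the contour, places one mark per fundamental period, and flags marks where the period changes abruptly, which is where a reviewer would look for missed or misplaced peaks.

```python
def median_smooth(f0, width=5):
    """Median-filter an F0 contour (Hz per frame).

    Near the edges the window shrinks; for even-sized windows the
    upper median is used.
    """
    half = width // 2
    out = []
    for i in range(len(f0)):
        window = sorted(f0[max(0, i - half): i + half + 1])
        out.append(window[len(window) // 2])
    return out


def mark_peaks(wave, fs, f0, frame_shift, tolerance=0.2):
    """Place one peak mark per fundamental period in a voiced segment.

    wave: waveform samples; fs: sample rate in Hz;
    f0: smoothed F0 per frame in Hz (assumed > 0 everywhere);
    frame_shift: seconds between F0 frames;
    tolerance: search half-width as a fraction of the period (assumption).
    """
    def period_at(n):
        # Fundamental period in samples at sample index n.
        frame = min(int(n / fs / frame_shift), len(f0) - 1)
        return fs / f0[frame]

    # Seed mark: the largest sample within the first period.
    first = min(int(round(period_at(0))), len(wave))
    pos = max(range(first), key=lambda n: wave[n])
    peaks = [pos]
    while True:
        period = period_at(pos)
        guess = pos + period  # the next peak is expected one period later
        lo = int(round(guess - tolerance * period))
        hi = int(round(guess + tolerance * period))
        if hi >= len(wave):
            break
        # Snap the mark to the local maximum around the prediction.
        pos = max(range(lo, hi + 1), key=lambda n: wave[n])
        peaks.append(pos)
    return peaks


def suspicious_marks(peaks, max_jump=0.3):
    """Return indices into peaks where adjacent periods differ by more
    than max_jump (relative), hinting at a missed or misplaced mark."""
    periods = [b - a for a, b in zip(peaks, peaks[1:])]
    flagged = []
    for i in range(1, len(periods)):
        if abs(periods[i] - periods[i - 1]) > max_jump * periods[i - 1]:
            flagged.append(i)
    return flagged
```

In a tool of the kind described above, the flagged indices would only be highlighted on the waveform display; the decision to move, add, or delete a mark stays with the human annotator.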
[Degree-granting institution]: Northwest Normal University
[Degree level]: Master's
[Year degree granted]: 2015
[Classification number]: TN912.3

[Related literature]