Research of Automatic Speech Recognition of the Asante-Twi D
Published: 2021-12-17 07:14
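Automatic speech recognition (ASR) is the first and most important stage of a speech translation system, and the speech database is its most important resource. High-quality ASR, however, requires a very large speech database. The Asante-Twi dialect of the Akan language is considered extremely low-resource, and collecting speech data for it is a serious obstacle. This thesis proposes a new method for building an ASR system for a low-resource dialect from a small database, and reports good results. The characteristics of the dialect are analyzed first, and a representative Asante-Twi speech database is designed, collected, and organized, laying a foundation for further speech recognition work. Because no prior work on Asante-Twi dialect recognition exists and there is no trustworthy reference, three ASR systems with different features and methods are built with the Kaldi toolkit in order to select reliable algorithms and features for an Asante-Twi speech recognition system. To improve performance, cepstral mean and variance normalization (CMVN) and delta (Δ) dynamic features are applied to every feature extraction method. In addition, the acoustic model units of each ASR system are refined with the GMM-HMM pattern classifier algorithm, and two context-dependent (triphone) models are trained to provide better performance. The first ASR system uses MFCC feature extraction, the second uses MFCCs with context-dependent parameters, and the third uses PLP...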
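As a rough illustration of the two feature post-processing steps named in the abstract, the sketch below applies per-utterance CMVN and then appends delta and delta-delta features. It is a minimal pure-Python sketch, not the Kaldi implementation used in the thesis; the function names, the regression window of 2, and the edge-replication padding are assumptions made for this example.

```python
from math import sqrt


def cmvn(frames, eps=1e-10):
    """Scale each feature dimension to zero mean and unit variance over one utterance."""
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    stds = [
        sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dims)
    ]
    # eps guards against division by zero for a constant dimension.
    return [
        [(f[d] - means[d]) / max(stds[d], eps) for d in range(dims)]
        for f in frames
    ]


def deltas(frames, window=2):
    """Regression-based delta features; edge frames are replicated as padding (assumed)."""
    n = len(frames)
    dims = len(frames[0])
    denom = 2 * sum(k * k for k in range(1, window + 1))
    out = []
    for t in range(n):
        row = []
        for d in range(dims):
            num = sum(
                k * (frames[min(t + k, n - 1)][d] - frames[max(t - k, 0)][d])
                for k in range(1, window + 1)
            )
            row.append(num / denom)
        out.append(row)
    return out


def add_dynamic_features(frames):
    """Append delta and delta-delta features to every static frame."""
    d1 = deltas(frames)
    d2 = deltas(d1)
    return [s + a + b for s, a, b in zip(frames, d1, d2)]
```

With 13 static coefficients per frame, a common choice for MFCC or PLP front ends, `add_dynamic_features(cmvn(frames))` would yield 39-dimensional vectors; the abstract shown here does not state the dimensionality the thesis used.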
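[Source]: Southwest University of Science and Technology, Sichuan Province
[Pages]: 72
[Degree level]: Master's
[Table of contents]:
Abstract (Chinese)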
ABSTRACT
Main Symbol Table
1 Introduction
1.1 Background and Significance of Study
1.2 Problem Statement
1.3 Akan Language and the Twi Dialect
1.4 Related Work
1.5 Goals of the Thesis
1.6 Thesis Chapter Arrangement
2 Basics of Automatic Speech Recognition
2.1 Mathematical Representation of an ASR System
2.2 Basic Architecture of an ASR System
2.2.1 Signal Processing / Feature Extraction
2.2.2 Language Model
2.2.3 Lexicon
2.2.4 Acoustic Model
2.2.5 Pattern Classification of Acoustic Vectors
2.2.6 Decoding
2.3 Metrics for Performance Measurement
2.4 Summary of the Chapter
3 Approach to Asante-Twi ASR System Realization
3.1 The Kaldi Toolkit Overview
3.2 Asante-Twi Dialect Manual Data Preparation
3.2.1 Audio Data
3.2.2 Acoustic Data
3.2.3 Language Data
3.3 Asante-Twi Dialect Feature Extraction Processes
3.3.1 Mel Frequency Cepstral Coefficients (MFCC)
3.3.2 Perceptual Linear Prediction (PLP)
3.3.3 Cepstral Mean and Variance Normalization (CMVN)
3.3.4 Delta and Delta-Delta Features
3.4 Asante-Twi Dialect Language Modeling
3.5 Acoustic Modeling
3.5.1 Gaussian Mixture Model (GMM)
3.5.2 Hidden Markov Model (HMM)
3.5.3 Generative Learning Approach: GMM-HMM Algorithm
3.6 Asante-Twi Dialect ASR Systems Training
3.6.1 Monophone Training
3.6.2 First Triphone Training
3.6.3 Second Triphone Training
3.7 Asante-Twi Dialect ASR Systems Testing
3.7.1 Monophone Testing
3.7.2 First Triphone Testing
3.7.3 Second Triphone Testing
3.8 Summary of the Chapter
4 Results and Discussion of Asante-Twi ASR Systems
4.1 Performance Measurement Metrics for Asante-Twi ASR Systems
4.1.1 Word Error Rate (WER)
4.1.2 Sentence Error Rate (SER)
4.2 Analysis of Results of Decoding
4.2.1 First Asante-Twi Dialect ASR System Using MFCCs and Δ (2000 Leaves, 11000 Gaussians) and Δ-Δ (2500 Leaves, 15000 Gaussians) transformations
4.2.2 Second Asante-Twi Dialect ASR System Using MFCCs and Δ (2000 Leaves, 10000 Gaussians) and Δ-Δ (2500 Leaves, 15000 Gaussians) transformations
4.2.3 Third Asante-Twi Dialect ASR System Using PLPs and Δ (2000 Leaves, 10000 Gaussians) + Δ-Δ (2500 Leaves, 15000 Gaussians) transformations
4.2.4 Comparison of the Best Performances of All Three Asante-Twi Dialect ASR Systems
4.3 Summary of the Chapter
5 Conclusion
5.1 Overall Summary
5.2 Limitations, Future Works and Beyond
Acknowledgement
References
Article ID: 3539633
Article link: https://www.wllwen.com/kejilunwen/xinxigongchenglunwen/3539633.html