
Research of Automatic Speech Recognition of the Asante-Twi D

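Published: 2021-12-17 07:14
  Automatic speech recognition (ASR) is the first and most important stage of a speech translation system, and the speech database is its most important resource. High-quality ASR, however, requires a very large speech database. The Asante-Twi dialect of the Akan language is considered extremely under-resourced, and collecting speech data for it is a serious obstacle. This thesis proposes a new method for building an ASR system for a low-resource dialect from a small database, and it achieves good results. The characteristics of the dialect are analyzed first, and a representative Asante-Twi speech database is designed, collected, and organized, laying a foundation for further speech recognition work. Because no prior work on Asante-Twi dialect recognition exists to serve as a trusted reference, three ASR systems with different features and methods were built with the Kaldi toolkit in order to select reliable algorithms and features for an Asante-Twi speech recognition system. To improve performance, cepstral mean and variance normalization (CMVN) and delta (Δ) dynamic features were applied to the feature extraction of all three systems. In addition, the acoustic model units of each ASR system were refined with the GMM-HMM pattern classification algorithm, and two context-dependent (triphone) models were trained to provide better performance. The first ASR system uses MFCC feature extraction, the second uses MFCCs with context-dependent parameters, and the third uses PLP...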
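The abstract names CMVN and delta dynamic features as the post-processing applied to every feature stream. The following is a minimal NumPy sketch of those two steps, not the thesis's Kaldi pipeline. Per-utterance normalization, the regression window of 2 frames, the 13-dimensional random input, and the function names `cmvn`, `delta`, and `add_deltas` are all assumptions made for illustration.

```python
import numpy as np


def cmvn(features, eps=1e-8):
    """Normalize each coefficient to zero mean and unit variance over the utterance."""
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    return (features - mean) / (std + eps)


def delta(features, window=2):
    """Regression-based delta over +/- `window` frames; edge frames are repeated."""
    num_frames = features.shape[0]
    padded = np.pad(features, ((window, window), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, window + 1))
    out = np.zeros(features.shape, dtype=float)
    for n in range(1, window + 1):
        ahead = padded[window + n : window + n + num_frames]
        behind = padded[window - n : window - n + num_frames]
        out += n * (ahead - behind)
    return out / denom


def add_deltas(features):
    """Stack static, delta, and delta-delta coefficients side by side."""
    d1 = delta(features)
    d2 = delta(d1)
    return np.hstack([features, d1, d2])


# Hypothetical input: 50 frames of 13 cepstral coefficients (random stand-in data).
rng = np.random.default_rng(0)
mfcc = rng.normal(loc=3.0, scale=2.0, size=(50, 13))

normalized = cmvn(mfcc)
stacked = add_deltas(normalized)
```

With a 13-dimensional input, the stacked matrix has 39 columns per frame: the static coefficients followed by their first- and second-order differences.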

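[Source]: Southwest University of Science and Technology, Sichuan Province

[Pages]: 72

[Degree level]: Master's

[Table of contents]:
Abstract (Chinese)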
ABSTRACT
Main Symbol Table
1 Introduction
    1.1 Background and Significance of Study
    1.2 Problem Statement
    1.3 Akan Language and the Twi Dialect
    1.4 Related Work
    1.5 Goals of the Thesis
    1.6 Thesis Chapter Arrangement
2 Basics of Automatic Speech Recognition
    2.1 Mathematical Representation of an ASR System
    2.2 Basic Architecture of an ASR System
        2.2.1 Signal Processing / Feature Extraction
        2.2.2 Language Model
        2.2.3 Lexicon
        2.2.4 Acoustic Model
        2.2.5 Pattern Classification of Acoustic Vectors
        2.2.6 Decoding
    2.3 Metrics for Performance Measurement
    2.4 Summary of the Chapter
3 Approach to Asante-Twi ASR System Realization
    3.1 The Kaldi Toolkit Overview
    3.2 Asante-Twi Dialect Manual Data Preparation
        3.2.1 Audio Data
        3.2.2 Acoustic Data
        3.2.3 Language Data
    3.3 Asante-Twi Dialect Feature Extraction Processes
        3.3.1 Mel Frequency Cepstral Coefficients (MFCC)
        3.3.2 Perceptual Linear Prediction (PLP)
        3.3.3 Cepstral Mean and Variance Normalization (CMVN)
        3.3.4 Delta and Delta-Delta Features
    3.4 Asante-Twi Dialect Language Modeling
    3.5 Acoustic Modeling
        3.5.1 Gaussian Mixture Model (GMM)
        3.5.2 Hidden Markov Model (HMM)
        3.5.3 Generative Learning Approach: GMM-HMM Algorithm
    3.6 Asante-Twi Dialect ASR Systems Training
        3.6.1 Monophone Training
        3.6.2 First Triphone Training
        3.6.3 Second Triphone Training
    3.7 Asante-Twi Dialect ASR Systems Testing
        3.7.1 Monophone Testing
        3.7.2 First Triphone Testing
        3.7.3 Second Triphone Testing
    3.8 Summary of the Chapter
4 Results and Discussion of Asante-Twi ASR Systems
    4.1 Performance Measurement Metrics for Asante-Twi ASR Systems
        4.1.1 Word Error Rate (WER)
        4.1.2 Sentence Error Rate (SER)
    4.2 Analysis of Results of Decoding
        4.2.1 First Asante-Twi Dialect ASR System Using MFCCs and Δ (2000 Leaves, 11000 Gaussians) and Δ-Δ (2500 Leaves, 15000 Gaussians) Transformations
        4.2.2 Second Asante-Twi Dialect ASR System Using MFCCs and Δ (2000 Leaves, 10000 Gaussians) and Δ-Δ (2500 Leaves, 15000 Gaussians) Transformations
        4.2.3 Third Asante-Twi Dialect ASR System Using PLPs and Δ (2000 Leaves, 10000 Gaussians) + Δ-Δ (2500 Leaves, 15000 Gaussians) Transformations
        4.2.4 Comparison of the Best Performances of All Three Asante-Twi Dialect ASR Systems
    4.3 Summary of the Chapter
5 Conclusion
    5.1 Overall Summary
    5.2 Limitations, Future Works and Beyond
Acknowledgement
References



Article ID: 3539633


Article link: https://www.wllwen.com/kejilunwen/xinxigongchenglunwen/3539633.html

