Research on Speaker Recognition Based on Gender Classification

Published: 2018-01-29 11:51

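Keywords: Chinese dialect database; gender recognition; speaker recognition; vector quantization; support vector machine. Source: Jiangsu Normal University, master's thesis, 2012. Document type: degree thesis.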

[Abstract]: Speech signals carry both the semantic content of what a speaker says and information specific to the speaker, from which identity attributes such as gender, age, and place of origin can be extracted. Speaker recognition is the technology of automatically determining a speaker's identity from the speaker-specific parameters in the speech signal. As a biometric authentication technology, it has significant practical value and broad prospects in information retrieval, criminal investigation, voice-based identity verification, telephone banking, and related fields. This thesis systematically studies the pipeline from data collection through feature extraction to classification, and reports the following contributions.

1. A Chinese dialect speech database was built. Following the design standards of international speech corpora, the design took into account recording channels, dialect types, and the age and gender distribution of speakers. The resulting database covers seven regional dialect groups (Min, Yue, Wu, Xiang, Northern, Gan, and Hakka) plus Mandarin, and contains 106 hours of speech data, both wideband (microphone) and narrowband (mobile phone and landline telephone).

2. A codebook-based gender identification method is proposed. The thesis introduces semi-supervised clustering into gender recognition, which it describes as a first in this line of research: Chinese dialect speech data are vector-quantized under a semi-supervised learning scheme to form male and female codebook models that carry supervision information. The method takes the probability distribution of the speech feature space into account, optimizes codebook generation, and improves the accuracy of the codebook model, addressing the low codebook precision of traditional vector quantization algorithms and improving recognition performance. Experiments show that, on both noisy and clean speech, recognition accuracy, system stability, and robustness all improve markedly over the traditional vector quantization algorithm.

3. The hybrid-SVM speaker recognition method is improved. An SVM follows the structural risk minimization principle, discriminates strongly between classes, produces outputs that reflect the differences between samples of different classes, and is well suited to classification over continuous input vectors. On that basis, the thesis improves the hybrid SVM recognition system used for speaker recognition: the large training set is partitioned and clustered, one SVM is trained for each resulting class of speech samples, and the outputs of all SVMs are combined to make the classification decision. This mitigates the excessive time cost and low recognition efficiency caused by growing numbers of speakers and large volumes of speech data, and improves the classification decision capability of the speaker recognition system.

4. A hierarchical speaker recognition system is built. Speaker recognition is currently hard to apply in real time on large data sets: as speech databases keep growing, existing techniques struggle to meet real-time identification requirements in recognition time, memory use, and recognition accuracy. The thesis compares the performance of several features, including MFCC and SDC, within the recognition system. Following a classify-then-search approach, it uses dialect identification and gender identification to narrow the number and range of candidate speakers, then applies speaker recognition to determine each speaker's identity, with the aim of arriving at an optimal speaker recognition system model.
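The codebook idea behind the gender identification method can be illustrated with a minimal sketch. It is not the thesis's algorithm: the thesis builds its codebooks with semi-supervised clustering, while this sketch substitutes plain k-means vector quantization, the kind of traditional baseline the abstract compares against. The function names (`train_codebook`, `distortion`, `classify_gender`), the codebook size, and the 12-dimensional synthetic "features" standing in for real acoustic features such as MFCC are all assumptions made for illustration.

```python
import numpy as np


def train_codebook(features, size, iters=20, seed=0):
    """Train a codebook of `size` codewords with plain k-means (assumed baseline)."""
    rng = np.random.default_rng(seed)
    # Initialise codewords from randomly chosen training frames.
    codebook = features[rng.choice(len(features), size, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every frame to every codeword.
        dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        nearest = dists.argmin(axis=1)
        for k in range(size):
            members = features[nearest == k]
            if len(members):  # keep the old codeword if its cell is empty
                codebook[k] = members.mean(axis=0)
    return codebook


def distortion(features, codebook):
    """Average squared distance from each frame to its nearest codeword."""
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.min(axis=1).mean()


def classify_gender(features, codebooks):
    """Return the label whose codebook quantizes the utterance with least distortion."""
    return min(codebooks, key=lambda label: distortion(features, codebooks[label]))


if __name__ == "__main__":
    # Synthetic stand-in data: two Gaussian clouds, NOT real speech features.
    rng = np.random.default_rng(1)
    codebooks = {
        "male": train_codebook(rng.normal(0.0, 1.0, size=(400, 12)), size=8),
        "female": train_codebook(rng.normal(3.0, 1.0, size=(400, 12)), size=8),
    }
    utterance = rng.normal(3.0, 1.0, size=(50, 12))
    print(classify_gender(utterance, codebooks))
```

Each gender gets its own codebook, and an utterance is assigned to whichever codebook reproduces its frames with the lowest average distortion. In a hierarchical system of the kind described in contribution 4, a decision like this would be one way to narrow the candidate speaker set before the speaker-level models are consulted.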
[Degree-granting institution]: Jiangsu Normal University
[Degree level]: Master's
[Year degree conferred]: 2012
[Classification number]: H17

[References]