Application of an Improved Sparse Least Squares Support Vector Machine in Speech Recognition

Published: 2018-04-24 02:18

Topic: speech recognition + least squares support vector machine; Reference: Taiyuan University of Technology, master's thesis, 2014

【Abstract】: Speech recognition is one of the most direct and convenient means of human-computer interaction, and it falls within the scope of multidimensional pattern recognition. The least squares support vector machine (LS-SVM) is a pattern recognition algorithm that is currently an active research topic in machine learning. As an extension of the standard support vector machine (SVM), it learns well from small samples, avoids the "curse of dimensionality", and has a training algorithm that is simple and easy to implement, which makes it suitable for recognizing complex speech signals. Its drawback is that its solution lacks sparsity, which increases model complexity and slows down recognition. This thesis studies that problem. The specific work is as follows:
(1) The principles of speech recognition systems and of the LS-SVM are studied in depth, and the LS-SVM is introduced into a speech recognition system. This overcomes two shortcomings of traditional speech recognition methods: the hidden Markov model requires prior knowledge of the distribution, and artificial neural networks are prone to overfitting ("over-learning").
(2) The importance of the model parameters to the learning ability and generalization ability of the system is examined carefully. A scheme that combines the particle swarm global optimization algorithm with K-fold cross-validation is proposed for finding the best parameters, which avoids both the complexity of manual tuning and the long running time of grid search.
(3) Building on an in-depth study of why the LS-SVM lacks sparsity and of how the dimension of the speech features affects model performance, an LS-SVM sparsification method based on independent component analysis (ICA) is proposed. The method first uses ICA to reduce the dimension of the speech features. Then, after model training, a fast pruning algorithm based on ICA reduces the kernel matrix. During this reduction, a combination of kurtosis and skewness serves as the measure of the importance of each independent component, which resolves the problem of ordering the independent components. Experiments on a Korean speech corpus show that the algorithm sparsifies the model effectively while preserving recognition accuracy.
(4) When non-support vectors take part in model training, model complexity rises and recognition performance falls. To address this, the thesis starts from two perspectives, data mining and the geometric meaning of the distribution of support vectors, and proposes an LS-SVM sparsification algorithm based on support vector preselection. Before model training, the algorithm takes the union of the key representative samples extracted by the K-means clustering algorithm and the boundary samples selected by the center distance ratio algorithm as the preselected support vectors, and in this way achieves sparsification effectively. Experiments on the Korean speech corpus and the Aurora-2 speech corpus show that the method improves recognition speed with almost no loss of recognition accuracy, and thus achieves the goal of sparsification.
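As a rough illustration of why the LS-SVM solution is not sparse, the sketch below trains a binary LS-SVM classifier by solving its linear system in NumPy. It is not the thesis's code: the standard LS-SVM classifier formulation, the RBF kernel, the synthetic two-cluster data, and the names `rbf_kernel`, `lssvm_train`, `lssvm_predict`, `gamma`, and `sigma` are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    # Squared Euclidean distance between every row of A and every row of B.
    d2 = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

def lssvm_train(X, y, gamma=10.0, sigma=1.0):
    # Solve the LS-SVM linear system (assumed standard formulation):
    #   [ 0   y^T            ] [ b     ]   [ 0 ]
    #   [ y   Omega + I/gamma ] [ alpha ] = [ 1 ]
    # with Omega_ij = y_i * y_j * K(x_i, x_j).
    n = len(y)
    omega = np.outer(y, y) * rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = omega + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]  # bias b, multipliers alpha

def lssvm_predict(X_train, y_train, alpha, b, X_new, sigma=1.0):
    # Decision function: sign(sum_i alpha_i * y_i * K(x, x_i) + b).
    K = rbf_kernel(X_new, X_train, sigma)
    return np.sign(K @ (alpha * y_train) + b)

# Synthetic, well-separated two-class data (hypothetical stand-in for speech features).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, (20, 2)), rng.normal(2.0, 0.5, (20, 2))])
y = np.concatenate([-np.ones(20), np.ones(20)])

b, alpha = lssvm_train(X, y)
pred = lssvm_predict(X, y, alpha, b, X)
nonzero = np.count_nonzero(np.abs(alpha) > 1e-8)
print("training accuracy:", (pred == y).mean())
print("nonzero multipliers:", nonzero, "of", len(alpha))
```

Because the equality constraints typically give every training sample a nonzero multiplier, each prediction evaluates the kernel against the whole training set. That cost is what the pruning and preselection methods in (3) and (4) aim to reduce.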
【Degree-granting institution】: Taiyuan University of Technology
【Degree level】: Master's
【Year degree granted】: 2014
【Classification number】: TN912.3

【References】