Research on Multi-Factor Audio Signal Modeling and Applications Based on Tensor Analysis

Published: 2018-01-02 21:25

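Keywords: Research on Multi-Factor Audio Signal Modeling and Applications Based on Tensor Analysis　Source: Beijing Institute of Technology, 2016 doctoral dissertation　Type: degree thesis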

[Abstract]: With the continuing development of Internet and multimedia technology, the analysis and processing of audio signals, an important component of multimedia signals, has attracted growing attention from researchers. Tensor analysis is a multi-way (multilinear) analysis tool that has been widely used in recent years. It can handle signals influenced by more than one factor, including higher-order extensions of a signal and signals that are inherently multidimensional. This dissertation introduces tensor analysis into multi-factor audio signal modeling and its applications. Exploiting the fact that tensor analysis, as a multi-factor method, preserves the structural information of the data when processing higher-order signals, the dissertation addresses three application problems: higher-order feature modeling of audio signals, higher-order subspace analysis for audio classification, and recovery of missing data in multi-channel audio signals. The specific research contents are as follows:

1. For feature modeling of audio signals, the traditional one-dimensional and two-dimensional modeling approaches are extended by using tensors to model higher-order audio features. This reflects the physical meaning of the audio signal in the subspaces of different attributes while preserving the relationships among those subspaces, and tensor decomposition can uncover latent, intrinsic, and discriminative structural information in the signal. In a speech command recognition system for unmanned vehicles, a third-order tensor is built over frame structure, decomposition scale, and feature parameters. In an audio classification system, a third-order tensor is built from the different attributes of the acoustic feature space, the perceptual feature space, and the psychoacoustic feature space. The audio feature sets obtained through tensor modeling and decomposition help improve the accuracy of audio recognition and classification.

2. For the pattern recognition problem of audio classification, the dissertation applies higher-order subspace analysis and makes novel use of non-negative tensor decomposition. During supervised training, audio signals are represented with a non-negative tensor model. To ensure the uniqueness of the decomposition, a non-negative tensor decomposition is learned separately for each class of audio, yielding a non-negative core tensor and factor matrices for each class. During testing, the test audio is projected into each class's audio space through the non-negative factor matrices produced in training, and classification is completed by using the Frobenius norm to compare the similarity between the projection and each class's core tensor obtained in training. Compared with traditional classifiers, classification performance is better because the nonlinear relationships in the audio data structure are not destroyed during non-negative tensor decomposition, which makes classification and annotation of audio databases more effective.

3. For recovery of missing data in multi-channel audio signals, the dissertation introduces tensor decomposition and tensor completion into audio data recovery for the first time. The tensor decomposition method models the audio signal with missing data as a third-order tensor and decomposes it, minimizing the objective function through weighting and an alternating iterative algorithm. The tensor completion method defines a trace norm for tensors and uses convex relaxation to convert the rank minimization problem into a trace norm minimization problem, that is, it turns a non-convex optimization problem into a convex one and thereby addresses the NP-hard problem. Missing data in multi-channel audio signals are then recovered with a simple completion based on the block coordinate descent algorithm and an exact completion based on the alternating direction method of multipliers (ADMM).
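As an illustration of the trace-norm completion idea in item 3, the following is a minimal sketch, not the dissertation's implementation. It assumes the widely used sum-of-trace-norms relaxation (minimize the weighted sum of the trace norms of the mode unfoldings, subject to the observed entries staying fixed) solved with ADMM. The function names, the equal mode weights, the penalty schedule (`rho`, `rho_growth`), and the synthetic channels x frames x features tensor are all assumptions made for this example; the abstract does not specify any of them.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def fold(matrix, mode, shape):
    """Inverse of `unfold` for a tensor with the given full shape."""
    rest = [size for axis, size in enumerate(shape) if axis != mode]
    return np.moveaxis(matrix.reshape([shape[mode]] + rest), 0, mode)


def svt(matrix, tau):
    """Singular value thresholding: proximal operator of tau * trace norm."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (u * np.maximum(s - tau, 0.0)) @ vt


def complete_tensor(observed, mask, rho=1e-2, rho_growth=1.05, n_iter=300):
    """ADMM sketch for sum-of-trace-norms tensor completion.

    `mask` is True where an entry of `observed` is known. The weights,
    penalty schedule, and iteration count are illustrative assumptions.
    """
    n_modes = observed.ndim
    alpha = np.full(n_modes, 1.0 / n_modes)  # equal weight per mode (assumed)
    x = np.where(mask, observed, 0.0)
    duals = [np.zeros_like(x) for _ in range(n_modes)]
    for _ in range(n_iter):
        # One low-rank auxiliary tensor per mode, via thresholded unfoldings.
        aux = [
            fold(svt(unfold(x + duals[i] / rho, i), alpha[i] / rho), i, x.shape)
            for i in range(n_modes)
        ]
        # Average the auxiliary tensors, then re-impose the known entries.
        estimate = sum(aux[i] - duals[i] / rho for i in range(n_modes)) / n_modes
        x = np.where(mask, observed, estimate)
        # Dual update for each constraint aux[i] == x.
        for i in range(n_modes):
            duals[i] = duals[i] - rho * (aux[i] - x)
        rho *= rho_growth
    return x


# Synthetic stand-in for multi-channel audio features (hypothetical layout:
# channels x frames x features), built as a rank-2 tensor.
rng = np.random.default_rng(0)
a = rng.standard_normal((8, 2))
b = rng.standard_normal((30, 2))
c = rng.standard_normal((30, 2))
truth = np.einsum("ir,jr,kr->ijk", a, b, c)

mask = rng.random(truth.shape) > 0.3  # roughly 30% of entries are missing
recovered = complete_tensor(np.where(mask, truth, 0.0), mask)

missing = ~mask
relative_error = (
    np.linalg.norm((recovered - truth)[missing]) / np.linalg.norm(truth[missing])
)
print(f"relative error on missing entries: {relative_error:.3e}")
```

The observed entries are copied back after every update, so the result always agrees with the known data; only the missing entries are estimated. Real multi-channel audio would need its own tensor layout and parameter tuning, which this sketch does not attempt.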
[Degree-granting institution]: Beijing Institute of Technology
[Degree level]: Doctorate
[Year degree granted]: 2016
[Classification number]: TN912.3
