Research on Speech and Music Signal Separation Algorithms Based on Blind Source Separation
Topics: speech/music separation; Newton downhill method. Source: master's thesis, Jiangnan University, 2014.
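[Abstract]: Speech/music separation is the task of separating the speech and music signals that are mixed in an audio recording. The separated signals can feed audio analysis tasks such as speech recognition, musical instrument recognition, melody extraction, and music genre classification. Blind source separation (BSS), which extracts the original signals from their mixtures, is an effective approach to this problem. Assuming linear instantaneous mixing, this thesis studies negentropy maximization, BSS based on time-frequency ratios, and information maximization (Infomax), and applies them to speech/music separation. The main contributions are as follows.
First, an improved negentropy-maximization algorithm is studied for determined speech/music separation, where there are as many observations as sources. In negentropy maximization the separation performance depends on the choice of the initial matrix. To reduce this dependence, the Newton downhill method replaces Newton iteration as the optimizer that searches for the optimal matrix. Adjusting the downhill factor keeps the objective function decreasing, which makes the result less dependent on the initial value. Simulations show that the algorithm separates the source signals well from different initial values. Relative to the original algorithm, the average iteration time falls by 26.44% and the number of iterations by 69.15%, and both vary only within a small range. The sensitivity to initialization is therefore largely resolved.
Second, an improved time-frequency (TF) ratio algorithm is studied for determined speech/music separation. TF-ratio BSS transforms the signals to the TF domain, which is computationally expensive, and few of the resulting TF points are useful to the algorithm. Instead of searching the whole TF domain, single-source points are therefore detected only among the TF points within one period of the signal's repeating structure. TF points in a repeating structure take similar values in every period, so examining one period is enough to obtain the TF ratios of the single-source points. Inverting the matrix formed by these ratios then yields an estimate of the source signals. Simulations show that, at nearly the same similarity coefficient, the improved algorithm examines 51.90% fewer TF windows and has a 56.72% shorter running time, reducing the computational load.
Third, a BSS algorithm combining empirical mode decomposition (EMD) with mutual information maximization is studied for underdetermined speech/music separation, where there are fewer observations than sources. Infomax applies only when the number of observations is at least the number of sources, so it is combined with EMD. Intrinsic mode functions (IMFs) are selected according to the similarity between the reconstructed signal and the original mixture, and are used to build a new signal. Together with the original mixture, this signal forms a new set of observations, which turns the underdetermined problem into a determined one. The signals are then separated with the mutual information between the output and input signals as the objective function and the natural gradient method as the optimizer. Simulations show that combining EMD with mutual information maximization effectively solves the underdetermined BSS problem.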
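The first contribution replaces Newton iteration with the Newton downhill (damped Newton) method inside negentropy maximization. The NumPy code below is a minimal sketch of that idea for a one-unit, FastICA-style fixed point. It rests on several assumptions that are not taken from the thesis:

- the contrast uses g = tanh;
- the merit function that must decrease is the norm of the Lagrangian gradient F(w) = E{z g(w^T z)} - beta*w;
- the downhill factor starts at 1 and is halved after each failed trial;
- only the two-source case is handled, so after whitening the second separating vector is simply orthogonal to the first.

The helper names (`whiten`, `residual`, `downhill_unit`, `separate_two`) are hypothetical.

```python
import numpy as np

def whiten(x):
    """Center the observations (channels x samples) and apply symmetric whitening."""
    x = x - x.mean(axis=1, keepdims=True)
    d, e = np.linalg.eigh(np.cov(x))
    return e @ np.diag(d ** -0.5) @ e.T @ x

def residual(w, z):
    """F(w) = E{z g(w^T z)} - beta*w with g = tanh, beta = E{y g(y)}; F = 0 at a fixed point."""
    y = w @ z
    g = np.tanh(y)
    beta = np.mean(y * g)
    return (z * g).mean(axis=1) - beta * w, g, beta

def downhill_unit(z, w, lam_min=1e-3, tol=1e-9, max_iter=200):
    """One-unit negentropy ICA driven by damped (Newton downhill) steps."""
    w = w / np.linalg.norm(w)
    for it in range(1, max_iter + 1):
        f, g, beta = residual(w, z)
        # Approximate Newton step of FastICA; E{g'(y)} = E{1 - tanh(y)^2}.
        step = f / (np.mean(1.0 - g ** 2) - beta)
        lam = 1.0  # downhill factor
        while lam >= lam_min:
            w_new = w - lam * step
            w_new /= np.linalg.norm(w_new)
            if np.linalg.norm(residual(w_new, z)[0]) < np.linalg.norm(f):
                break  # residual decreased: accept this damped step
            lam /= 2.0  # otherwise halve the downhill factor
        else:
            # No factor reduced the residual: fall back to the plain Newton step.
            w_new = w - step
            w_new /= np.linalg.norm(w_new)
        done = abs(abs(w_new @ w) - 1.0) < tol
        w = w_new
        if done:
            break
    return w, it

def separate_two(x, w0):
    """Two sources: after whitening, the second unit is orthogonal to the first."""
    z = whiten(x)
    w1, iters = downhill_unit(z, np.asarray(w0, dtype=float))
    w = np.vstack([w1, [-w1[1], w1[0]]])
    return w @ z, iters
```

Running `separate_two` from several initial vectors `w0` is a cheap way to probe the initialization sensitivity that the thesis targets. A step that increases the residual is shortened before it is accepted, and the plain Newton step is used only as a last resort.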
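For the TF-ratio contribution, the sketch below estimates a 2x2 real mixing matrix from TF ratios in the manner of TIFROM-style methods. Whether the thesis builds on exactly this variant is an assumption. The principle is that in a window of adjacent STFT frames where only one source is active, the ratio X1/X2 stays close to the constant a1j/a2j, so windows with a small ratio variance are treated as single-source windows. Several details are illustrative assumptions: the STFT parameters, the thresholds, and the largest-gap split that groups the ratios into two columns. The same goes for the hypothetical `period_frames` argument. It only imitates the thesis's idea of scanning one period of the repeating structure, and it presumes the period is already known.

```python
import numpy as np

def stft(x, n_fft=256, hop=128):
    """Hann-windowed STFT; rows are frequency bins, columns are frames."""
    win = np.hanning(n_fft)
    frames = np.array([x[i:i + n_fft] * win for i in range(0, len(x) - n_fft + 1, hop)])
    return np.fft.rfft(frames, axis=1).T

def estimate_mixing(x, n_fft=256, hop=128, win_len=4, var_tol=1e-3, period_frames=None):
    """Estimate a 2x2 real mixing matrix from single-source TF windows."""
    X1, X2 = stft(x[0], n_fft, hop), stft(x[1], n_fft, hop)
    if period_frames is not None:
        # Hypothetical shortcut: scan only the frames of one repetition period.
        X1, X2 = X1[:, :period_frames], X2[:, :period_frames]
    floor = 1e-2 * np.abs(X2).max()  # skip near-silent points where the ratio is unreliable
    ratios = []
    for f in range(X1.shape[0]):
        for t in range(0, X1.shape[1] - win_len + 1, win_len):
            a, b = X1[f, t:t + win_len], X2[f, t:t + win_len]
            if np.abs(b).min() < floor:
                continue
            r = a / b  # TF ratio X1/X2 over this window
            if np.var(r) < var_tol:  # nearly constant ratio: single-source window
                ratios.append(r.mean().real)
    ratios = np.sort(np.array(ratios))
    split = int(np.argmax(np.diff(ratios))) + 1  # largest gap separates the two columns
    r_a, r_b = np.median(ratios[:split]), np.median(ratios[split:])
    # Columns are known only up to scale, so fix each column's second entry to 1.
    return np.array([[r_a, r_b], [1.0, 1.0]])

def separate(x, **kwargs):
    """Invert the estimated mixing matrix (sources come back in arbitrary order and scale)."""
    return np.linalg.solve(estimate_mixing(x, **kwargs), x)
```

Fixing the second entry of every column to 1 reflects the usual BSS scale ambiguity, and the recovered sources may also be permuted.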
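The last stage of the EMD-based contribution can be sketched as follows: Infomax optimized by the natural gradient on a determined 2x2 problem. EMD and the IMF selection are not implemented here. The second row of `x` simply stands for the virtual observation built from the selected IMFs. Several choices are assumptions rather than settings reported in the thesis: the tanh score function (which suits super-Gaussian sources such as speech), the pre-whitening, the step size `mu`, and the fixed iteration count. The update follows Amari's natural-gradient form W <- W + mu (I - E[phi(u) u^T]) W.

```python
import numpy as np

def infomax_natural_gradient(x, mu=0.1, n_iter=1000, seed=0):
    """Determined Infomax ICA with the natural-gradient update and score phi(u) = tanh(u)."""
    x = x - x.mean(axis=1, keepdims=True)
    d, e = np.linalg.eigh(np.cov(x))
    v = e @ np.diag(d ** -0.5) @ e.T  # whitening speeds up convergence
    z = v @ x
    n, t = z.shape
    rng = np.random.default_rng(seed)
    w = np.eye(n) + 0.1 * rng.standard_normal((n, n))
    for _ in range(n_iter):
        u = w @ z
        # Natural gradient step: W <- W + mu * (I - E[tanh(u) u^T]) W
        w += mu * (np.eye(n) - np.tanh(u) @ u.T / t) @ w
    return w @ z, w @ v  # separated signals and the overall separating matrix
```

With sub-Gaussian (for example, tonal) sources the plain tanh score can fail. The extended Infomax variant, which chooses the score per component, is the usual remedy.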
[Degree-granting institution]: Jiangnan University
[Degree level]: Master's
[Year conferred]: 2014
[Classification number]: TN912.3