
Separation of Two-Speaker Mixed Speech Based on Computational Auditory Scene Analysis

Published: 2018-08-12 19:50
[Abstract]: As information technology develops, speech signal processing has become closely linked to fields such as search engines and artificial intelligence. Speech separation based on computational auditory scene analysis (CASA) has broad application prospects in multimedia retrieval and robotics, and it has gradually become a research focus. Current CASA-based separation systems, however, still perform unsatisfactorily on mixtures of several speakers. One reason is that most CASA systems cannot accurately extract multiple pitch tracks in the pitch extraction stage, which in turn degrades separation. Another is that many systems rely on trained models in the grouping stage and therefore depend on the quality of the training samples and on prior knowledge of the speakers.

Building on existing research, this thesis proposes a method for separating two-speaker speech mixtures. The main contributions are:

(1) A multi-pitch tracking method based on hidden Markov models (HMMs). First, a peripheral processing module decomposes the speech signal into time-frequency (T-F) units. Next, in the pitch tracking stage, an HMM-based multi-pitch tracking algorithm exploits the statistical properties of speech to estimate the multiple pitch tracks in the mixture, and a method is designed to label T-F units when more than one pitch is present, yielding simultaneous streams. Experiments show that the method is effective at extracting pitch tracks from multi-speaker speech material.

(2) A clustering-based sequential grouping method. First, gammatone cepstral coefficients are extracted from the mixture, and an objective function based on the within-class and between-class scatter matrices is proposed. The best assignment of the simultaneous streams is then found by maximizing a trace criterion built from these two matrices, which completes the separation of the two speakers. Experiments show that the method is effective at separating two-speaker speech mixtures.
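The abstract names the peripheral processing module but does not specify it. A common CASA front end, assumed here rather than taken from the thesis, passes the signal through a bank of 4th-order gammatone filters whose centre frequencies are equally spaced on the ERB-rate scale, then cuts each channel into 20 ms frames with a 10 ms shift; each (channel, frame) pair is one T-F unit. The sketch computes the energy of every unit (a cochleagram). The channel count, the 80 to 5000 Hz range, the bandwidth factor 1.019, the unit-energy filter scaling and all function names are illustrative assumptions.

```python
import numpy as np

def erb(f):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore, 1990)."""
    return 24.7 * (4.37e-3 * f + 1.0)

def hz_to_erb_rate(f):
    return 21.4 * np.log10(4.37e-3 * f + 1.0)

def erb_rate_to_hz(e):
    return (10.0 ** (e / 21.4) - 1.0) / 4.37e-3

def gammatone_ir(fc, fs, dur=0.064, order=4, b=1.019):
    """Sampled gammatone impulse response, scaled to unit energy (an assumption)."""
    t = np.arange(int(round(dur * fs))) / fs
    g = (t ** (order - 1) * np.exp(-2 * np.pi * b * erb(fc) * t)
         * np.cos(2 * np.pi * fc * t))
    return g / np.sqrt(np.sum(g ** 2))

def cochleagram(x, fs, n_channels=64, f_lo=80.0, f_hi=5000.0,
                frame=0.020, hop=0.010):
    """Split x into T-F units; return (centre freqs, energy[channel, frame])."""
    fcs = erb_rate_to_hz(np.linspace(hz_to_erb_rate(f_lo),
                                     hz_to_erb_rate(f_hi), n_channels))
    flen, fhop = int(round(frame * fs)), int(round(hop * fs))
    n_frames = 1 + (len(x) - flen) // fhop   # assumes len(x) >= flen
    energy = np.zeros((n_channels, n_frames))
    for c, fc in enumerate(fcs):
        # Filter one channel and keep the causal part aligned with x
        y = np.convolve(x, gammatone_ir(fc, fs))[: len(x)]
        for m in range(n_frames):
            seg = y[m * fhop : m * fhop + flen]
            energy[c, m] = np.sum(seg ** 2)   # energy of T-F unit (c, m)
    return fcs, energy
```

In a full system the filtered channel waveforms would typically be kept as well, since pitch-based labeling of T-F units works on each unit's signal or autocorrelation rather than on its energy alone.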
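The abstract does not give the HMM's state space, transition model or observation likelihoods for multi-pitch tracking. As a simplified sketch under those unknowns, each frame's hidden state is taken to be an unordered pair of pitch bins, transitions penalise pitch jumps with an assumed exponential cost controlled by `lam`, and per-frame log-likelihoods for every pair are assumed to come from some earlier scoring step. Viterbi decoding then returns the most probable pair sequence, i.e. two pitch tracks. A complete tracker would also need zero- and one-pitch states for frames in which a speaker is silent or unvoiced; `pair_states`, `transition_logprob` and `viterbi` are hypothetical names.

```python
import itertools
import numpy as np

def pair_states(n_bins):
    """Hidden states: unordered pairs (i, j), i < j, of pitch-grid bins."""
    return list(itertools.combinations(range(n_bins), 2))

def transition_logprob(states, lam=1.0):
    """Log P(next | prev), penalising the summed bin jump of both pitches."""
    s = np.array(states)                                   # shape (S, 2)
    d = (np.abs(s[:, None, 0] - s[None, :, 0])
         + np.abs(s[:, None, 1] - s[None, :, 1]))          # shape (S, S)
    logits = -lam * d
    # Normalise each row into a log-distribution over next states
    return logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

def viterbi(log_obs, log_trans):
    """Most likely state index per frame; log_obs has shape (T, S)."""
    T, S = log_obs.shape
    delta = log_obs[0] - np.log(S)              # uniform initial distribution
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans     # rows: prev state, cols: next
        back[t] = np.argmax(scores, axis=0)
        delta = scores[back[t], np.arange(S)] + log_obs[t]
    path = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):               # trace back-pointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

Mapping each decoded state index back through `pair_states` gives the two pitch bins per frame; converting bins to Hz depends on whatever pitch grid the scoring step used.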
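The abstract says a method was designed to label T-F units when several pitches are present, but does not describe it. One widely used CASA rule, assumed here and not necessarily the thesis's own, evaluates the normalised autocorrelation of a unit's filter response at the pitch period of each candidate pitch; the unit goes to the pitch with the highest value if that value exceeds a threshold θ, and is left unlabelled otherwise. Units sharing a label then form a simultaneous stream. The threshold 0.85 and the names `normalized_acf` and `label_unit` are illustrative.

```python
import numpy as np

def normalized_acf(seg, lag):
    """Normalised autocorrelation of one T-F unit's signal at an integer lag."""
    a, b = seg[:-lag], seg[lag:]
    denom = np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))
    return 0.0 if denom == 0 else float(np.dot(a, b) / denom)

def label_unit(seg, fs, pitches, theta=0.85):
    """Index of the pitch (Hz) that dominates the unit, or -1 if none does.

    A pitch value <= 0 marks a track with no pitch in this frame.
    """
    scores = [normalized_acf(seg, int(round(fs / f0))) if f0 > 0 else -np.inf
              for f0 in pitches]
    best = int(np.argmax(scores))
    return best if scores[best] > theta else -1
```

A unit whose response is periodic at one speaker's pitch period scores close to 1 for that pitch, while noise-like units fall below θ for every candidate.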
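The abstract only names gammatone cepstral coefficients as the grouping feature. A common definition, usually called GFCC and assumed here, applies cubic-root compression to each frame's gammatone filterbank energies and then a discrete cosine transform (DCT) across channels, keeping the low-order coefficients. The sketch builds an orthonormal DCT-II directly in NumPy to avoid extra dependencies; keeping 23 coefficients and the names `dct2_matrix` and `gfcc` are illustrative assumptions.

```python
import numpy as np

def dct2_matrix(n):
    """Orthonormal DCT-II basis; row k is the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.cos(np.pi * (i + 0.5) * k / n) * np.sqrt(2.0 / n)
    m[0] /= np.sqrt(2.0)            # DC row scaled so the basis is orthonormal
    return m

def gfcc(energy, n_coeffs=23):
    """energy: (channels, frames) gammatone energies -> (frames, n_coeffs)."""
    compressed = np.cbrt(energy)    # cubic-root loudness compression
    return (dct2_matrix(energy.shape[0]) @ compressed)[:n_coeffs].T
```

Each row of the result is a per-frame feature vector, which is the form a frame-level clustering step would consume.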
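For sequential grouping, the abstract states that the best split of the simultaneous streams is searched for under an objective built from the within-class scatter matrix S_w and the between-class scatter matrix S_b and maximised through a trace, without giving the exact formula. This sketch therefore assumes the common Fisher-style criterion J = tr(S_w^{-1} S_b) and an exhaustive search over all two-speaker assignments. Each stream is a matrix of per-frame feature vectors (for example gammatone cepstral coefficients) and is assigned as a whole. The small ridge term `reg` added to S_w and the names `scatter_matrices` and `group_streams` are hypothetical.

```python
import itertools
import numpy as np

def scatter_matrices(X, labels):
    """Within-class (Sw) and between-class (Sb) scatter of frames X."""
    mu = X.mean(axis=0)
    d = X.shape[1]
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for k in np.unique(labels):
        Xk = X[labels == k]
        mk = Xk.mean(axis=0)
        Sw += (Xk - mk).T @ (Xk - mk)
        Sb += len(Xk) * np.outer(mk - mu, mk - mu)
    return Sw, Sb

def group_streams(streams, reg=1e-6):
    """Assign each stream to speaker 0 or 1 by maximising tr(Sw^-1 Sb)."""
    X = np.vstack(streams)
    d = X.shape[1]
    best, best_J = None, -np.inf
    # Stream 0 is fixed to speaker 0: swapping labels leaves J unchanged
    for rest in itertools.product([0, 1], repeat=len(streams) - 1):
        assign = (0,) + rest
        if len(set(assign)) < 2:    # both speakers must receive a stream
            continue
        labels = np.concatenate([np.full(len(s), a)
                                 for s, a in zip(streams, assign)])
        Sw, Sb = scatter_matrices(X, labels)
        J = np.trace(np.linalg.solve(Sw + reg * np.eye(d), Sb))
        if J > best_J:
            best, best_J = assign, J
    return best, best_J
```

Exhaustive search needs 2^(K-1) evaluations for K streams, so it is only practical for a modest number of streams; longer recordings would call for a greedy or iterative search over the same criterion.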
[Degree-granting institution]: Guangxi University
[Degree]: Master's
[Year awarded]: 2014
[Classification number]: TN912.3







