Recording Authenticity Identification and Recapture Detection
Published: 2019-03-05 09:27
[Abstract]: The widespread digitization of speech signals has greatly facilitated the storage, transmission, and sharing of speech data. At the same time, audio editing software that is both easy to use and powerful has developed rapidly, so recording a piece of speech, or processing, polishing, or otherwise modifying it, has become effortless. These technologies bring many conveniences, but they also create security risks. For example, audio processing techniques can be used to tamper with speech content or to produce forged speech; once such false recordings are used for illegal purposes, they threaten society and the lives and property of others. Detecting the authenticity of digital speech signals is therefore of great importance. Although there has been considerable research on digital audio, it falls far short of the needs of society and the public. Addressing current problems in audio forensics, this thesis studies recording authenticity identification and recapture (re-recording) detection. The main contents are as follows:

1) Recording authenticity identification. With the spread of audio editing software, people can use common functions (such as filtering and reverberation) to beautify and retouch audio. While such software brings convenience, it also gives wrongdoers an opening. For example, an audio forger can apply a suitable filter to smooth spliced audio and thereby hide the traces of splicing. If such audio is presented as evidence in court and admitted, it will undoubtedly have a major effect on the verdict. Others use pitch-shifting functions to imitate another person's voice for telephone fraud, which seriously threatens the property of others. Borrowing the idea of the image co-occurrence matrix, this thesis proposes an amplitude co-occurrence vector feature suited to audio: the speech signal is quantized, and the probability distribution of the co-occurrence vectors formed by several adjacent sample points is then computed. The feature captures the fluctuation characteristics between adjacent sample points and works well for detecting processed speech, with experimental accuracy reaching 95%. In addition, the experiments cover 12 processing operations from two editing programs, which are detected and distinguished from one another; the results show that the feature can further identify which processing function was applied.

2) Recapture detection of digital speech. Recapture can be used not only to fake a recording scene but also to attack identity authentication systems based on speech features, so detecting it has become very important. Working mainly from the perspective of statistical data analysis, this thesis uses an extended amplitude co-occurrence vector feature to distinguish original speech from recaptured speech. We analyze the quantization threshold T of the amplitude co-occurrence vector and add combinations of sample points taken at different sampling intervals, which makes the feature better suited to recapture detection. We also build a recapture database that covers multiple recording devices and different recording environments, providing ample data for the experiments. Comparisons with Mel-frequency cepstral coefficient (MFCC) features and with the original amplitude co-occurrence vector feature verify the performance of the extended feature, with recapture detection accuracy reaching 96%. We further divide the database into sub-datasets by scene and run same-scene and cross-scene detection, where accuracy reaches 99.36% and 95.69%, respectively.
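The abstract describes the amplitude co-occurrence vector only in outline (quantize the signal, then estimate the probability distribution of vectors formed from neighbouring samples, with a threshold T and several sampling intervals in the extended version). The following is a minimal sketch of one way such a feature could be computed, not the thesis's actual implementation. The function names, the peak-normalized rounding used for quantization, the default values of `T`, `order`, and `steps`, and the concatenation of histograms across intervals are all assumptions made for illustration.

```python
import numpy as np


def amplitude_cooccurrence(signal, T=4, order=2, step=1, scale=None):
    """Normalized histogram of quantized amplitude tuples (illustrative sketch).

    Every parameter here is an assumption, not a setting from the thesis:
    T     -- quantization threshold; quantized values lie in [-T, T]
    order -- number of samples in each co-occurrence vector
    step  -- sampling interval between the samples of one vector
    scale -- multiplier applied before rounding (default: T / max|x|)
    """
    x = np.asarray(signal, dtype=np.float64)
    span = (order - 1) * step
    if x.ndim != 1 or x.size <= span:
        raise ValueError("signal must be 1-D and longer than (order - 1) * step")

    if scale is None:
        peak = np.max(np.abs(x))
        scale = T / peak if peak > 0 else 1.0

    # Quantize: round to integers, clip to [-T, T], then shift to 0 .. 2T.
    q = np.clip(np.rint(x * scale), -T, T).astype(np.int64) + T
    levels = 2 * T + 1

    # Encode each tuple (q[n], q[n+step], ..., q[n+(order-1)*step]) as a single
    # base-`levels` integer so that tuples can be counted with bincount.
    n = x.size - span
    codes = np.zeros(n, dtype=np.int64)
    for k in range(order):
        codes = codes * levels + q[k * step : k * step + n]

    counts = np.bincount(codes, minlength=levels ** order)
    return counts / counts.sum()


def extended_feature(signal, T=4, order=2, steps=(1, 2, 3)):
    """Concatenate the histograms obtained at several sampling intervals."""
    return np.concatenate(
        [amplitude_cooccurrence(signal, T, order, s) for s in steps]
    )


# Synthetic test tone standing in for a speech recording.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
x = np.sin(2 * np.pi * 440 * t) + 0.01 * rng.standard_normal(t.size)
feat = extended_feature(x)
print(feat.shape)  # (243,) = 3 intervals * (2*4 + 1)**2 bins
```

With `order=2` the histogram is the one-dimensional analogue of a gray-level co-occurrence matrix flattened into a vector; larger `order` values grow the feature length as `(2T + 1) ** order`, so `T` trades off resolution against dimensionality.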
[Degree-granting institution]: Shenzhen University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TN912.3
Article ID: 2434758
Article link: https://www.wllwen.com/kejilunwen/xinxigongchenglunwen/2434758.html