
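Research on Multi-Granularity Analysis and Processing of Time-Series Signals Based on Convolutional Long Short-Term Memory Neural Networks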

Published: 2018-07-28 12:19
[Abstract]: Time-series signals are an extremely important class of signals. They are multi-component signals whose frequency and amplitude change continuously over time, such as speech signals, bioelectrical signals, radar and sonar signals, and mechanical vibration and seismic signals [1]. Time-series signals are nonlinear and non-stationary, yet the vast majority of current studies assume that the signal is short-time stationary, extract mainly frequency-domain features, and analyze the signal at a relatively narrow range of levels and granularities. As a result, most of the highly important temporal information in the signal is ignored, which severely limits the ability to extract information from time-varying signals and restricts performance gains in practical applications. This thesis addresses the extraction and modeling of temporal information in time-series signals. Drawing on the human brain's ability during cognition to automatically select and integrate multi-granularity, multi-period, and multi-level features, it proposes a framework for multi-granularity feature extraction and fusion. Features are extracted at three granularities: frame, segment, and global. This retains the global features commonly used by existing methods and adds two dynamic features, at the frame and segment granularities, that carry the temporal information in the signal, so that information is extracted from several perspectives and the representation of the signal is richer. The window lengths at the segment granularity are chosen with reference to regularities of the human brain in cognitive activity.
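The frame and segment granularities described above can be illustrated with a minimal windowing sketch. The function names `split_frames` and `group_segments`, the frame length, the hop size, and the number of frames per segment are all hypothetical; the abstract does not state the values used in the thesis.

```python
def split_frames(signal, frame_len, hop):
    """Split a 1-D sequence into overlapping frames (frame granularity)."""
    last_start = len(signal) - frame_len
    return [signal[s:s + frame_len] for s in range(0, last_start + 1, hop)]


def group_segments(frames, frames_per_segment):
    """Group consecutive frames into non-overlapping segments (segment granularity)."""
    return [frames[i:i + frames_per_segment]
            for i in range(0, len(frames), frames_per_segment)]


# Toy signal; the whole sequence plays the role of the global granularity.
signal = list(range(20))
frames = split_frames(signal, frame_len=4, hop=2)        # assumed frame length and hop
segments = group_segments(frames, frames_per_segment=3)  # assumed segment window
```

In this sketch the segment window is expressed as a number of frames, which is only one possible way to define it.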
The features of the three granularities are then fused at the frame level in temporal order and classified with an LSTM neural network, which is strong at modeling temporal information. Two implementations of the multi-granularity features are used. The first applies traditional time-frequency analysis to extract frame features from the signal, then convolves the frame features with a bank of Gaussian functions inside the segment-granularity window to obtain segment features; the global features are obtained by computing statistics over all frame features. The second draws on deep learning, which has recently produced breakthroughs in many fields: it exploits the ability of convolutional neural networks (CNNs) to extract information end to end from raw data and to extract features at multiple levels, and leads to the proposed C-LSTM network structure. The raw time-series signal is fed directly into a deep convolutional network, where preset convolution kernels slide over the signal. The shallow CNN layers yield frame-granularity features, which higher layers process further, so that the middle and upper CNN layers output segment-granularity and global-granularity features respectively. Finally, the features of the three granularities are integrated at the frame level in temporal order into a multi-granularity fused feature, and a long short-term memory (LSTM) network models the temporal information and performs classification. The proposed framework and network model are evaluated on two tasks: speech emotion recognition on speech signals and motor imagery classification on electroencephalogram (EEG) signals.
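The first implementation described above (frame features, a Gaussian function bank for segment features, statistics for global features, and fusion at the frame level) can be sketched as follows. This is an illustration under stated assumptions, not the thesis's implementation: the kernel width, the `sigmas` values, the choice of mean and standard deviation as global statistics, and repeating the global vector on every frame are all assumptions, and the toy `frame_feats` stands in for real time-frequency features.

```python
import math


def gaussian_kernel(width, sigma):
    """Normalized Gaussian window of odd length `width`."""
    center = width // 2
    weights = [math.exp(-((i - center) ** 2) / (2 * sigma ** 2)) for i in range(width)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(track, kernel):
    """Convolve one feature track with a symmetric kernel, edge-padded to keep its length."""
    half = len(kernel) // 2
    padded = [track[0]] * half + list(track) + [track[-1]] * half
    return [sum(k * padded[t + j] for j, k in enumerate(kernel))
            for t in range(len(track))]


def multi_granularity(frame_feats, width=5, sigmas=(0.5, 1.0, 2.0)):
    """Return one fused vector per frame: [frame | segment | global]."""
    n_dims = len(frame_feats[0])
    # One time series per feature dimension.
    tracks = [[f[d] for f in frame_feats] for d in range(n_dims)]
    # Segment granularity: every track smoothed by every Gaussian in the bank.
    seg_tracks = [smooth(tr, gaussian_kernel(width, s)) for tr in tracks for s in sigmas]
    # Global granularity: mean and standard deviation of each track (assumed statistics).
    global_vec = []
    for tr in tracks:
        mean = sum(tr) / len(tr)
        std = math.sqrt(sum((x - mean) ** 2 for x in tr) / len(tr))
        global_vec += [mean, std]
    # Frame-level fusion in time order; the global vector is repeated for each frame.
    return [list(frame_feats[t]) + [st[t] for st in seg_tracks] + global_vec
            for t in range(len(frame_feats))]


# Toy stand-in for per-frame features: 8 frames, 2 dimensions each.
frame_feats = [[float(t), float(t % 2)] for t in range(8)]
fused = multi_granularity(frame_feats)
```

Each element of `fused` is one time step of the sequence that would be passed to the LSTM classifier, which is not part of this sketch.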
For speech emotion classification, we use the dataset released by the Institute of Automation, Chinese Academy of Sciences, for the 2016 multimodal emotion recognition competition, which covers eight emotion categories: anger, anxiety, disgust, happiness, sadness, surprise, worry, and neutral. Compared with the dataset's baseline system, the recognition rate improves by more than 4% and exceeds that of one of the methods used by the competition's first-place entry. For motor imagery classification, we use the BCI2008 dataset, a two-class problem of left-hand versus right-hand motor imagery. Because EEG is multi-channel and has a spatial distribution, we extend C-LSTM by incorporating the spatial information of the electrodes through data integration and a wavelet-transform brain network, yielding the 3D-C-LSTM model. Its recognition rate is nearly 10% higher than that of other methods, reaching 92.0%, which indicates that spatial information, in addition to temporal information, is very important in EEG signals. This work offers effective improvements for several key technical problems in time-series signal analysis and processing. The experiments on speech and EEG signals show that the C-LSTM network structure applies generally to time-series signal processing and is worth wider adoption. It also provides new ideas and directions for applying and developing convolutional neural networks and other deep learning methods in time-series signal processing.
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: R318;TP183

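[References]

Related journal papers (first 3)

1 Jiao Licheng; Yang Shuyuan; Liu Fang; Wang Shigang; Feng Zhixi; Seventy Years of Neural Networks: Retrospect and Prospect [J]; Chinese Journal of Computers; 2016, No. 08

2 Li Xingyu; Yang Chengzhi; Qu Wentao; Zhang Rong; Radar Signal Sorting Algorithm Based on Adaptive Grid Density Clustering [J]; Aerospace Electronic Warfare; 2013, No. 02

3 Wang Deng; Miao Duoqian; Wang Ruizhi; A New Method for EEG Feature Extraction and Recognition Based on Wavelet Packet Decomposition [J]; Acta Electronica Sinica; 2013, No. 01

Related conference papers (first 1)

1 Jia Lei; Application of LSTM Modeling and CTC Training in Speech Modeling [A]; Proceedings of the 13th National Conference on Man-Machine Speech Communication (NCMMSC2015) [C]; 2015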



Article ID: 2150122


Article link: https://www.wllwen.com/yixuelunwen/swyx/2150122.html

