Research on Fault Diagnosis and Prediction Methods for Rotating Machinery Based on Support Vector Machines
Published: 2018-09-12 14:09
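【Abstract】: With the rapid development of modern science and technology, rotating machinery is becoming ever larger, more complex, faster, more continuous in operation, and more automated. These trends raise production efficiency but also place higher demands on the safe operation of equipment: once a fault occurs, it can cause huge economic losses and may even lead to catastrophic casualties and serious social consequences. Diagnosing equipment faults and predicting the operating state of equipment from historical data are important measures for ensuring safe and reliable operation; only in this way can existing problems be handled promptly and effectively and faults be eliminated in their infancy. This dissertation studies methods, based on support vector machine (SVM) theory, for diagnosing faults in the operating state of equipment and for predicting the future state of equipment from historical operating data. It establishes models for operating-state forecasting and fault diagnosis and investigates the solution of several key problems in depth. The main innovations are as follows:
1. A genetic-algorithm-based sample balancing method for SVMs is proposed. In fault diagnosis, fault samples are generally fewer than normal samples, so imbalanced samples are a widespread problem. When an SVM faces imbalanced samples, it tends to misdiagnose the "minority class". To address this, the dissertation proposes a sample balancing method based on a genetic algorithm: the crossover and mutation operations of the genetic algorithm generate offspring samples that breed and expand the minority class, yielding more samples of that class and thereby balancing the two classes. So that the expanded samples are more targeted and more conducive to forming a correct "separating hyperplane", a method for selecting parent samples and a method for evaluating offspring samples are given.
2. The causes of spurious components in empirical mode decomposition (EMD) are analyzed, and a method for identifying them is proposed. As a feature extraction method, EMD handles non-stationary and nonlinear problems well, but processing a signal with EMD often introduces spurious components that degrade the accuracy of the analysis; this is a bottleneck that seriously constrains the development of EMD. To eliminate the influence of spurious components and make better use of EMD in feature extraction, the dissertation proposes a spurious-component identification method based on K-L divergence. The method uses K-L divergence to evaluate how closely each component obtained by EMD is related to the original signal: the smaller the K-L divergence between a component and the original signal, the stronger the relationship and the more genuine the component; conversely, the more likely it is to be spurious. Spurious components are identified by setting a threshold, and a method for setting the threshold is also studied and given.
3. An SVM algorithm based on EMD feature extraction is proposed. When the operating state is predicted from historical data, the degree of association between the feature parameters and the predicted point largely determines the accuracy of the prediction. At present there are two main ways to select feature parameters. The first uses features from measured data, that is, it collects the influencing factors associated with the predicted value as features; for example, wind speed, air pressure, and similar factors are selected as features when forecasting wind power. For predicted quantities such as vibration, however, the influencing factors are often very complex and hard to pin down, so a high-accuracy prediction model cannot be built this way. The second obtains feature parameters by computation on the time series. The most representative and most commonly used method of this kind is phase space reconstruction, which uses chaos theory to compute the embedding dimension and time delay, reconstructs the phase space, and obtains the features of the time series. However, the embedding dimension and time delay are determined only from the dynamical characteristics of the time series, and the features constructed from them are not necessarily suitable for a prediction model; an inappropriate choice of the two often fails to yield suitable features and greatly reduces prediction accuracy. To address the feature selection problem for prediction models, the dissertation therefore proposes an SVM algorithm based on EMD feature extraction (EMD-SVM), which takes the component values at each time point after EMD decomposition as features and combines them with the time series value (target value) at that time point to form samples, builds a prediction model, and demonstrates its high accuracy and stability through experiments.
4. To address the large-scale sample problem of SVMs, a sample length selection method based on information entropy is proposed. Large-scale training sets have long been a bottleneck for SVM computation speed. Too many training samples greatly increase the computational cost and do not necessarily give more accurate predictions; they may even cause larger deviations. The length of the training sample must therefore be kept within a suitable range. To address this, the dissertation proposes a sample length selection method based on information entropy. Its basic idea is to select the historical data most relevant to the predicted point as training samples, so that the information in the data is complete but not redundant. Historical data close to the predicted point are usually more strongly related to it and are more valuable as training samples; as the distance increases (moving further back in time), the correlation weakens and the data matter less for the predicted point. Adding such data to the training samples increases volatility and reduces stationarity, so these points need to be removed. Based on this idea, the method takes different positions in the historical data, from front to back, as starting points for truncating the time series and computes the information entropy of the series cut from each starting point. The starting point with the minimum information entropy is taken as the "0 point" of a new time axis. Data on the negative half-axis are old and weakly related to the current state and are discarded; data on the positive half-axis are strongly related to the current state and are used as training samples. This preserves the completeness of the information while sparing the optimization performed during model building from having to "accommodate" weakly related training samples. The effectiveness of the method is examined, through theoretical analysis and experiments, in terms of computation time and prediction accuracy.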
[Abstract]: With the rapid development of modern science and technology, rotating machinery is becoming larger, more complex, faster, more continuous in operation, and more automated. These developments bring higher production efficiency but also impose stricter requirements on the safe operation of equipment: once a fault occurs, it can cause enormous economic losses or even lead to catastrophic casualties and serious social impact. Diagnosing equipment faults and predicting equipment operating status from historical data are important measures for ensuring safe and reliable operation. Only in this way can problems be dealt with promptly and effectively and faults be eliminated at an early stage. This dissertation studies methods based on support vector machine (SVM) theory for diagnosing the operating state of equipment and predicting its future state from historical operating data, establishes models for operating-state forecasting and fault diagnosis, and examines several key problems in depth. The main innovations are as follows:
1. A sample balancing method for support vector machines based on a genetic algorithm is proposed.
In fault diagnosis, fault samples are usually fewer than normal samples, so imbalanced samples are common. When a support vector machine encounters imbalanced samples, it often misdiagnoses the "minority class". To address this problem, a sample balancing method based on a genetic algorithm is proposed: the crossover and mutation operations of the genetic algorithm generate offspring samples that expand the minority class until the two classes are balanced. To make the expanded samples more targeted and more conducive to forming a correct "separating hyperplane", a method for selecting parent samples and a method for evaluating offspring samples are given.
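The expansion step described above can be sketched roughly as follows. This is a minimal illustration, not the dissertation's method: the abstract does not say how parents are selected or how offspring are evaluated, so the uniform random parent choice, the arithmetic crossover, the Gaussian mutation, and the centroid-distance acceptance test used here are all assumptions, and every function name is hypothetical.

```python
import random

def crossover(p1, p2, rng):
    """Arithmetic crossover: a random convex combination of two parents."""
    a = rng.random()
    return [a * x + (1 - a) * y for x, y in zip(p1, p2)]

def mutate(child, rng, rate=0.1, scale=0.05):
    """Gaussian mutation: perturb each feature with probability `rate`."""
    return [x + rng.gauss(0, scale) if rng.random() < rate else x for x in child]

def centroid(samples):
    """Per-feature mean of a list of equal-length feature vectors."""
    return [sum(col) / len(samples) for col in zip(*samples)]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def balance(minority, majority, seed=0, max_tries=10000):
    """Grow the minority class with offspring until both classes match in size."""
    rng = random.Random(seed)
    c_min, c_maj = centroid(minority), centroid(majority)
    grown = list(minority)
    tries = 0
    while len(grown) < len(majority) and tries < max_tries:
        tries += 1
        # Assumed parent selection: two distinct original minority samples,
        # drawn uniformly at random.
        p1, p2 = rng.sample(minority, 2)
        child = mutate(crossover(p1, p2, rng), rng)
        # Assumed offspring evaluation: keep the child only if it lies nearer
        # to the minority centroid than to the majority centroid.
        if sq_dist(child, c_min) < sq_dist(child, c_maj):
            grown.append(child)
    return grown

# Synthetic two-feature data: 40 normal samples and 6 fault samples.
demo_rng = random.Random(1)
normal = [[demo_rng.gauss(0, 1), demo_rng.gauss(0, 1)] for _ in range(40)]
fault = [[demo_rng.gauss(4, 0.5), demo_rng.gauss(4, 0.5)] for _ in range(6)]
balanced_fault = balance(fault, normal)
print(len(fault), "->", len(balanced_fault))
```

The original fault samples are kept at the front of the returned list, so the synthetic offspring can be told apart from measured data if needed.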
2. The causes of spurious components produced by the empirical mode decomposition (EMD) method are analyzed, and a method for identifying spurious components is proposed.
As a feature extraction method, EMD deals well with non-stationary and nonlinear problems, but when EMD is used to process signals, spurious components are often introduced. These affect the accuracy of the analysis and are a bottleneck that seriously restricts the development of EMD. To eliminate the influence of spurious components and make better use of EMD in feature extraction, a spurious-component identification method based on K-L divergence is proposed. The method uses K-L divergence to evaluate the relationship between each component obtained by EMD and the original signal. The smaller the K-L divergence between a component and the original signal, the stronger the relationship and the more genuine the component; otherwise, the more likely it is to be spurious. Spurious components are identified by setting a threshold, and a method for setting the threshold is also studied and given.
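The K-L divergence comparison can be illustrated with a small sketch. It assumes the distributions are estimated with smoothed histograms on a shared grid and that the divergence is taken as D(signal || component); the abstract specifies neither choice. The "spurious" component here is a hand-made low-amplitude sinusoid rather than an actual EMD output, and because the threshold rule is not given in the abstract, the sketch only ranks components.

```python
import math

def histogram(values, lo, hi, bins=20):
    """Smoothed histogram on a shared grid; add-one smoothing avoids log(0)."""
    width = (hi - lo) / bins
    counts = [1] * bins
    for v in values:
        idx = int((v - lo) / width)
        counts[min(bins - 1, max(0, idx))] += 1
    total = sum(counts)
    return [c / total for c in counts]

def kl_divergence(p, q):
    """Discrete K-L divergence D(p || q) = sum_i p_i * log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def kl_to_signal(signal, component, bins=20):
    """K-L divergence between the signal's and the component's histograms."""
    lo, hi = min(signal), max(signal)
    return kl_divergence(histogram(signal, lo, hi, bins),
                         histogram(component, lo, hi, bins))

# Synthetic signal: a 5 Hz tone plus a weaker 20 Hz tone, sampled for 1 s.
t = [i / 1000 for i in range(1000)]
true_comp = [math.sin(2 * math.pi * 5 * ti) for ti in t]
other_comp = [0.5 * math.sin(2 * math.pi * 20 * ti) for ti in t]
signal = [a + b for a, b in zip(true_comp, other_comp)]
# Hand-made stand-in for a spurious component: small amplitude, unrelated frequency.
spurious = [0.05 * math.sin(2 * math.pi * ti) for ti in t]

kl_true = kl_to_signal(signal, true_comp)
kl_spurious = kl_to_signal(signal, spurious)
print("genuine:", kl_true, "spurious:", kl_spurious)
```

A component whose divergence exceeds a chosen threshold would then be flagged as spurious; the threshold value is left open here.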
3. A support vector machine algorithm based on EMD feature extraction is proposed.
When the operating state is predicted from historical data, the degree of association between the feature parameters and the predicted point largely determines the accuracy of the predicted value. At present, there are two main methods of selecting feature parameters. One is based on features of measured data, that is, the factors associated with the predicted value are collected as features; for example, wind speed, air pressure, and other factors are selected as features for wind power forecasting. For some predicted quantities such as vibration, however, the influencing factors are often very complex and difficult to define, so a high-precision forecasting model cannot be established this way. The other obtains the feature parameters by computation on the time series. The most representative and most commonly used method of this kind is phase space reconstruction, which uses chaos theory to calculate the embedding dimension and time delay, reconstructs the phase space, and obtains the features of the time series. However, the embedding dimension and time delay are determined only from the dynamical characteristics of the time series, so the features built from them are not necessarily suitable for a prediction model, and an improper choice of the two often greatly reduces prediction accuracy. To address the feature selection problem for prediction models, this dissertation therefore proposes a support vector machine algorithm based on EMD feature extraction (EMD-SVM). It uses the component values at each time point after EMD decomposition as features and combines them with the time series value (target value) at that time point to form samples. A prediction model is established, and its high accuracy and stability are demonstrated by experiments.
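The sample construction described above (component values at a time point as features, the series value at that time point as target) might look like the sketch below. To stay self-contained it does not run a real EMD: `stand_in_decompose` is a hypothetical placeholder that splits the series into a moving-average trend and a residual, chosen only because, like EMD's components plus residue, its parts add back up to the original series.

```python
def stand_in_decompose(series, window=5):
    """Placeholder for EMD: split a series into a residual and a moving-average trend.

    This is NOT empirical mode decomposition; it only mimics the property
    that the components sum to the original series.
    """
    half = window // 2
    trend = []
    for i in range(len(series)):
        seg = series[max(0, i - half): i + half + 1]
        trend.append(sum(seg) / len(seg))
    residual = [x - m for x, m in zip(series, trend)]
    return [residual, trend]

def build_samples(series, components):
    """Pair the component values at each time point with the series value there."""
    X = [list(values_at_t) for values_at_t in zip(*components)]
    y = list(series)
    return X, y

# Synthetic series: a linear trend with an alternating fluctuation.
series = [0.1 * i + (1.0 if i % 2 else -1.0) for i in range(50)]
components = stand_in_decompose(series)
X, y = build_samples(series, components)
print(len(X), "samples with", len(X[0]), "features each")
```

The resulting `X` and `y` could be passed to any support vector regression implementation, for example scikit-learn's `sklearn.svm.SVR().fit(X, y)`, which is left out here to avoid a third-party dependency. How features and targets are offset in time for forecasting future points is not stated in the abstract and is not modeled in this sketch.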
4. To address the large-scale sample problem of support vector machines, a sample length selection method based on information entropy is proposed.
Large-scale training sets have long been a bottleneck for the computation speed of SVMs. Too many training samples greatly increase the computational cost and do not necessarily lead to more accurate predictions; they may even lead to larger deviations. The length of the training sample must therefore be kept within a suitable range. To address this problem, a sample length selection method based on information entropy is proposed. Its basic idea is to select the historical data most relevant to the predicted point as training samples, ensuring that the information in the data is complete and not redundant. Historical data close to the predicted point are usually more strongly related to it and are more valuable as training samples. As the distance increases (moving further back in time), the correlation weakens and the data matter less for the predicted point. Adding such data to the training samples increases volatility and reduces stationarity, so these points need to be removed. Based on this idea, the method takes different positions in the historical data, from front to back, as starting points for truncating the time series and calculates the information entropy of the series cut from each starting point. The starting point with the minimum information entropy is regarded as the "0 point" of a new time axis. Data on the negative half-axis are old and weakly related to the current state and are discarded; data on the positive half-axis are strongly related to the current state and are used as training samples. This ensures the completeness of the information and spares the optimization performed during model building from having to "accommodate" weakly related training samples. The effectiveness of the method is examined, through theoretical analysis and experiments, in terms of computation time and prediction accuracy.
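The start-point search can be sketched as follows. The abstract does not say how the information entropy is estimated, so this sketch assumes the Shannon entropy of a fixed-grid histogram; the `min_len` floor on the training length is also an assumption, added so the window cannot shrink to a trivially short segment. All names are hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(values, lo, width, bins):
    """Shannon entropy (in nats) of a histogram of `values` on a fixed grid."""
    counts = Counter(min(bins - 1, int((v - lo) / width)) for v in values)
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def select_start(series, bins=10, min_len=30):
    """Return the start index whose truncated series series[start:] has minimum entropy."""
    lo, hi = min(series), max(series)
    width = (hi - lo) / bins
    # Candidate start points run from the oldest sample forward, always
    # leaving at least `min_len` points for training (assumed floor).
    candidates = range(len(series) - min_len + 1)
    return min(candidates,
               key=lambda s: shannon_entropy(series[s:], lo, width, bins))

# Synthetic history: 60 old, widely fluctuating points followed by
# 60 recent points that vary only slightly.
old = [float((i * 7) % 20 - 10) for i in range(60)]
recent = [0.1 * (i % 3) for i in range(60)]
history = old + recent

start = select_start(history)   # the "0 point" of the new time axis
training = history[start:]      # positive half-axis: kept for training
print("start index:", start, "training length:", len(training))
```

In this synthetic case the search discards the volatile old segment and keeps only the recent, stable data, which is the behavior the method aims for.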
【Degree-granting institution】: North China Electric Power University
【Degree level】: Doctoral
【Year conferred】: 2013
【Classification number】: TH165.3
Article No.: 2239252
Article link: https://www.wllwen.com/kejilunwen/jixiegongcheng/2239252.html