Research on the Optimization, Improvement, and Application of the Extreme Learning Machine

Published: 2018-11-18 20:44

[Abstract]: With the arrival of the "Internet Plus" era, data volumes are growing explosively, whether in the Internet, fast-moving consumer goods, and financial industries or in the traditional chemical and manufacturing industries. The circulation, storage, analysis, and visualization of big data pose major challenges for every enterprise. Machine learning, as an important means of analyzing and mining the patterns and regularities hidden in data, is showing its power and importance ever more clearly. To date, machine learning has been widely applied to image recognition, speech analysis, natural language processing, and the mining of commercial and industrial data. Classification and regression are the two basic tasks in machine learning research, which has produced general-purpose algorithms such as the generalized linear model (GLM), the artificial neural network (ANN), the support vector machine (SVM), and the extreme learning machine (ELM). How to raise the accuracy of a classifier or the fitting precision of a regressor while also improving the algorithm's generalization ability has become an important topic in the development of machine learning. Against this background, this thesis completes the following work:
(1) In the ELM algorithm, random initialization of the input-to-hidden-layer weights can leave the hidden-layer output matrix H rank-deficient, so that some hidden nodes contribute nothing. To address this problem, the thesis proposes the correlation-mapping extreme learning machine. It takes the correlation coefficients between the input features and the prediction labels, passes this linear correlation information through a nonlinear function, and uses the result to determine the input-to-hidden-layer weight matrix. Tests on data sets show that the algorithm not only improves the ELM's accuracy in classification and its precision in regression, but also uses the hidden nodes more efficiently and improves the model's generalization ability.
(2) Because the hidden layer of an ELM network uses a single type of activation function, the network has difficulty learning complex data and tends to produce redundant hidden nodes. To address this problem, the thesis proposes a hybrid-domain extreme learning machine algorithm based on particle swarm optimization (PSO). The algorithm defines a combination of hidden-layer activation functions (drawn from seven candidate activation functions) as a particle, randomly generates a large number of particles as the initial swarm, and iterates under given evolution rules to find the optimal individual for the hidden nodes, that is, the optimal combination of activation functions. Tests on data sets show that the algorithm effectively improves the utilization of hidden nodes and the final generalization ability of the model.
(3) For erosion-corrosion, a form of flow-induced corrosion in equipment and pipelines in petrochemical production, the following work was carried out on the practical problem: 1. Using experimental equipment and the controlled-variable method, the erosion rate of No. 10 carbon steel was obtained under different experimental conditions. 2. Using computational fluid dynamics (CFD) simulation software, the erosion rate (both average and maximum) of a 90-degree elbow was simulated under different conditions. For these two erosion-corrosion cases, several machine learning models were built and tested on the collected historical data. Both improved ELM algorithms proposed in the thesis were found to fit and predict the erosion data better, which offers a feasible method for corrosion modeling and prediction in the petrochemical industry.
The thesis makes progress from theoretical analysis through to practical application and offers some new ideas for applying the extreme learning machine to complex industrial problems, with both theoretical and practical significance.
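For readers unfamiliar with the baseline that the thesis builds on, the following is a minimal sketch of a standard ELM for regression. It is not the thesis's correlation-mapping or PSO-based algorithm. It only shows the randomly initialized input-to-hidden-layer weights, the hidden-layer output matrix H, and the pseudo-inverse solution for the output weights. The function names `elm_fit` and `elm_predict`, the sigmoid activation, the node count, and the toy data are all illustrative assumptions.

```python
import numpy as np


def elm_fit(X, y, n_hidden=30, seed=0):
    """Fit a basic single-hidden-layer ELM (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Input-to-hidden weights and biases are drawn at random and never trained.
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    # Hidden-layer output matrix H (sigmoid activation, an assumption here).
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    # Output weights: least-squares solution via the Moore-Penrose pseudo-inverse.
    beta = np.linalg.pinv(H) @ y
    return W, b, beta, H


def elm_predict(X, W, b, beta):
    """Apply the fitted ELM to new inputs."""
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta


# Toy regression data (hypothetical, for illustration only).
data_rng = np.random.default_rng(1)
X = data_rng.uniform(-1.0, 1.0, size=(200, 2))
y = np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2

W, b, beta, H = elm_fit(X, y, n_hidden=30)
rank = int(np.linalg.matrix_rank(H))  # numerical rank of H
mse = float(np.mean((elm_predict(X, W, b, beta) - y) ** 2))
print("hidden nodes:", H.shape[1], "rank of H:", rank, "training MSE:", mse)
```

Comparing `rank` with `H.shape[1]` indicates how many hidden nodes are numerically independent. A rank below the node count corresponds to the rank-deficient situation that contribution (1) sets out to address.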
[Degree-granting institution]: Zhejiang Sci-Tech University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP181

[Similar literature]