Theoretical Research and Application of Statistical Modeling Methods

Published: 2018-01-30 01:15

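Keywords: statistical modeling; wavelet; kernel method; support vector machine; glutamic acid fermentation; generalized additive model. Source: Jiangnan University, doctoral dissertation, 2011. Document type: degree thesis.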

[Abstract]: In today's information age, new statistical methods keep emerging and statistical knowledge is applied ever more widely. For example, statistical multiscale modeling has become an active topic in both theoretical and applied statistics, with a strong impact on statistical methodology and on its development across the applied sciences; kernel-based learning methods have brought about a revolution in data analysis; and the high flexibility of generalized additive models offers an effective way to reveal the relationships hidden in data. In chemical engineering, building an effective process model is of great importance for planning production processes scientifically and, in turn, for optimizing production.
Conventional predictive functional models do not consider optimization over the prediction horizon as a whole. To address this shortcoming, and following a study of statistical multiscale modeling, a predictive functional model based on wavelet basis functions and the Hammerstein model is proposed that exploits the multiscale properties of wavelets; its internal model parameters are corrected adaptively through continuous identification. The compact support (locality) and multiscale analysis properties of wavelets ensure that the overall error performance is optimized, emphasize the approximation requirements at important fitting points, and allow the optimization variables to be aggregated. Theoretical analysis and simulation show that the method has better tracking performance and better robustness to model mismatch.
(1) To improve the modeling accuracy of kernel methods while keeping modeling fast, a wavelet fusion kernel modeling method is proposed by combining kernel methods with wavelet analysis theory. The method combines the multiresolution analysis of wavelets with the insensitivity of kernel methods to input dimension, so in theory it models faster at the same accuracy. Simulation studies on a one-dimensional function and on chemical production data verify the effectiveness of the algorithm.
(2) By the equivalence between a separable Hilbert space and L²(R), a linear operator that is an inner-product isomorphism can be used to convert the wavelet scaling functions of subspaces of L²(R) into wavelet scaling functions of subspaces of the Hilbert space. Based on the conditions for support vector machine kernel functions and on wavelet multiresolution theory, a Morlet wavelet kernel function is constructed in Hilbert space. In simulation experiments, this scaling reproducing kernel shows higher accuracy and better generalization than the traditional RBF kernel.
(3) When a fusion-kernel support vector machine is used to improve the generalization and accuracy of a model, kernel fusion can cost the support vector machine its sparsity. To avoid this, the data are mapped to a sparse feature space. Simulation studies show that the resulting model improves modeling accuracy while preserving sparsity, which verifies the effectiveness of the algorithm and indicates its practical value.
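As an illustration of the Morlet wavelet kernel named in item (2), the sketch below uses the translation-invariant form commonly seen in the wavelet SVM literature: a product over input dimensions of the real Morlet mother wavelet cos(1.75u)·exp(-u²/2) evaluated at the scaled coordinate difference. The abstract does not give the thesis's own construction, so this form, the constant 1.75, the dilation parameter `a`, and the function names are all assumptions made for illustration.

```python
import math

def morlet_wavelet(u):
    """Real Morlet mother wavelet (assumed form): cos(1.75*u) * exp(-u^2 / 2)."""
    return math.cos(1.75 * u) * math.exp(-0.5 * u * u)

def morlet_kernel(x, z, a=1.0):
    """Translation-invariant wavelet kernel between two equal-length vectors.

    k(x, z) = prod_i h((x_i - z_i) / a), where h is the mother wavelet and
    a > 0 is a dilation (scale) parameter. Hypothetical helper, not the
    thesis's code.
    """
    if len(x) != len(z):
        raise ValueError("x and z must have the same dimension")
    if a <= 0:
        raise ValueError("dilation a must be positive")
    k = 1.0
    for xi, zi in zip(x, z):
        k *= morlet_wavelet((xi - zi) / a)
    return k

def gram_matrix(points, a=1.0):
    """Kernel (Gram) matrix for a list of input vectors."""
    return [[morlet_kernel(p, q, a) for q in points] for p in points]

if __name__ == "__main__":
    pts = [[0.0, 0.0], [0.5, -0.5], [1.0, 2.0]]
    for row in gram_matrix(pts, a=1.5):
        print(["%.4f" % v for v in row])
```

A Gram matrix built this way could be handed to any SVM implementation that accepts precomputed kernels (for example, scikit-learn's `SVC(kernel="precomputed")`); the dilation `a` would then be tuned in the same way as the width of an RBF kernel.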
The glutamic acid fermentation process is complex, and it is difficult to build an effective model to guide the optimization of production. In studying this problem, the generalized additive model (GAM) was found to provide an effective modeling method for glutamic acid fermentation. It makes it easy to analyze how different modeling variables affect glutamic acid yield and to derive their relationships with the yield. Based on experimental data from 15 fermentation batches and an analysis of the candidate factors, three significant factors (time T, dissolved oxygen DO, and oxygen uptake rate OUR) were selected to build the GAM, which explains 97% of the glutamic acid fermentation process. The model provides a basis for studying how different factors affect glutamic acid yield during fermentation. It offers a feasible and effective way to predict glutamic acid yield from online data, and it also suggests a new approach to online fault diagnosis during fermentation. For fault diagnosis of the glutamic acid fermentation process, a method based on GAMs and the Bootstrap method is proposed. Using only the significant observed variables, the method can judge whether the fermentation process is in a normal state and can give a preliminary indication of the observed variables related to the fault source. Only a few parameters need to be determined and tuned. During fermentation, the method reports fault states promptly and provides reference information for eliminating the fault source, thereby providing a reliable safeguard for the normal operation of the fermentation process.
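One plausible reading of a GAM-plus-Bootstrap fault check is sketched below. It assumes a model of the usual additive form, yield ≈ β0 + f1(T) + f2(DO) + f3(OUR), has already been fitted on normal batches (the fitting step is not shown), and that its residuals on those batches are available. The residuals are resampled with replacement to estimate a control limit, and a new observation is flagged when its prediction error exceeds that limit. The abstract does not describe the actual procedure, so the limit definition, the function names, and the sample residuals are all hypothetical.

```python
import math
import random

def nearest_rank_quantile(sorted_vals, q):
    """Nearest-rank q-quantile (0 < q <= 1) of an already sorted list."""
    idx = max(0, math.ceil(q * len(sorted_vals)) - 1)
    return sorted_vals[idx]

def bootstrap_limit(residuals, q=0.95, n_boot=500, seed=0):
    """Bootstrap estimate of the q-quantile of |residual|.

    Resamples the absolute residuals with replacement n_boot times and
    averages the per-resample quantiles. Assumed rule, for illustration only.
    """
    rng = random.Random(seed)  # fixed seed so the limit is reproducible
    abs_res = [abs(r) for r in residuals]
    n = len(abs_res)
    estimates = []
    for _ in range(n_boot):
        sample = sorted(rng.choice(abs_res) for _ in range(n))
        estimates.append(nearest_rank_quantile(sample, q))
    return sum(estimates) / n_boot

def is_faulty(observed, predicted, limit):
    """Flag an observation whose prediction error exceeds the control limit."""
    return abs(observed - predicted) > limit

if __name__ == "__main__":
    # Hypothetical residuals (observed minus model-predicted yield) from normal batches.
    normal_residuals = [0.3, -0.5, 0.1, 0.8, -0.2, 0.4, -0.7, 0.6, -0.1, 0.2, -0.9, 0.5]
    limit = bootstrap_limit(normal_residuals)
    print("control limit:", round(limit, 3))
    print("fault flagged:", is_faulty(observed=12.0, predicted=10.0, limit=limit))
```

With so few residuals the nearest-rank 0.95 quantile of each resample is simply its maximum, so in practice the limit would be estimated from many more normal-operation samples, possibly separately for each phase of the fermentation.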
In short, as computer technology spreads and develops rapidly and data and information grow explosively, research on statistical modeling methods is of great significance in industry for turning data into information, knowledge, and intelligence quickly and effectively.

[Degree-granting institution]: Jiangnan University
[Degree level]: Doctoral
[Year degree conferred]: 2011
[Classification number]: C81

[Citing literature]