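Methods for Functional Data Modeling and Their Applications
Published: 2018-01-29 16:47
Keywords: functional data; basis representation; orthonormal basis; functional principal component analysis; classification performance; clustering; functional k-means clustering algorithm; regression; partial functional linear model. Source: Shanxi University, 2017 doctoral dissertation. Document type: degree thesis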
[Abstract]: The rapid development of information technology has produced large volumes of functional data. Such data arise widely in economics, finance, bioinformatics, medicine, meteorology, human kinematics, speech recognition, and many other fields, and functional data analysis has become an active research topic in data mining. Traditional data mining methods treat functional data as discrete, finite sequences of observations. They ignore the continuity and high dimensionality of functional data, which limits knowledge discovery from such data. To address these limitations, this thesis builds on the basis representation of functional data to explore the theory and methods of modeling functional data in classification, clustering, and regression problems, and verifies the effectiveness of the modeling methods on concrete cases. The main work and contributions are summarized as follows.
(1) For the representation of functional data, the modeling principle of the functional principal component representation is examined. Variational theory is used to establish the model that the data-driven functional principal component basis must satisfy, which provides a method for solving for that basis. The functional principal component representation is proved to be the optimal orthonormal representation under the mean squared error criterion, which gives a theoretical justification for representing functional data by principal components.
(2) For the classification of functional data, the differences in classification performance among basis representations are examined. Under an orthonormal representation, the L2 distance between functional data is proved to be equivalent to the Euclidean distance between the basis coefficient vectors, which lays the theoretical foundation for two-stage classification of functional data. Based on the two-stage method, the types of functional data suited to the Fourier basis, the wavelet basis, and the functional principal component basis are identified from the viewpoint of classification performance. Experiments also compare the classification performance of functional data under non-orthogonal and orthogonal representations.
(3) For the clustering of functional data, the representation of cluster centers in the functional k-means clustering algorithm is examined. A measure of similarity between multivariate functional samples is proved to be a distance. Its construction takes the derivative information of the functional samples into account, and it provides the basis for building the functional k-means clustering algorithm. Based on this distance, a representation of the cluster centers is given, and this center is proved to minimize the within-cluster sum of squared distances. Experiments on real data verify the effectiveness of the functional k-means clustering algorithm.
(4) For the regression of functional data, a modeling method for the partial functional linear model, which handles mixed data, is examined. To improve prediction accuracy, the basis representation of the functional coefficient in a Sobolev-Hilbert space is used to convert the semiparametric model into a parametric one. To make the model more robust, a more relaxed penalty strategy is introduced into penalized least squares, and the model is fitted with this penalized least squares method. Both synthetic and real data verify the effectiveness of the method.
In response to the limitations of traditional data mining methods on functional data, and based on the basis representation strategy, this thesis provides theory and methods for functional data modeling. The results have theoretical value and practical significance for functional data mining.
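The distance equivalence stated in the abstract (the L2 distance between two functions equals the Euclidean distance between their coefficient vectors under an orthonormal basis) can be checked numerically. The following is a minimal sketch, not code from the thesis: the helper names `fourier_basis` and `l2_distance`, the interval [0, 1), the grid size, and the number of basis functions are all assumptions chosen for illustration.

```python
import numpy as np

def fourier_basis(t, n_basis):
    """Evaluate the first n_basis orthonormal Fourier functions on [0, 1)."""
    cols = [np.ones_like(t)]  # constant function, norm 1 on [0, 1)
    k = 1
    while len(cols) < n_basis:
        cols.append(np.sqrt(2.0) * np.sin(2.0 * np.pi * k * t))
        if len(cols) < n_basis:
            cols.append(np.sqrt(2.0) * np.cos(2.0 * np.pi * k * t))
        k += 1
    return np.column_stack(cols)  # shape (len(t), n_basis)

def l2_distance(x, y):
    """Approximate the L2([0, 1)) distance by a Riemann sum on a uniform grid."""
    return np.sqrt(np.mean((x - y) ** 2))

# Hypothetical settings for the illustration.
n_grid, n_basis = 1024, 7
t = np.arange(n_grid) / n_grid
Phi = fourier_basis(t, n_basis)

# Two functions defined by random coefficient vectors in the same basis.
rng = np.random.default_rng(0)
a = rng.normal(size=n_basis)
b = rng.normal(size=n_basis)
x, y = Phi @ a, Phi @ b

dist_functions = l2_distance(x, y)          # distance between the curves
dist_coefficients = np.linalg.norm(a - b)   # distance between coefficients
print(dist_functions, dist_coefficients)
```

The two printed values agree up to floating-point error. This is what allows a two-stage approach to run an ordinary vector classifier on the coefficients in place of the curves. With a non-orthogonal basis the two values would generally differ, because the Gram matrix of the basis is then no longer the identity.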
[Degree-granting institution]: Shanxi University
[Degree level]: Doctoral
[Year degree granted]: 2017
[Classification number]: TP311.13;O212.1
Article ID: 1473870
Article link: https://www.wllwen.com/kejilunwen/yysx/1473870.html