Research on Two Problems in Multivariate Statistical Analysis
Posted: 2018-12-31 13:38
[Abstract]: A statistical distribution is one of the main tools for describing the characteristics and behavior of random variables. Multivariate statistical analysis is the umbrella term for methods that process multivariate data on the basis of multivariate distributions; it is a major branch of statistics, rich in both theoretical results and applied methods. This thesis consists of two relatively independent parts: the first studies discriminant analysis within Bayesian decision theory, and the second applies canonical correlation analysis. Discriminant analysis in Bayesian decision theory: statistical pattern recognition rests on the statistical probabilities of sample feature values, and this thesis uses Bayesian decision theory and statistical theory to carry out a series of investigations. Bayesian decision theory is applied in many fields because its decision rule minimizes the classification error rate. Starting from the original Bayes formula, earlier work derived the Bayesian discriminant functions and decision surfaces under the multivariate normal probability model and verified these conclusions experimentally, but the corresponding results under other distributions remain unknown. Years of research show that real-world samples do not all follow a multivariate normal distribution: when the data are sharply peaked and their density shows heavy tails, the multivariate normal model is inadequate. If a multivariate normal distribution is used to describe such long-tailed data, outliers inevitably distort the estimates of the mean vector and covariance matrix, so the discriminant results can deviate substantially from the true classes; in other words, the normal-based analysis is not robust.
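For reference, the classical Gaussian result mentioned above is the standard textbook discriminant g_i(x) = -½(x - μ_i)ᵀ Σ_i⁻¹ (x - μ_i) - ½ ln|Σ_i| + ln P(ω_i), with the class-independent constant -(p/2) ln 2π dropped; x is assigned to the class with the larger g_i, and g_1(x) = g_2(x) defines the decision surface. The sketch below is a minimal NumPy illustration, not code from the thesis: the function names `gaussian_discriminant` and `classify` and all numeric values are hypothetical. The last few lines illustrate the robustness problem described in the abstract, showing how one gross outlier shifts the estimated mean.

```python
import numpy as np

def gaussian_discriminant(x, mean, cov, prior):
    """Bayes discriminant g_i(x) for a multivariate normal class model.

    The constant -(p/2) ln(2*pi) is omitted because it is the same for every class.
    """
    diff = np.asarray(x, dtype=float) - mean
    maha = diff @ np.linalg.solve(cov, diff)   # squared Mahalanobis distance
    _, logdet = np.linalg.slogdet(cov)          # ln|Sigma|, computed stably
    return -0.5 * maha - 0.5 * logdet + np.log(prior)

def classify(x, params):
    """Assign x to the class whose discriminant value is largest."""
    scores = [gaussian_discriminant(x, m, c, p) for m, c, p in params]
    return int(np.argmax(scores))

# Hypothetical two-class model: (mean, covariance, prior) per class.
params = [
    (np.array([0.0, 0.0]), np.eye(2), 0.5),
    (np.array([3.0, 3.0]), np.eye(2), 0.5),
]
print(classify([0.5, 0.2], params))  # -> 0
print(classify([2.8, 3.1], params))  # -> 1

# A single gross outlier noticeably shifts the estimated mean vector.
rng = np.random.default_rng(0)
sample = rng.normal(size=(50, 2))
contaminated = np.vstack([sample, [[30.0, 30.0]]])
print(sample.mean(axis=0), contaminated.mean(axis=0))
```

With equal identity covariances and equal priors, the rule reduces to choosing the nearer mean, which is why the two test points fall on opposite sides of the line x₁ + x₂ = 3.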
By contrast, the multivariate t distribution is more robust than the multivariate normal distribution: by choosing its degrees-of-freedom parameter appropriately, one can reduce the influence of outliers on the results. The first part of this thesis therefore takes the multivariate t probability density function as the basis for classifier design, draws sample sets from multivariate t probability models, and analyzes them, which is of considerable practical relevance. Six cases are distinguished according to the form of the covariance structure and whether the degrees of freedom are equal or unequal, and the discriminant function under the multivariate t density model is derived for each; every one of the six cases is further split into equal and unequal prior probabilities. This yields the discriminant functions of the two multivariate t density models in each case, from which the decision surface equation follows and the decision surface can be plotted. Application of canonical correlation analysis in the tobacco field: canonical correlation analysis is a topic in multivariate statistical analysis. Borrowing the idea of principal components, it summarizes the linear correlation between two sets of variables with a few pairs of composite variables, and it is widely used for correlation analysis and prediction in many fields. After studying the theory of canonical correlation analysis, this thesis applies it to flue-cured tobacco, performing a canonical correlation analysis between 35 chemical components and 10 sensory comfort indices.
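A generic way to write a Bayes discriminant under a multivariate t model is g_i(x) = ln f_i(x) + ln P(ω_i), where f_i is the p-variate t density with location μ_i, scale matrix Σ_i and ν_i degrees of freedom, f(x) = Γ((ν+p)/2) / [Γ(ν/2) (νπ)^{p/2} |Σ|^{1/2}] · [1 + δ/ν]^{-(ν+p)/2} with δ = (x - μ)ᵀ Σ⁻¹ (x - μ). The sketch below evaluates this form directly; it does not reproduce the thesis's six case-specific closed forms, and the names `t_log_density`, `t_discriminant`, `decision_value` and the parameter values are hypothetical. Note that Σ is a scale matrix, not the covariance (for ν > 2 the covariance is ν/(ν - 2) Σ).

```python
import math
import numpy as np

def t_log_density(x, loc, scale, df):
    """Log-density of the p-variate t distribution (location, scale matrix, df)."""
    diff = np.asarray(x, dtype=float) - loc
    p = diff.shape[0]
    delta = diff @ np.linalg.solve(scale, diff)   # squared Mahalanobis distance
    _, logdet = np.linalg.slogdet(scale)          # ln|Sigma|
    return (math.lgamma((df + p) / 2) - math.lgamma(df / 2)
            - 0.5 * p * math.log(df * math.pi) - 0.5 * logdet
            - 0.5 * (df + p) * math.log1p(delta / df))

def t_discriminant(x, loc, scale, df, prior):
    """Bayes discriminant g_i(x) = ln f_i(x) + ln P(w_i)."""
    return t_log_density(x, loc, scale, df) + math.log(prior)

def decision_value(x, class1, class2):
    """g_1(x) - g_2(x): > 0 picks class 1, < 0 picks class 2, = 0 lies on the decision surface."""
    return t_discriminant(x, *class1) - t_discriminant(x, *class2)

# Hypothetical classes with unequal degrees of freedom: (loc, scale, df, prior).
class_a = (np.array([0.0, 0.0]), np.eye(2), 3.0, 0.5)
class_b = (np.array([3.0, 3.0]), np.eye(2), 10.0, 0.5)
print(decision_value([0.5, 0.5], class_a, class_b) > 0)  # -> True
```

When Σ_1 = Σ_2 and ν_1 = ν_2, the gamma and determinant terms cancel in `decision_value`, so only the log1p terms and the log-priors remain; with unequal ν those normalizing terms no longer cancel, which is one reason the equal and unequal degrees-of-freedom cases behave differently. Evaluating `decision_value` on a grid and drawing its zero contour is one way to plot the decision surface.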
The results show that certain chemical-composition indices of flue-cured tobacco have significant effects on certain sensory comfort indices. Production, manufacturing, and processing of flue-cured tobacco can therefore focus on these influential indices in order to improve sensory comfort, which further illustrates the value of canonical correlation analysis.
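As a computational illustration of the method (not the thesis's analysis), the canonical correlations of two variable sets X (n × p) and Y (n × q) are the singular values of S_xx^{-1/2} S_xy S_yy^{-1/2}, where S_xx, S_yy, S_xy are the sample covariance blocks; the corresponding singular vectors give the coefficients of each pair of canonical variables. The minimal NumPy sketch below assumes rows are observations and that both covariance blocks are positive definite; the names `inv_sqrt` and `canonical_correlations` are hypothetical, and because the tobacco data (35 chemical components, 10 sensory indices) are not available here, the example uses synthetic data sharing one latent factor.

```python
import numpy as np

def inv_sqrt(mat):
    """Inverse symmetric square root of a positive-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def canonical_correlations(X, Y):
    """Return canonical correlations (descending) and coefficient matrices A, B."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = X.T @ X / (n - 1)
    Syy = Y.T @ Y / (n - 1)
    Sxy = X.T @ Y / (n - 1)
    Kx, Ky = inv_sqrt(Sxx), inv_sqrt(Syy)
    U, s, Vt = np.linalg.svd(Kx @ Sxy @ Ky, full_matrices=False)
    A = Kx @ U     # column k: coefficients of the k-th canonical variable of X
    B = Ky @ Vt.T  # column k: coefficients of the k-th canonical variable of Y
    return s, A, B

# Synthetic example: one latent factor z drives the first column of X and of Y.
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)
X = np.column_stack([z + 0.5 * rng.normal(size=n), rng.normal(size=n), rng.normal(size=n)])
Y = np.column_stack([z + 0.5 * rng.normal(size=n), rng.normal(size=n)])
r, A, B = canonical_correlations(X, Y)
print(np.round(r, 2))
```

The first canonical correlation should come out near 1/1.25 = 0.8 (the correlation between two copies of z with noise variance 0.25), while the remaining ones stay close to zero; in a real application, significance tests on these correlations decide how many canonical pairs to interpret.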
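[Degree-granting institution]: Yunnan University of Finance and Economics
[Degree level]: Master's
[Year conferred]: 2015
[Classification number]: O212.4
Article ID: 2396658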
Article link: https://www.wllwen.com/kejilunwen/yysx/2396658.html