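The Effect of Response Value Selection in Linear Regression Models on Binary Classification
Published: 2018-05-29 04:07
Topics: binary classification + linear regression model; Source: North China Electric Power University (Beijing), 2016 master's thesis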
[Abstract]: This thesis studies, under the multiple linear regression model, how the choice of response values and of critical values affects the classification of two populations. First, the critical value in the discriminant rule is set to either the mean or the midpoint of the response values, and for each of these two cases we discuss how the choice of response values affects binary classification of balanced and unbalanced data. Next, with the critical value set to the mean of the response values and the response variable assigned three different sets of values, the resulting discriminant rules are compared with classical discriminant analysis methods such as the distance discriminant method and the Bayes discriminant method, to identify how they are related and what their respective strengths and weaknesses are. In addition, with the response values held fixed, we examine the three discriminant rules obtained from three critical values and select the most suitable critical value by the principle of minimum misclassification probability. Building on this theory, we use the R language and 5-fold cross-validation to analyze balanced and unbalanced simulated data and the real WDBC data set in the three cases where the response variable takes each of three sets of values and the critical value is the mean of the response values; the simulation results agree with the theory. We also simulate the misclassification probabilities for the nine cases obtained by assigning the response variable each of three sets of values and taking the critical value to be 0, the mean of the response values, or the midpoint of the response values. The results agree with the theoretical proofs, reveal the relationships among the nine cases, and identify the critical value that gives the smaller misclassification rate, so that new data can be classified better.
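The nine coding and cutoff combinations described in the abstract can be illustrated with a small sketch. It is not the thesis's R code: the toy data, the use of a single predictor instead of several, the three codings `(1, 0)`, `(1, -1)`, and `(n/n1, -n/n2)`, and all function names are assumptions chosen for illustration. The abstract does not say which three sets of response values the thesis uses, and cross-validation is omitted here.

```python
# Toy sketch: regress a two-valued coded response on one predictor, then
# classify by comparing the fitted value with a cutoff (critical value).
# All data, codings, and names below are illustrative assumptions.


def fit_line(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def classify(x, labels, codes, cutoff_rule, x_new):
    """Assign class 1 when the fitted value exceeds the cutoff, else class 2.

    codes = (value coded for class 1, value coded for class 2),
    with codes[0] > codes[1].
    cutoff_rule is "zero", "mean" (mean of the coded responses), or
    "midpoint" (midpoint of the two coded values).
    """
    y = [codes[0] if g == 1 else codes[1] for g in labels]
    b0, b1 = fit_line(x, y)
    if cutoff_rule == "zero":
        cutoff = 0.0
    elif cutoff_rule == "mean":
        cutoff = sum(y) / len(y)
    elif cutoff_rule == "midpoint":
        cutoff = (codes[0] + codes[1]) / 2
    else:
        raise ValueError(f"unknown cutoff rule: {cutoff_rule}")
    return [1 if b0 + b1 * v > cutoff else 2 for v in x_new]


def error_rate(predicted, actual):
    """Fraction of observations assigned to the wrong class."""
    return sum(p != a for p, a in zip(predicted, actual)) / len(actual)


# Unbalanced toy sample: six observations from class 2, two from class 1.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0]
labels = [2, 2, 2, 2, 2, 2, 1, 1]
n, n1, n2 = len(labels), labels.count(1), labels.count(2)

# Three hypothetical response codings (class 1 value, class 2 value).
codings = {
    "(1, 0)": (1.0, 0.0),
    "(1, -1)": (1.0, -1.0),
    "(n/n1, -n/n2)": (n / n1, -n / n2),
}

if __name__ == "__main__":
    for name, codes in codings.items():
        for rule in ("zero", "mean", "midpoint"):
            pred = classify(x, labels, codes, rule, x)
            print(f"{name:>14} {rule:>8} error = {error_rate(pred, labels):.3f}")
```

In this sketch any two-valued coding is a positive affine transform of any other, so the fitted values, the mean cutoff, and the midpoint cutoff all shift and scale together; the mean and midpoint rules therefore classify the toy sample identically under all three codings, while the zero cutoff changes with the coding. The error rates printed are resubstitution errors on the training sample, not cross-validated estimates.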
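[Degree-granting institution]: North China Electric Power University (Beijing)
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: O212.1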
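[Related literature]
Related master's theses (top 1)
1 Yang Yanli (杨岩丽); The Effect of Response Value Selection in Linear Regression Models on Binary Classification [D]; North China Electric Power University (Beijing); 2016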
Article ID: 1949424
Article link: https://www.wllwen.com/kejilunwen/yysx/1949424.html