Research on Linear Non-Gaussian Acyclic Causal Models

Published: 2018-03-16 04:10

Topic: negentropy  Focus: kurtosis  Source: Guangdong University of Technology, 2017 master's thesis  Document type: degree thesis

【Abstract】: In recent years, the linear non-Gaussian acyclic model (LiNGAM) has attracted growing attention because it can fully identify a causal network from observational data without any prior knowledge, and it has been widely applied in neuroscience, economics, genomics, and other fields. The Direct LiNGAM framework (Direct Method for Learning a Linear Non-Gaussian Structural Equation Model) is one of the classic solution methods, but once the number of variables exceeds 25 its identification rate for exogenous variables is low. This produces a cascade effect in which the estimation error of the whole network grows as the number of layers increases, and its computational complexity is cubic in the dimension. To address these problems, this thesis studies the identification of exogenous variables from three different angles. (1) From the perspective of local selection, the non-Gaussianity of a variable is taken as the criterion for selecting exogenous variables, with negentropy as the measure of non-Gaussianity. The k variables with the largest negentropy are placed in a local target variable set Lv, and the exogenous variable is then searched for within Lv, which improves the identification rate. (2) From the perspective of independence, an adaptive independence-decision parameter is introduced, and this parameter is used to find the variable that is independent of the residuals from its regressions with all of the other variables; that variable is the exogenous variable. The algorithm avoids the low identification rate that traditional algorithms suffer because they are sensitive to differences in independence values, and it also avoids the identification failures that arise because a fixed independence parameter is sensitive to the data set. (3) From the perspective of the estimation method, a kurtosis-based measure is introduced. We find that when the disturbance variables are independent and identically distributed, the exogenous variable has the largest kurtosis, and on this basis we propose a method that identifies exogenous variables directly. The algorithm rests on a directly quantified relationship, and its computational complexity is only quadratic in the dimension. These results enrich research on the LiNGAM model and, to some extent, provide new methodological support for the identification of exogenous variables.
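The negentropy and kurtosis criteria described in the abstract can be illustrated with a small sketch. This is not the thesis's implementation. The three-variable model, its coefficients, the Laplace disturbances, and the helper names (`excess_kurtosis`, `negentropy_approx`, `top_k_candidates`) are all assumptions made for illustration. Negentropy is approximated with the common moment-based formula skew²/12 + kurt²/48, which may differ from the estimator used in the thesis. The kurtosis step relies on a standard fact: for a standardized linear mix of i.i.d. disturbances with excess kurtosis κ and mixing weights a_i, the excess kurtosis is κ·Σa_i⁴/(Σa_i²)², whose magnitude is at most |κ| and equals it only when a single disturbance contributes.

```python
import numpy as np

def standardize(X):
    """Center each column and scale it to unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def excess_kurtosis(X):
    """Sample excess kurtosis of each column (0 for a Gaussian)."""
    Z = standardize(X)
    return (Z ** 4).mean(axis=0) - 3.0

def negentropy_approx(X):
    """Moment-based negentropy approximation: skew^2/12 + kurt^2/48."""
    Z = standardize(X)
    skew = (Z ** 3).mean(axis=0)
    kurt = (Z ** 4).mean(axis=0) - 3.0
    return skew ** 2 / 12.0 + kurt ** 2 / 48.0

def top_k_candidates(X, k):
    """Indices of the k columns with the largest approximate negentropy."""
    return np.argsort(negentropy_approx(X))[::-1][:k]

# Hypothetical 3-variable LiNGAM (assumed for illustration):
# x1 is exogenous, x1 -> x2, and (x1, x2) -> x3.
rng = np.random.default_rng(0)
n = 100_000
e = rng.laplace(size=(n, 3))  # i.i.d. super-Gaussian disturbances
x1 = e[:, 0]
x2 = 0.8 * x1 + e[:, 1]
x3 = 0.5 * x1 + 0.7 * x2 + e[:, 2]
X = np.column_stack([x1, x2, x3])

# Angle (1): keep the k most non-Gaussian variables as the local candidate set (Lv).
candidates = top_k_candidates(X, k=2)

# Angle (3): pick the variable with the largest kurtosis as the exogenous one.
exogenous_guess = int(np.argmax(excess_kurtosis(X)))

print("candidate set:", candidates.tolist())
print("exogenous guess (column index):", exogenous_guess)
```

Each score needs only one pass over each column, so ranking all variables costs time linear in the number of variables for a fixed sample size. The sketch covers only this ranking step, not the remaining steps of estimating the full causal order.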
【Degree-granting institution】: Guangdong University of Technology
【Degree level】: Master's
【Year degree granted】: 2017
【Classification number】: TP301.6

【References】