Research on Key Technologies for Visual Data Classification Based on Random Forests

Published: 2018-04-29 07:31

Topic: pattern classification + random forest; source: South China University of Technology, doctoral dissertation, 2016

[Abstract]: With economic and social development, computer technology and electronic products have become part of everyday life, and large volumes of video and images are produced every day. Research on visual data such as video and images has become one of the focal points for researchers in computer vision, pattern recognition, and machine learning. As visual data grows in volume and complexity and eventually forms visual big data, a traditional single statistical computing model can no longer analyze, understand, mine, and classify it well. In recent years, machine learning methods have gradually become the main methods and tools for mining and statistically analyzing visual data in computer vision, pattern recognition, digital signal processing, automatic control, and artificial intelligence.

Random forest is an important branch of ensemble learning and an effective method for understanding and analyzing visual data. It can be applied to both classification and regression problems. Its main idea is to build weak classifiers (predictors) and then combine them into a single integrated system. When a new instance arrives, each weak classifier (predictor) first classifies (predicts) it individually, and the individual results are then combined by voting (averaging) to produce the output for that instance. As an effective ensemble learning method, random forest has achieved a great deal in data mining, pattern recognition, machine vision, and artificial intelligence, and has shown excellent practical performance. Despite these achievements, its behavior with respect to attribute feature selection for visual data, the distribution of visual data sample instances, and the design of the ensemble's base models has not yet been fully proven or adequately explained.

This dissertation studies key technologies for visual data classification based on random forests, focusing on the core problems that arise when the random forest learning method is used as a visual data classifier. Specifically, the main research work and contributions are as follows.

(1) In the study of random forest ensemble learning based on feature selection, the dissertation examines how feature selection affects a random forest used as a visual data classifier, and proposes a random forest method with attribute feature selection based on moving block search. The block search algorithm first partitions the attribute features of the visual data into blocks according to a predetermined rule. The attribute features of one block, together with features drawn at random from the remaining attribute features, then form the data source for splitting a decision tree node. Once all base-model decision trees have been built, the class label of a new test sample is decided by a vote of all the trees. Using visual features generated by the gray-level co-occurrence matrix, local binary patterns, and the multifractal spectrum, the proposed block search attribute feature selection algorithm achieves competitive classification results on five datasets: UIUC, UMD, KTH-TIPS, ALOT, and FMD.

(2) In the study of random forest learning based on the instance measure distribution, the random forest learning model is examined from the perspective of the distribution of the sample instances. When the base models are built, the strategy uses measure learning to capture how the data is distributed and applies that information fully in the classification decision. In the random forest method based on instance measure learning, a Gaussian mixture model is fitted to the original dataset and to each sampled data subset, so that every dataset is described by a set of fitted mixture parameters. Each set contains the same number of parameters, so the sets are vectors of the same dimension. Measure learning is then used to compute the similarity between any data subset and the original training dataset, and the trees are combined on the principle that a subset whose distribution is either too similar or too dissimilar to the original training dataset should receive a smaller voting weight. The proposed algorithm achieves good classification results in four experiments: on the ALOT dataset, the Flower102 flower image classification dataset, the Scene-15 scene image classification dataset, and the Food101 food image classification dataset.

(3) The dissertation examines the complexity of the base model itself, a key problem for ensemble-based visual data classifiers, and proposes a random depth decision forest boosting model. Building on deep boosting, the model is designed and optimized on the principle that different data subsets should have decision trees of different depths. It is an ensemble learning strategy that combines ideas from deep learning and deep decision trees. The method takes boosting as its main framework and replaces the single decision tree of traditional boosting with a random depth decision forest, which yields a two-layer learning model. Experimental results on the English letter recognition dataset from the machine learning repository and on the FMD dataset indicate that the proposed random depth decision forest boosting model has high reliability and accuracy.
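
The candidate-feature step described in contribution (1) can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the abstract does not state the partitioning rule, how the block moves, or how many random features are drawn, so the consecutive equal-size blocks, the wrap-around `block_index`, and the `n_random` parameter are assumptions made here for illustration, as are all function names.

```python
import random
from collections import Counter


def partition_into_blocks(n_features, block_size):
    """Split feature indices 0..n_features-1 into consecutive blocks.

    Assumption: the "predetermined rule" is simple consecutive chunking.
    """
    indices = list(range(n_features))
    return [indices[i:i + block_size] for i in range(0, n_features, block_size)]


def candidate_features(blocks, block_index, n_random, rng):
    """Return the features of one block plus n_random features from the rest.

    Assumption: the block "moves" by cycling block_index through the blocks.
    """
    block = blocks[block_index % len(blocks)]
    in_block = set(block)
    remaining = [f for b in blocks for f in b if f not in in_block]
    # Draw without replacement from the features outside the chosen block.
    extra = rng.sample(remaining, min(n_random, len(remaining)))
    return sorted(block + extra)


def majority_vote(labels):
    """Return the most common label among the per-tree predictions."""
    return Counter(labels).most_common(1)[0][0]


if __name__ == "__main__":
    rng = random.Random(0)
    blocks = partition_into_blocks(n_features=12, block_size=4)
    for tree_id in range(3):
        # Each tree (or node) would search for its split among these features.
        print(tree_id, candidate_features(blocks, tree_id, n_random=2, rng=rng))
    print(majority_vote(["wood", "fabric", "wood"]))
```

In a full classifier, each decision tree node would choose its split only among the indices returned by `candidate_features`, and `majority_vote` would combine the labels predicted by the individual trees.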

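[Degree-granting institution]: South China University of Technology
[Degree level]: Doctoral
[Year degree granted]: 2016
[Classification number]: TP18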

[Similar literature]