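Research on Parallel Bayesian Network Classifiers Based on C-MCMC and MapReduce
Topics: Bayesian network + structure learning; Source: Taiyuan University of Technology, 2017 master's thesis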
[Abstract]: Bayesian network classifiers have strong learning and inference capabilities and are an active research topic in data processing. Although they show good classification and prediction performance, they still make poor use of prior knowledge and have limited practicality, so structure learning can fail to find the optimal network structure, which in turn degrades classifier performance. How to better parallelize existing Bayesian network classifiers also remains an open problem. To address these problems, this thesis studies parallel Bayesian network classifiers and designs and implements a new parallel Bayesian network classifier. The main contributions are as follows. (1) Building on the Markov Chain Monte Carlo (MCMC) algorithm, the thesis introduces three kinds of prior knowledge (existence, absence, and PD/CPD) and proposes a new Bayesian network structure learning algorithm, C-MCMC (Constrained-MCMC). It examines how prior knowledge is used and how it affects MCMC-based Bayesian network structure learning, and verifies the effectiveness of the algorithm through a series of experiments, so that better Bayesian networks are learned. (2) The C-MCMC structure learning algorithm is applied to the traditional augmented naive Bayes classifier (BAN) and the general Bayesian network classifier (GBN), together with the corresponding parameter estimation, yielding the C-MCMC BAN classifier and the C-MCMC GBN classifier. Using MapReduce, the parallel programming model of the open-source Hadoop platform, the thesis designs the corresponding Map and Reduce functions, parallelizes the C-MCMC Bayesian network classifiers, describes the implementation in detail, and verifies on a Hadoop platform built for the purpose that parallelization improves the efficiency of the algorithm. The experimental results show that the proposed classifiers outperform traditional Bayesian network classifiers, achieve high classification accuracy and efficiency, are suitable for big-data processing, and have broad application prospects.
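As a rough illustration of how parameter estimation for a Bayesian network classifier can be expressed with Map and Reduce functions, the sketch below counts (node, parent values, node value) combinations and turns the counts into maximum-likelihood conditional probabilities. It is not the thesis's actual implementation: the toy structure `PARENTS`, the function names, and the sample records are all assumptions made for illustration, the shuffle step is simulated in-process rather than run on Hadoop, and no smoothing is applied.

```python
from collections import defaultdict

# Hypothetical BAN-style structure (an assumption, not from the thesis):
# the class node C is a parent of every attribute, and A1 is also a parent of A2.
PARENTS = {"C": [], "A1": ["C"], "A2": ["C", "A1"]}


def map_record(record):
    """Map: emit ((node, parent_values, node_value), 1) for each node."""
    for node, parents in PARENTS.items():
        parent_values = tuple(record[p] for p in parents)
        yield (node, parent_values, record[node]), 1


def reduce_counts(key, values):
    """Reduce: sum the partial counts that share one key."""
    return key, sum(values)


def run_job(records):
    """Simulate the shuffle: group mapper output by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_record(record):
            groups[key].append(value)
    return dict(reduce_counts(k, v) for k, v in groups.items())


def estimate_cpts(counts):
    """Normalize counts into P(node value | parent values), without smoothing."""
    totals = defaultdict(int)
    for (node, parent_values, _), n in counts.items():
        totals[(node, parent_values)] += n
    return {key: n / totals[(key[0], key[1])] for key, n in counts.items()}


# Made-up training records, used only to exercise the sketch.
records = [
    {"C": "yes", "A1": "a", "A2": "x"},
    {"C": "yes", "A1": "a", "A2": "y"},
    {"C": "yes", "A1": "b", "A2": "x"},
    {"C": "no", "A1": "a", "A2": "x"},
]
counts = run_job(records)
cpts = estimate_cpts(counts)
```

Because each mapper only needs one record and the fixed structure, the counting step splits cleanly across data partitions; only the small table of counts has to be brought together for normalization.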
[Degree-granting institution]: Taiyuan University of Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP18
Article ID: 1838434
Article link: https://www.wllwen.com/shoufeilunwen/xixikjs/1838434.html