Parallel Programming of the Particle Swarm Optimization Algorithm Based on Spark and Its Application in Reservoir Dispatching

Published: 2018-10-29 23:01

【Abstract】: The water resources data of the Heihe River Basin are large in volume and complex in their relationships, which makes optimal dispatching difficult with conventional techniques. To address the basin's current water resources dispatching problems, this thesis applies big data processing and evolutionary computation: building on the Spark big data computing framework and a parallelized particle swarm optimization (PSO) algorithm, it develops a multi-objective optimal dispatching system for the Heihe River Basin reservoir group. In the course of this work, the author analyzed the characteristics of the system's multiple objectives and converted them into a single objective, which gave a solution model for the problem. The thesis then studies parallel programming models, the PSO algorithm for solving optimization problems and its parallelization strategies, and a method for parallelizing PSO on the Spark framework. On this theoretical and technical basis, a big data support platform was built with Hadoop 2.7.1, Spark 1.5.2, and Spark on YARN, and the collected Heihe River Basin water resources data were stored in the platform's distributed file system, HDFS. A parallel PSO-based program for multi-objective optimal dispatching of the reservoir group was then developed in Scala on the Spark platform under Ubuntu Linux, yielding a dispatching system that can process large volumes of reservoir dispatching data and run the optimization program at high speed. Because data loading, program execution, and result querying all take place in the Ubuntu Linux and Spark environment and its interface, the system is very difficult to use for ordinary users who are unfamiliar with how Spark runs. To solve this problem, an application platform for the dispatching system was also developed. It supports uploading, downloading, deleting, and querying big data, submitting and running application tasks that must be processed on the Spark platform, and SQL queries. This work should help water resources optimal dispatching systems run efficiently, and it offers a useful reference for the development and application of big data parallel programming on the Spark platform.
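As a rough illustration of the approach the abstract summarizes, the sketch below runs a basic PSO on a scalarized two-objective toy problem. Everything specific in it is an assumption made for illustration and is not taken from the thesis: the weighted-sum scalarization (the abstract says only that the multiple objectives were converted into a single one), the two stand-in objectives, the weights, and the PSO parameters. The thesis implements its program in Scala on Spark; this sketch is plain Python and only marks the per-particle fitness evaluation as the step that a distributed map could take over.

```python
import random

def scalarize(objectives, weights):
    """Weighted sum of objective values (an assumed scalarization scheme)."""
    return sum(w * f for w, f in zip(weights, objectives))

def toy_objectives(x):
    """Two hypothetical competing objectives, not real reservoir models."""
    first = sum((xi - 1.0) ** 2 for xi in x)   # stand-in for one dispatching goal
    second = sum((xi + 1.0) ** 2 for xi in x)  # stand-in for a competing goal
    return first, second

def fitness(x, weights=(0.7, 0.3)):
    """Single-objective fitness to minimize; the weights are illustrative."""
    return scalarize(toy_objectives(x), weights)

def pso(dim=3, n_particles=20, iters=100, seed=0,
        w=0.7, c1=1.5, c2=1.5, lo=-5.0, hi=5.0):
    """Basic global-best PSO with a seeded generator for repeatable runs."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = list(map(fitness, pos))
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(iters):
        # Move every particle using its own best and the swarm's best position.
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))

        # Each evaluation is independent of the others, so this map is the
        # step a cluster framework could distribute across workers.
        values = list(map(fitness, pos))

        # Sequential reduction: refresh personal bests and the global best.
        for i, v in enumerate(values):
            if v < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], v
                if v < gbest_val:
                    gbest, gbest_val = pos[i][:], v
    return gbest, gbest_val

if __name__ == "__main__":
    best, value = pso()
    print(best, value)
```

For this toy fitness the minimum can be found by hand (every coordinate equal to 0.4), which makes it easy to check that the swarm converges. A real dispatching model would replace `toy_objectives` with the reservoir objectives and constraints, which the abstract does not specify.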
【Degree-granting institution】: Xi'an University of Technology
【Degree level】: Master's
【Year degree conferred】: 2017
【Classification number】: TP18;TV697.1

【References】