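Research and Application of Massive Oilfield Data Mining Technology
Published: 2018-05-12 11:19
Topic: data mining + attribute reduction; Source: Northeast Petroleum University, master's thesis, 2017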
[Abstract]: In recent years, data mining has been widely applied in many fields, and it offers advantages that other techniques cannot match for knowledge discovery and the analysis of massive data. Oilfields have accumulated massive volumes of production data that contain hidden regularities. Because the capacity of manual analysis is limited, these regularities are hard to find, and data mining can fill this gap. This thesis attempts to apply data mining to the analysis and prediction of oilfield production. It first establishes the technical route for applying data mining to oilfield production prediction, and then studies the algorithms related to data preprocessing, data classification, and data prediction. The main work is as follows:
1. The attribute reduction algorithm of rough set theory is optimized for production data. The dependency degree and significance of each attribute are used to describe its weight, and this weight serves as the criterion for selecting the initial population of the particle swarm optimization (PSO) algorithm, which narrows the search range in the solution space. The migration and chemotaxis operations of the bacterial foraging algorithm are then introduced to perform the local search, which improves the ability to find the optimal reduct during attribute reduction and yields the optimal reduction of the production attributes.
2. A database management system and SQL embedded in C# are used to query the production data directly in the production database, which compensates for the inability of the C4.5 algorithm to classify massive data. In addition, Fayyad's boundary-point theorem is used to address the time-consuming selection of the optimal threshold in C4.5, which improves the algorithm's execution efficiency. As a result, growth in the number of samples in the production database does not degrade the algorithm's execution efficiency or classification accuracy, which makes it more adaptable.
3. A combined forecasting method is applied to predict oilfield production, a complex variable affected by many factors. Multiple linear regression is first used to test the significance of the variables, and the highly significant variables are retained. The retained variables are then predicted with an analysis method based on ARMA time series. Finally, a comprehensive production prediction model is built with a neural network, which improves prediction accuracy.
4. With the improved data mining algorithms above as the technical basis, a decision support system for oilfield production analysis with a client/server (C/S) architecture is developed under Windows 7, using Microsoft Visual Studio 2010, the Oracle 10g database and its management system, and SQL statements embedded in C#. The system is finally tested with actual production data to verify that it meets the needs of oilfield production decision-making.
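The abstract states that Fayyad's boundary-point theorem is used to speed up threshold selection in C4.5 but gives no implementation details. The following is a minimal Python sketch of the general idea, not the thesis's code: for a continuous attribute, the entropy-minimizing cut can only fall between adjacent sorted values whose class labels differ, so all other candidate cuts can be skipped. The function names, the use of plain information gain (C4.5 itself uses gain ratio), the use of midpoints as thresholds, and the sample data are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def boundary_cut_points(values, labels):
    """Candidate thresholds restricted to boundary points.

    A midpoint between two adjacent distinct values is kept unless both
    values carry one and the same single class, in which case the
    boundary-point theorem says the optimal cut cannot lie there.
    """
    classes_at = {}
    for v, y in zip(values, labels):
        classes_at.setdefault(v, set()).add(y)
    distinct = sorted(classes_at)
    cuts = []
    for lo, hi in zip(distinct, distinct[1:]):
        mixed = len(classes_at[lo]) > 1 or len(classes_at[hi]) > 1
        if mixed or classes_at[lo] != classes_at[hi]:
            cuts.append((lo + hi) / 2)
    return cuts


def best_threshold(values, labels):
    """Return (threshold, information gain) over boundary points only.

    Returns (None, 0.0) when no cut yields a positive gain.
    """
    n = len(labels)
    base = entropy(labels)
    best_t, best_gain = None, 0.0
    for t in boundary_cut_points(values, labels):
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        gain = (base
                - len(left) / n * entropy(left)
                - len(right) / n * entropy(right))
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain


if __name__ == "__main__":
    # Hypothetical data: a continuous well attribute and a production class.
    water_cut = [0.21, 0.25, 0.30, 0.42, 0.55, 0.61, 0.70, 0.83]
    level = ["high", "high", "high", "high", "low", "low", "low", "low"]
    print(boundary_cut_points(water_cut, level))
    print(best_threshold(water_cut, level))
```

With n distinct values there are n - 1 possible midpoints, but only those at class boundaries are evaluated, so the saving grows with the length of same-class runs in the sorted data. In a database-backed setting such as the one the abstract describes, the sorted values and labels would presumably come from a query rather than an in-memory list.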
[Degree-granting institution]: Northeast Petroleum University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TE319;TP311.13