
Research and Application of Association Rule Mining Based on Geographic Location Information

Published: 2019-06-27 09:49
[Abstract]: With the rapid development of the information industry, the data generated across all sectors of society is growing exponentially, and this expanding data holds a wealth of information waiting to be explored. Data mining and knowledge discovery emerged in response, aiming to find valuable information in large volumes of data through effective methods. Today, the rapid progress of data mining allows many mining algorithms to be applied across many fields. Research on and application of geographic location information has drawn growing attention since the rise of search engines and has surged again with the mobile Internet boom, and data mining studies of geographic location information and the physical information associated with it have been carried out in large numbers.

Based on Beijing's open geographic landmark dataset, this thesis selects the landmark data points within a 10 km range centered on the North Fourth Ring Road and on the South Fourth Ring Road, respectively. Given that housing prices near the North Fourth Ring Road are high and the population there is dense, it applies data mining techniques to mine rules from the geographic location information near the North Fourth Ring Road. The goal is to explore how different categories of urban landmarks influence an area's housing prices and, more broadly, its prosperity, and to give a reasonable and useful scientific explanation of those rules.

Because of the particular nature of geographic location information, traditional association rule mining methods cannot be applied directly. The experiments in this thesis therefore study and improve both the data preprocessing and the mining procedure, and ultimately generate and interpret the mining results. The main contributions are summarized as follows:

(1) A clustering algorithm is introduced to convert the relational database that stores the geographic location information into a Boolean database. Geographic location information consists of discrete coordinate points, so it cannot serve as the Boolean transaction database that the Apriori algorithm takes as input. The thesis introduces a density-based clustering algorithm to cluster the set of points representing the location information, treats each resulting cluster as one transaction, and generates a Boolean database.

(2) The transaction-set partitioning method of the association rule mining algorithm is improved. An overly large transaction set makes the database huge, which in turn makes mining inefficient and slow. To address this, the thesis improves the partitioning method, splits the mining step into coarse-grained and fine-grained stages, and combines this with the FP-Tree algorithm to raise overall mining efficiency. This effectively increases the algorithm's running speed and improves its ability to discover valid association rules.

(3) Extended information is introduced into association rule mining. Drawing on the distance characteristics of geographic location information, and guided by the mining results, the thesis extends distance information into the landmark types and gives a deeper interpretation of the extended mining results.
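Step (1) can be illustrated with a minimal sketch. The abstract names only "a density-based clustering algorithm", so the use of DBSCAN here is an assumption, as are the toy landmark coordinates, the category names, and the `eps` and `min_pts` values. Coordinates are treated as planar kilometres for simplicity; real latitude/longitude data would need a geographic distance function. The sketch clusters the points, turns each cluster into one Boolean transaction over landmark categories, and computes the support of an itemset on the result.

```python
from collections import deque
from math import hypot

# Hypothetical toy landmarks: (x_km, y_km, category). Not the thesis dataset.
LANDMARKS = [
    (0.0, 0.0, "school"), (0.2, 0.1, "subway"), (0.1, 0.3, "mall"),
    (5.0, 5.0, "school"), (5.1, 5.2, "subway"), (5.2, 5.0, "park"),
    (9.0, 0.0, "hospital"),  # isolated point, too sparse to form a cluster
]


def region_query(points, i, eps):
    """Return indices of all points within eps of point i (including i)."""
    xi, yi = points[i][0], points[i][1]
    return [j for j, (xj, yj, _) in enumerate(points)
            if hypot(xi - xj, yi - yj) <= eps]


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN. Returns one label per point; -1 marks noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point
            if labels[j] is not None:
                continue  # already assigned, do not expand again
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point: expand
                queue.extend(j_neighbors)
    return labels


def to_boolean_transactions(points, labels):
    """Build one Boolean row per cluster: which categories occur in it."""
    transactions = {}
    for (_, _, category), label in zip(points, labels):
        if label != -1:  # noise points belong to no transaction
            transactions.setdefault(label, set()).add(category)
    items = sorted({c for t in transactions.values() for c in t})
    rows = [[item in transactions[k] for item in items]
            for k in sorted(transactions)]
    return items, rows


def support(rows, items, itemset):
    """Fraction of transactions that contain every item in itemset."""
    cols = [items.index(i) for i in itemset]
    return sum(all(row[c] for c in cols) for row in rows) / len(rows)


if __name__ == "__main__":
    labels = dbscan(LANDMARKS, eps=0.5, min_pts=3)
    items, rows = to_boolean_transactions(LANDMARKS, labels)
    print(items)
    for row in rows:
        print(row)
    print(support(rows, items, ["school", "subway"]))
```

On this toy data the two dense groups become two transactions and the isolated hospital is discarded as noise. The Boolean rows are the kind of input that Apriori or FP-Tree style miners expect; the partitioning and distance-extension steps in (2) and (3) are not sketched here because the abstract does not give enough detail to reconstruct them.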
[Degree-granting institution]: Beijing University of Technology
[Degree level]: Master's
[Year degree conferred]: 2015
[Classification number]: TP311.13


Article ID: 2506690


Link to this article: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/2506690.html

