空间数据最优点查询算法研究
发布时间:2018-03-20 08:39
本文选题:空间数据库 切入点:空间数据多层网格索引 出处:《浙江大学》2017年博士论文 论文类型:学位论文
【摘要】:近年来,随着互联网、移动通信以及感知定位技术的快速发展与应用,各种移动终端、图像视频采集设备以及社交平台产生了大量的地理位置等低维空间数据,和图像、文本等高维空间数据。这些数据具有海量、异构等特征,因而需要借助高效、准确的处理模型及算法,来挖掘其内在的信息与价值,从而支持相关行业进行更好的决策。因此,如何高效、准确地分析处理空间数据并挖掘其内在价值和信息,已成为当今计算机领域的重大前沿学术问题。在该领域问题中,空间数据最优点查询是近几年快速兴起的一类子问题。现有的空间数据查询算法在处理复杂决策分析问题时会出现低效和低准确度等不足,同时索引和查询算法还面临诸多难点和技术瓶颈。一方面,互联网、移动设备等每时每刻都在产生海量空间数据。这些海量数据是面向不同决策支持场景的,需要同时考虑数据点的变化特征、维度特征与度量空间特征等诸多因素。然而,目前国内外索引技术大多针对静态数据点,很少有考虑时间维度的索引,同时现有算法很难做到查询效率和查询准确性的兼顾。另一方面,针对不同的决策支持场景,最优点查询问题本身是复杂多变的。现有的(空间)最优点查询方法大多针对某一种特定的目标方程进行算法优化,因此扩展性差,执行效率低,尤其是在处理高维或者路网数据集,以及在处理组合最优点查询问题时,现有算法的准确性和效率都随着数据集复杂度的增加而急剧下降。基于上述两点对国内外研究现状和趋势的分析,本文研究并提出了面向不同应用背景的高效的最优点查询算法以支持不同的分析与决策场景。主要包括:1.研究提出了欧氏空间下单一最优点查询算法,针对现有双色反向最近邻数目最大化问题在大数据场景下的算法效率低和可扩展性差的不足,结合计算几何等领域的经典算法,提出了空间扫描策略、空间划分和上限估计策略以及基于“弧重叠”的影响值计算方法。实验结果证明,提出的算法相比于已有算法,有更高的效率,更低的内存占用率。2.研究提出了三维空间下单一最优点查询算法,首次解决了三维空间下双色体反向最近邻最大化问题。针对问题本身的空间特性和对索引效率的需求,提出了基于“细粒度”空间划分和上限估计的三维空间最优点查询算法,算法具有高效准确的特点。并且,证明了提出的算法扩展到更高维空间的可行性,对于空间数据最优点查询问题在高维空间下的研究有重要的意义。3.研究提出了考虑服务设施点容量限制的空间最优点查询算法,首次解决了服务点容量受限场景下的空间数据最优点查询问题。通过研究空间数据库以及运筹学领域的带有容量限制的服务设施点选址查询问题,并分析现有算法的应用场景局限性,最后提出了基础搜索算法、基于渐进式算法的搜索算法以及基于空间剪枝的搜索算法。实验证明提出的算法能准确查询空间最优点,并给出准确的服务设施点和数据点之间的分配方案。4.研究提出了欧氏和路网空间中基于聚类算法的组合最优点查询算法。通过研究数据挖掘中的各种聚类方法,以及算法导论领域解决组合优化问题的各种经典近似算法,首次提出了基于聚类算法的组合最优点查询方案。提出的算法可以准确返回一组最少的最优位置点来覆盖所有数据点。相比于传统的组合最优点查询算法,具有更高的效率及更高的准确度。同时,首次将算法扩展到了路网数据空间,提出了高效的路网空间距离计算策略。本文研究内容主要围绕空间数据最优点查询问题,研究了二维三维欧氏空间中的单个最优点查询问题,并扩展到带有容量限制的场景,以及欧氏及路网空间中的组合最优点查询问题。提出的算法能够更高效、更正确地处理市场决策支持等问题。未来的研究方向包括动态场景下及高维空间和度量空间下的最优点查询问题:主要问题是将组合最优点问题扩展到高维空间(图片,文本)以及度量空间;以及研究利用分布式处理框架来处理空间数据最优点查询问题,使提出的查询算法在大规模数据集上有更好的性能。
[Abstract]:In recent years, with the rapid development and application of Internet, mobile communication and location aware technology, various mobile terminals, video image acquisition equipment and social networking platform to produce a large number of geographical position and low dimensional data, and images, text and other data in high dimensional space. These data are massive, heterogeneous characteristics, which need the help efficient, accurate processing model and algorithm, to explore the inner information and value, so as to support related industries to make a better decision. Therefore, how to efficiently and accurately analyze the processing of spatial data mining and its intrinsic value and information, has become a major academic problem in computer field. In the field of space. The advantages of data query is a subclass in recent years the rapid rise of the problem. The algorithm in the analysis of problems of complex decision process will appear inefficient and low quasi existing spatial data query The accuracy and other issues, at the same time indexing and query algorithm still faces many difficulties and technical bottlenecks. On the one hand, the Internet, mobile devices at all times produce huge amounts of spatial data. These data support for different decision-making scenarios, taking into account the needs of the changing characteristics of data points, many factors dimensions and metric space characteristics however, at home and abroad are mostly based on the static data indexing technology, seldom consider the time dimension index, while the existing algorithm is difficult to achieve both efficiency and accuracy of query query. On the other hand, according to the different decision support scenarios, the advantages of query problem itself is complicated. Most existing (space) most of the advantages of the query for a particular kind of goal equation algorithm optimization method, it has poor scalability, low efficiency, especially in dealing with high dimensional data set or network, And the problems in dealing with the combination of the advantages of the query, the accuracy and efficiency of the existing algorithms with the increase of the complexity of the data set decreases. Analysis of the above two research status at home and abroad and based on the trend, this paper studies and presents the advantages of different application background for efficient query algorithm to support different scene analysis and the decision mainly includes: 1. research and put forward the Euclidean space of a single optimal query algorithm, aiming at the existing reverse nearest neighbor number maximization problem in large data scenarios of the algorithm of low efficiency and poor scalability, combined with the classical algorithms of computational geometry and other fields, put forward the strategy of spatial scanning, and space division the upper limit estimation strategy and based on the "arc overlap effect value calculation method. The experiment results prove that the proposed algorithm compared to the existing algorithm, more efficient, more low The memory occupancy rate.2. study presents the three-dimensional space of a single point query algorithm, for the first time to solve the three-dimensional space under the double color reverse nearest neighbor maximization problem. According to the characteristics of the problem itself and on the index of efficiency, based on "fine-grained" space partition and upper bound estimate of the three-dimensional space has the advantage of query algorithm the algorithm has the advantages of high efficiency, accurate. And the result shows that the proposed algorithm is extended to the feasibility of higher dimensional space, the research for spatial data query has the problem in high dimensional space under the study of the importance of.3. considering the facility capacity constraints of the space the advantages of query algorithm, for the first time to solve the spatial data point of service capacity constrained scenarios of the most advantages of query problem. Through the study of spatial database and operational research with limited capacity in the field of facility location check Question, and application scenario analysis of existing algorithm limitations, finally proposed based search algorithm, search algorithm based on the incremental algorithm and search algorithm based on spatial pruning. Experiments show that the proposed algorithm can accurately query space the advantages, and provide accurate service allocation scheme in.4. between data points and the service facilities the combination of clustering algorithm based on the advantages of query algorithm and Euclidean space. The network through a variety of clustering methods in data mining research field, and introduction to algorithms solve various classical combinatorial optimization problems of the approximate algorithm is proposed for the first time, combined clustering algorithm has the advantage of query scheme based on the proposed algorithm can accurately return a group. The optimal position to cover all the data points. The advantages compared to the traditional combined query algorithm, has higher efficiency and higher accuracy. At the same time, the algorithm is extended to the network data space, proposed network space distance efficient calculation method. This paper mainly focuses on the advantages of spatial data query, on a single two-dimensional Euclidean space and the advantages of query, and extended to the capacitated scene, and Euclidean and road space combination the most advantage of the query problem. The proposed algorithm can more efficient and more accurate handling market decision support and other issues. Future research directions include query optimal dynamic scenes and high dimensional space and metric space: the main problem is the combination of the most advantages extended to higher dimensional space (picture and text) metric space; and based on a distributed processing framework to deal with the advantages of spatial data query, the query algorithm is better in large-scale data sets on the Performance.
【学位授予单位】:浙江大学
【学位级别】:博士
【学位授予年份】:2017
【分类号】:TP311.13
【参考文献】
相关期刊论文 前2条
1 Jos C.Pereira;Fernando G.Lobo;;An Optimized Divide-and-Conquer Algorithm for the Closest-Pair Problem in the Planar Case[J];Journal of Computer Science & Technology;2012年04期
2 赵元;张新长;康停军;;并行蚁群算法及其在区位选址中的应用[J];测绘学报;2010年03期
,本文编号:1638317
本文链接:https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/1638317.html