
Research on Geographic Element Parsing Methods for Chinese Addresses on the Internet

Published: 2018-03-27 07:04

  Topic: geographic element parsing  Approach: conditional random field  Source: Wuhan Institute of Technology (武汉工程大学), master's thesis, 2016


[Abstract]: As location-based services have spread, more and more companies combine the address data they hold with software functionality to build location service applications that make everyday life easier, such as mobile map apps. This requires mapping large numbers of Chinese addresses written in natural language to geographic coordinates so that they can be located precisely on an electronic map, which in turn supports information retrieval, query, and positioning services. However, Chinese address information obtained from the Internet is often incomplete and non-standardized; that is, the address data is not organized by geographic element level. To establish an accurate mapping between spatial and non-spatial information, studying geographic element parsing and standardization for Chinese addresses obtained from the Internet has significant practical value. This thesis takes Chinese addresses collected by web crawlers as its research object. First, it applies a conditional random field (CRF) algorithm, which uses four-tag word-position labeling and builds a CRF probability model to parse the geographic elements in an address. Second, it applies an algorithm that computes the credibility of administrative divisions from multiple factors. The main purpose of this method is to identify the administrative division part of an address's geographic elements: it first matches several candidate sets of administrative divisions against an administrative division dictionary and assigns position matching factors to the different divisions, then computes the credibility of each division from the relationships among the factors and selects the best administrative division result. Finally, it applies a rule-based improvement on the conditional random field, namely an address parsing algorithm based on empirical transition rules, which can effectively identify both the administrative divisions and the other geographic elements in a Chinese address. This algorithm first builds a feature character library and derives a single-character empirical transition matrix from a standard address corpus; it then extracts the feature characters from the address string to form a random field and uses the empirical transition probability matrix to find rule-based expressions suited to address element parsing, which are used to parse the geographic elements of the address string being processed. Because the feature character library contains a limited number of feature characters, the method cannot judge infrequently occurring feature characters well. For Chinese addresses that contain feature characters from the library, however, it identifies the geographic elements efficiently. The three algorithms are each tested on different address databases, and the final results are compared horizontally and vertically. The experimental results show that the multi-factor and other algorithms perform well and can effectively separate the different geographic elements, laying a foundation for location-based application development.
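The four-tag word-position labeling mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not name the tags, so this assumes the common B/M/E/S scheme (begin, middle, end, single), and the sample address and the function names `to_bmes` and `from_bmes` are hypothetical, not taken from the thesis. The code only prepares and decodes character-level labels of the kind a CRF would be trained on; it does not implement the CRF itself.

```python
from typing import List, Tuple

def to_bmes(elements: List[str]) -> List[Tuple[str, str]]:
    """Label every character with its position inside its address element.

    Assumed tag set: B = first character, M = middle character,
    E = last character, S = single-character element.
    """
    tagged: List[Tuple[str, str]] = []
    for element in elements:
        if not element:
            continue  # skip empty elements
        if len(element) == 1:
            tagged.append((element, "S"))
            continue
        tagged.append((element[0], "B"))
        for ch in element[1:-1]:
            tagged.append((ch, "M"))
        tagged.append((element[-1], "E"))
    return tagged

def from_bmes(tagged: List[Tuple[str, str]]) -> List[str]:
    """Rebuild address elements from (character, tag) pairs."""
    elements: List[str] = []
    current = ""
    for ch, tag in tagged:
        current += ch
        if tag in ("E", "S"):  # an element closes on E or S
            elements.append(current)
            current = ""
    if current:  # tolerate a sequence that ends without E or S
        elements.append(current)
    return elements

# Hypothetical address, already split into geographic elements.
example = ["湖北省", "武汉市", "洪山区", "雄楚大街", "693号"]
labels = to_bmes(example)
restored = from_bmes(labels)
```

In this framing, a trained sequence model would predict the tag for each character of an unsegmented address, and a decoder like `from_bmes` would turn the predicted tags back into geographic elements.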
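[Degree-granting institution]: Wuhan Institute of Technology (武汉工程大学)
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification number]: TP391.1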


Article ID: 1670498



Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/1670498.html

