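Chinese Character Detection in Street View Images and Spatio-temporal Distribution Analysis for Several Southeast Asian Capital Cities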
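Published: 2017-12-27 03:17
Keywords: Chinese Character Detection in Street View Images and Spatio-temporal Distribution Analysis for Several Southeast Asian Capital Cities. Source: Nanjing University, 2017 master's thesis. Document type: degree thesis
More related articles: Southeast Asia; spatial distribution of Chinese characters; street view images; text detection; Belt and Road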
[Abstract]: The core of the Belt and Road Initiative is to achieve the "five connectivities" among the countries along its routes: policy coordination, facilities connectivity, unimpeded trade, financial integration, and people-to-people bonds. A systematic, effective, and quantitative evaluation of the current state of the "five connectivities" provides important reference information and data support for scientific decision-making and regional cooperation. The foundation of the "five connectivities" is "language connectivity", and writing is an important component of language. The use of Chinese characters in countries along the Belt and Road can effectively reflect the actual state of exchange and interconnection between those countries and China, and in particular gives a direct view of the current state of people-to-people and cultural ties. Southeast Asia is a key region of the "Road". Quantitatively studying the distribution of Chinese characters in Southeast Asia from a spatial perspective can serve as an application demonstration for research on the spatial distribution of Chinese characters in Belt and Road countries.
Traditional data acquisition methods struggle to obtain large-area, spatially explicit information on the distribution of Chinese characters. Street view maps show the details of street facades, including the use of text within cities, and they are georeferenced, wide in coverage, and freely accessible to users, so they provide data support for acquiring spatial information on Chinese characters. Algorithms for text detection in natural images are fairly mature, but research on detecting Chinese characters in multilingual natural images is insufficient. Constrained by data acquisition, research on the spatial distribution of Chinese characters is still essentially absent, and how to analyze and evaluate this distribution scientifically and systematically merits study. To address the difficulty of acquiring spatial distribution information on Chinese characters and the lack of related research, this thesis establishes a technical workflow for acquiring that information from street view images and builds a system for analyzing and evaluating the spatio-temporal distribution of Chinese characters.
The main research contents are as follows. (1) Chinese character detection in street view images. Based on street view maps, a pipeline of "data acquisition, text detection, Chinese character discrimination" is proposed. Web data acquisition techniques are used to collect georeferenced street view images of several Southeast Asian capital cities. In light of the characteristics of street view images and the technical difficulties of text detection, three methods are applied to detect text lines and their results are compared: the Connectionist Text Proposal Network (CTPN), an improved Maximally Stable Extremal Regions (MSER) method, and a stroke-width-based method. Based on precision and recall, the results of the algorithm that meets the requirements are selected as the data source for Chinese character discrimination. Finally, by analyzing the characteristics of Chinese characters and how they differ from other scripts, a discrimination method based on character segmentation and character feature computation is proposed, yielding point data on the spatial distribution of Chinese characters in the studied cities. (2) Analysis of the spatial distribution characteristics of Chinese characters. Using the point data interpreted from street view images, statistical analysis compares the number, density, and per capita number of Chinese characters across the cities. Spatial analysis explores the distribution within each city, including the principal direction of distribution, spatial clustering, and spatial evenness. Correlation analysis between Chinese character distribution and road network centrality examines the locational advantage of Chinese characters in each city. Central place theory is introduced to calculate and evaluate the spatial radiation range and capacity of Chinese-character signs within each city. Together these give a systematic picture of the spatial distribution of Chinese characters in the studied cities and a cross-city comparison of their distribution characteristics. (3) Analysis of temporal change in the distribution of Chinese characters in Singapore. Changes in the spatial distribution of Chinese characters in Singapore across 2008, 2013, and 2015 are analyzed, with statistics on changes in number and density in the central urban area and in each district. Spatial analysis techniques are used to study the spatio-temporal change over two periods and three time points in terms of principal direction and center of gravity, clustering, evenness, locational advantage, and spatial radiation, revealing patterns of change and regional differences.
The results show the following. (1) Among the seven Southeast Asian capital cities, Kuala Lumpur has the largest number and highest density of Chinese characters, and Jakarta has the fewest. Chinese characters are clustered in all seven cities; clustering is strongest in central Kuala Lumpur and weakest in central Bangkok. Phnom Penh has the most spatially even distribution and Manila the least. In all seven cities Chinese characters are found mainly on residential service roads and are positively correlated with road network centrality; locational advantage is strongest in Phnom Penh and lowest in Jakarta. In terms of spatial radiation capacity, Chinese characters achieve nearly full coverage in central Phnom Penh and the lowest coverage in central Jakarta; their influence on residents is strongest in central Phnom Penh and weakest in Jakarta. (2) From 2008 to 2015, the number of Chinese-character signs in Singapore increased year by year. The signs are concentrated mainly in the central district (中环区) and Kallang (加冷区). Areal density rose in every district, the high-value kernel density zone moved toward the central district, and the overall center of gravity moved southwest. Spatial clustering of Chinese-character signs in Singapore remained essentially unchanged, and spatial evenness declined slightly. New signs appeared mainly on residential roads, the overall spatial radiation range grew substantially, residents' opportunities to encounter Chinese characters increased, and the influence of Chinese characters in Singapore strengthened.
This thesis analyzes the spatial distribution of Chinese characters in several Southeast Asian capital cities from multiple perspectives and obtains good results, but it also has shortcomings. Character-feature-based discrimination cannot completely exclude Japanese text or alphabetic text whose character features are not distinctive, and the text content itself is not recognized; how to improve the effectiveness of Chinese character discrimination and how to recognize text content require further research. In addition, the thesis analyzes the distribution only from the perspectives of space and road networks and does not sufficiently investigate the underlying causes of the differences. Future work could bring in factors such as Chinatowns, commercial centers, Chinese outbound investment, policy influence, and changes in the local ethnic Chinese population to study the mechanisms behind differences in Chinese character distribution across Southeast Asia.
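The abstract states that the three text-line detectors were compared using precision and recall. A minimal sketch of how such a comparison could be computed is given below. The box format `(x1, y1, x2, y2)`, the greedy one-to-one matching, the IoU threshold of 0.5, the use of F1 to rank methods, and all function names are illustrative assumptions, not details taken from the thesis.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2); assumed box format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(detections: List[Box], ground_truth: List[Box],
                     iou_threshold: float = 0.5) -> Tuple[float, float]:
    """Greedy one-to-one matching of detections to ground-truth text lines."""
    matched = set()  # indices of ground-truth boxes already claimed
    true_positives = 0
    for det in detections:
        best_j, best_iou = -1, 0.0
        for j, gt in enumerate(ground_truth):
            if j in matched:
                continue
            overlap = iou(det, gt)
            if overlap > best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0 and best_iou >= iou_threshold:
            matched.add(best_j)
            true_positives += 1
    precision = true_positives / len(detections) if detections else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    total = precision + recall
    return 2 * precision * recall / total if total > 0 else 0.0


def pick_best(scores: Dict[str, Tuple[float, float]]) -> str:
    """Return the method name with the highest F1 (assumed ranking rule)."""
    return max(scores, key=lambda name: f1(*scores[name]))
```

In use, `precision_recall` would be called once per detector on the same annotated images, and `pick_best` would be given a dictionary such as `{"ctpn": (p1, r1), "mser": (p2, r2), "swt": (p3, r3)}`, where the keys are hypothetical labels for the three methods.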
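The spatial descriptors named in the abstract (center of gravity, principal direction, clustering) can be illustrated with a short sketch. The abstract does not say which statistics the thesis used, so the choices here are assumptions: the mean center, the major axis of the coordinate covariance matrix as a stand-in for a directional ellipse, and the Clark-Evans average nearest neighbor ratio. Coordinates are assumed to be projected (for example, in meters), and the function names are hypothetical.

```python
import numpy as np


def mean_center(points: np.ndarray) -> np.ndarray:
    """Center of gravity of an (N, 2) array of x, y coordinates."""
    return points.mean(axis=0)


def principal_direction_deg(points: np.ndarray) -> float:
    """Orientation of the major axis of the point scatter.

    Returned in degrees counterclockwise from the +x axis, in [0, 180).
    """
    centered = points - points.mean(axis=0)
    cov = np.cov(centered, rowvar=False)      # 2 x 2 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    major = eigvecs[:, -1]                    # eigenvector of largest eigenvalue
    angle = np.degrees(np.arctan2(major[1], major[0]))
    return float(angle % 180.0)


def nearest_neighbor_ratio(points: np.ndarray, area: float) -> float:
    """Clark-Evans ratio: below 1 suggests clustering, above 1 dispersion.

    Uses a dense N x N distance matrix, so it only suits small samples.
    """
    n = len(points)
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dists, np.inf)           # ignore each point's self-distance
    observed = dists.min(axis=1).mean()       # mean nearest-neighbor distance
    expected = 0.5 / np.sqrt(n / area)        # expectation under a random pattern
    return float(observed / expected)
```

Comparing these values between years, for instance the mean center in 2008 and in 2015, gives one simple way to express a shift in the distribution's center of gravity.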
[Degree-granting institution]: Nanjing University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: P208; H12
Article ID: 1340024
Article link: https://www.wllwen.com/shoufeilunwen/benkebiyelunwen/1340024.html