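Research on Real-Time Search and Semantic Understanding of Highly Dynamic Traffic Information Contained in Internet Web Pages
Published: 2018-02-26 01:32
Keywords: real-time search; traffic information; web crawler; natural language understanding. Source: Zhejiang University of Technology, 2014 master's thesis. Document type: degree thesis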
[Abstract]: With the rapid development of the Internet, people urgently want a way to search the massive volume of web content efficiently and quickly, according to their own needs, for valuable real-time traffic information. However, much traffic information is described in natural language and, because of the limits of natural language understanding technology, is difficult for current computer systems to use directly.
This thesis studies methods for real-time search and semantic understanding aimed specifically at the dynamic traffic information contained in Internet web pages. First, web crawler technology is used to capture real-time traffic information from web pages (including microblogs). The captured information is then segmented into words according to a pre-built dictionary, and the segmentation results are matched against a pre-built rule base to achieve semantic understanding. Finally, the approach is validated on concrete examples through experiments. The main work and results are as follows:
1. A real-time search method for the dynamic traffic information contained in Internet web pages is studied. For three sources of traffic information (official websites, forums, and microblogs), a different crawling method is used for each source, and the captured information is saved to a database, providing the data basis for the subsequent natural language understanding.
2. A natural language understanding method oriented to traffic information is proposed. Based on the characteristics of traffic information and the application requirements of semantic understanding, an improved maximum matching word segmentation algorithm is adopted, and traffic information with qualitative and fuzzy characteristics is formalized into a standardized reference template. Real-time traffic information is then matched at the semantic level against the existing template rules, which addresses the problem that traffic information described in natural language is difficult for existing computer systems to understand and use directly.
3. The real-time search method and the semantic understanding method are each implemented. Using traffic information from the different sources, the correctness and effectiveness of the proposed methods are verified.
This thesis presents an in-depth theoretical study of search and semantic understanding techniques for traffic information and verifies the correctness and efficiency of the methods through experiments. It addresses the problem that traffic information described in natural language cannot be directly understood and used by current computers, has practical application value, and provides important data support for dynamic navigation and location-based services.
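The segmentation and template-matching steps described in the abstract can be illustrated with a minimal Python sketch. It uses plain forward maximum matching, not the improved variant the thesis proposes, and the dictionary entries, the road/direction/event template, and all function names are hypothetical choices made for illustration only.

```python
# Illustrative sketch only: plain forward maximum matching plus a toy rule
# template. The dictionary, the template fields, and all names here are
# hypothetical and are not taken from the thesis.

ROADS = {"文一路", "莫干山路"}            # hypothetical road-name entries
DIRECTIONS = {"由东向西", "由西向东"}      # hypothetical direction phrases
EVENTS = {"拥堵": "congestion", "事故": "accident", "封闭": "closure"}
DICTIONARY = ROADS | DIRECTIONS | set(EVENTS)
MAX_WORD_LEN = max(len(word) for word in DICTIONARY)


def forward_max_match(text, dictionary=DICTIONARY, max_len=MAX_WORD_LEN):
    """Segment text by taking the longest dictionary word at each position."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a word matches;
        # an unknown single character is emitted as its own token.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


def match_template(tokens):
    """Fill a (road, direction, event) template from segmented tokens."""
    record = {"road": None, "direction": None, "event": None}
    for token in tokens:
        if token in ROADS and record["road"] is None:
            record["road"] = token
        elif token in DIRECTIONS and record["direction"] is None:
            record["direction"] = token
        elif token in EVENTS and record["event"] is None:
            record["event"] = EVENTS[token]
    # The rule fires only when both a road and an event were found.
    if record["road"] and record["event"]:
        return record
    return None


tokens = forward_max_match("文一路由东向西发生拥堵")
print(tokens)
print(match_template(tokens))
```

In this sketch, words missing from the dictionary fall back to single-character tokens, and the template simply ignores them; a sentence with no recognized road or event yields no structured record.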
[Degree-granting institution]: Zhejiang University of Technology
[Degree level]: Master's
[Year degree awarded]: 2014
[Classification number]: TP393.092
Article ID: 1536005
Article link: https://www.wllwen.com/guanlilunwen/ydhl/1536005.html