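Design and Implementation of a Lucene-Based In-Site Full-Text Search System for the Wireless City
Published: 2018-05-27 09:38
Topics: wireless city + full-text search; Source: Beijing University of Posts and Telecommunications, 2013 master's thesis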
[Abstract]: With the rapid development of the mobile Internet and wireless access technology, wireless cities make it possible for citizens, enterprises, and governments to share a common wireless information platform. Access to wireless network services anytime, anywhere, and on demand has become an important part of modern life and work. As a product of technological and social development, the wireless city will strongly influence how people live and work and will promote further economic and social development. A wireless city coordinates and unifies information resources of all kinds and integrates various services and applications, giving users a convenient platform for finding information resources on an Internet filled with massive amounts of information and applications. However, a survey of the comprehensive wireless city portals already in pilot operation across provinces and cities shows that they lack a full-text search entry point that lets users quickly find the resources they need among that mass of information and applications, so they cannot meet users' needs for deep, fast, and accurate queries. The existing in-site search functions of wireless cities only retrieve news items within the site, which does not amount to full-text search.
Against this background, this thesis proposes a design and implementation scheme for an in-site full-text search system for the wireless city, helping users quickly and accurately find the information or application entry points they need among massive information and application resources. Because the wireless city is a comprehensive city portal platform whose information and news content is very broad, the thesis optimizes search results on top of keyword-based full-text search by automatically classifying them into applications and information items. With a simple keyword search, users can quickly enter the service application they are looking for or view the specific content that contains the keyword. This can greatly improve the experience of wireless city users and further raise productivity across industries.
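The split of search results into applications and information items can be illustrated with a minimal Python sketch. It assumes each hit already carries a `category` label and a relevance `score`; these field names, the `group_hits` helper, and the sample data are hypothetical and are not taken from the thesis, which builds on Lucene.

```python
from collections import defaultdict


def group_hits(hits):
    """Group scored hits by category, highest score first within each group."""
    groups = defaultdict(list)
    for hit in hits:
        groups[hit["category"]].append(hit)
    for items in groups.values():
        items.sort(key=lambda h: h["score"], reverse=True)
    return dict(groups)


# Hypothetical hits for the keyword "bus"; categories and scores are made up.
hits = [
    {"title": "Bus timetable news", "category": "information", "score": 0.61},
    {"title": "Real-time bus app", "category": "application", "score": 0.87},
    {"title": "Bus card top-up", "category": "application", "score": 0.92},
]
grouped = group_hits(hits)
```

A result page could then render `grouped["application"]` as a list of entry points and `grouped["information"]` as a list of matching articles.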
The full-text search system built in this thesis is a secondary development of the full-text search engine Lucene. By surveying the wireless cities already online, the thesis summarizes the shortcomings of existing in-site full-text search systems for wireless cities, sets out its design goals, and completes the overall structural design of the system. On this basis, it carries out detailed requirements analysis and functional analysis module by module and gives functional flow charts and the implementation process. Because the wireless city covers a wide range of domains and new words appear frequently, the thesis builds on a study of existing Chinese word segmentation algorithms, combines mechanical (dictionary-based) segmentation with statistical segmentation, and proposes a Chinese word segmentation architecture that introduces dynamic dictionary updating. It improves Lucene's index-building process to implement a configurable indexing mechanism, and it designs and implements incremental index updates to keep the index library synchronized and consistent with the wireless city business database. During search, the information in the Term Vector is used to design and implement automatic classification of search results. The presentation layer finally offers users a concise search interface with a good user experience, and a search-term suggestion function provides relevant hints for users' queries.
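The idea of pairing dictionary-based segmentation with a statistical step that feeds a dynamically updated dictionary can be sketched in Python as follows. This is an illustration under stated assumptions, not the architecture from the thesis: forward maximum matching stands in for the mechanical segmenter, and counting frequent pairs of adjacent unknown characters stands in for the statistical component. The names `DynamicSegmenter`, `learn_new_words`, and `min_count` are hypothetical.

```python
from collections import Counter


class DynamicSegmenter:
    """Forward maximum matching over a word set that can grow at runtime."""

    def __init__(self, words):
        self.words = set(words)
        self.max_len = max((len(w) for w in self.words), default=1)

    def add_word(self, word):
        # Dynamic dictionary update: the new word is used on the next call.
        self.words.add(word)
        self.max_len = max(self.max_len, len(word))

    def segment(self, text):
        tokens, i = [], 0
        while i < len(text):
            # Try the longest candidate first; fall back to one character.
            for size in range(min(self.max_len, len(text) - i), 0, -1):
                piece = text[i:i + size]
                if size == 1 or piece in self.words:
                    tokens.append(piece)
                    i += size
                    break
        return tokens


def learn_new_words(segmenter, corpus, min_count=2):
    """Assumed statistical step: promote frequent pairs of unknown characters."""
    pairs = Counter()
    for text in corpus:
        tokens = segmenter.segment(text)
        for left, right in zip(tokens, tokens[1:]):
            unknown_left = len(left) == 1 and left not in segmenter.words
            unknown_right = len(right) == 1 and right not in segmenter.words
            if unknown_left and unknown_right:
                pairs[left + right] += 1
    learned = sorted(w for w, n in pairs.items() if n >= min_count)
    for word in learned:
        segmenter.add_word(word)
    return learned


segmenter = DynamicSegmenter({"无线", "城市", "搜索"})
before = segmenter.segment("无线城市微博搜索")
learned = learn_new_words(segmenter, ["无线城市微博搜索", "微博应用"])
after = segmenter.segment("无线城市微博搜索")
```

In this toy run the unknown word 微博 is first split into single characters, then promoted to the dictionary once it has been seen twice, so the second call keeps it as one token. A real system would need a stronger statistical criterion than a raw pair count.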
Finally, the thesis presents the overall running results of the Lucene-based in-site full-text search system for the wireless city, summarizes the research work completed, and proposes directions for the next stage of wireless city development and goals for improving the system.
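[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP391.3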
Article ID: 1941511
Article link: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/1941511.html