Research and Implementation of an Android-Based Vertical Search Engine for Digital Products
Published: 2019-03-31 20:55
[Abstract]: The rapid development of information technology has enriched our lives, but the growing volume of information has also made it much harder to find what we need, a problem that only eased with the arrival of search engines. General-purpose search engines return results without differentiation, so users still have to sift manually through a wide variety of results to find what they want. Vertical search engines, a more specialized form of search engine, offer clear advantages here: their queries are precise and their categories are well defined, so users can find what they need conveniently and accurately, which improves user retention and makes such engines easy to promote.
Since the start of the new century, smart mobile devices have spread rapidly, and wireless technologies such as 3G and WLAN have been widely deployed. Both trends show the mobile Internet moving closer to everyday life, and mobile application development has flourished as a result. Android currently holds a large share of the mobile application development market and is popular with mobile developers. A search engine on a mobile client supports real-time search and can lower shopping costs. People want to obtain richer information from the Internet anytime and anywhere, and an intelligent search system on a mobile terminal can meet that need.
Using a resource base of laptop and mobile phone products as its setting, this thesis studies the characteristics of vertical search engines and designs and implements a vertical search engine system for digital products. The work includes: extending and improving the open-source web crawler Heritrix, and optimizing and handling problems that arise during page crawling; using HTMLParser to parse the pages fetched by the customized crawler into structured text and store it; using Lucene to build an index over the structured text, together with a specialized lexicon, to implement the retrieval module; and developing the user interface on a Java EE three-tier architecture with Spring and DWR. On top of this search system, a communication interface between the Android client and the server is defined and implemented, which brings digital product search on the Android client into the scope of the thesis and better meets users' needs. Finally, system testing shows that the proposed vertical search engine for mobile digital products is feasible and improves query efficiency and accuracy.
The innovations of the thesis and the system are as follows: a topic-focused web crawler module is implemented using relevant strategies and algorithms, so that the information obtained is more precise and closer to user expectations; and vertical search over portal websites is implemented on the Android platform.
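The indexing and retrieval step described in the abstract can be pictured with a small, self-contained sketch. It is not the thesis's implementation and does not use Lucene, HTMLParser, or Heritrix: it is a plain-Java toy inverted index, assuming that each parsed page has already been reduced to a title and a category. The class name `ProductIndexSketch`, its methods, and the sample product records are all hypothetical and chosen only for illustration. The lexicon stands in for the "specialized lexicon" by mapping a variant term to a canonical one before indexing and before querying.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy index; a real system would delegate this work to Lucene.
public class ProductIndexSketch {
    // Assumed domain lexicon: variant term -> canonical term.
    private final Map<String, String> lexicon = new HashMap<>();
    // Inverted index: term -> sorted ids of the products that contain it.
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    // Product titles, addressed by id (the insertion order).
    private final List<String> titles = new ArrayList<>();

    // Register a synonym; call this before adding products that use it.
    public void addSynonym(String variant, String canonical) {
        lexicon.put(variant.toLowerCase(Locale.ROOT), canonical.toLowerCase(Locale.ROOT));
    }

    // Lower-case, split on non-alphanumerics, and normalize via the lexicon.
    private List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue;
            }
            terms.add(lexicon.getOrDefault(raw, raw));
        }
        return terms;
    }

    // Index one structured record and return its id.
    public int add(String title, String category) {
        int id = titles.size();
        titles.add(title);
        for (String term : tokenize(title + " " + category)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
        return id;
    }

    // Boolean AND query: return titles that contain every query term.
    public List<String> search(String query) {
        Set<Integer> hits = null;
        for (String term : tokenize(query)) {
            Set<Integer> ids = postings.getOrDefault(term, new TreeSet<>());
            if (hits == null) {
                hits = new TreeSet<>(ids);
            } else {
                hits.retainAll(ids);
            }
        }
        List<String> result = new ArrayList<>();
        if (hits != null) {
            for (int id : hits) {
                result.add(titles.get(id));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ProductIndexSketch index = new ProductIndexSketch();
        index.addSynonym("notebook", "laptop");
        index.add("ThinkPad X230 notebook", "laptop"); // made-up sample record
        index.add("Galaxy S3", "phone");               // made-up sample record
        System.out.println(index.search("notebook thinkpad"));
    }
}
```

Normalizing through the lexicon at both index time and query time is what lets a query for "notebook" match a record categorized as "laptop"; ranking, Chinese word segmentation, and persistence are deliberately left out of the sketch.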
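[Degree-granting institution]: Kunming University of Science and Technology
[Degree level]: Master's
[Year degree conferred]: 2013
[Classification number]: TP391.3
Article ID: 2451268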
Article link: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/2451268.html