Research on Focused Search Engines and Their Application in Community Informatization

Published: 2018-11-01 20:02
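[Abstract]: Cloud computing, an entirely new business model, was proposed by Google in 2006 and opened up a new line of thinking for both industry and academia. The team led by Professor Yuan Dongfeng at the School of Information Science and Engineering, Shandong University, quickly seized this opportunity. It carried out in-depth research on new cloud-computing-based informatization models and has achieved interim results. The team has been supported by two Shandong Province major special projects for the transfer of independent-innovation achievements. The topic of this thesis comes from the second of these, "Low-Cost, Low-Power, High-Reliability Embedded Terminals and Information Service Platform" (2010ZHZX1A1001).

Amid the national push for urbanization, the conversion of villages into communities with large-scale operation and a collective economy is already under way. Rural redevelopment in Shandong Province has progressed rapidly. The pilot area chosen by the major special project behind this work is a typical example of a village converted into a community. Community informatization has become a very important part of informatization as a whole: the National Informatization Development Strategy 2006-2020 lists advancing community informatization as one of the strategic priorities of China's informatization.

Against this background, the project team studied key informatization technologies and proposed a new informatization model, "cloud computing servers + broadband network + thin clients", that dispenses with the PC entirely. The team developed and mass-produced thin clients based on an embedded architecture, bringing both cost and power consumption down to a very low level. It also built a cloud computing server cluster and, based on surveys of community users, developed the applications and information services those users care about. Using this model in place of the traditional PC-centric approach to informatization, the team ran a large-scale pilot demonstration with good results.

To meet the needs of the target users and fit the characteristics of the new community informatization model, this thesis designs and implements a focused search engine for Taobao shopping. It gives community users fast, convenient product search and recommendations. Because Taobao lists a very wide variety of product types, a generic product model was designed so that adding new products does not require large-scale updates to the database tables.

The system has two main modules, a web crawler and information search. The crawler module fetches product information from Taobao, builds the index files, and stores detailed product information in the database. The information retrieval module provides the user keyword query interface, index-file lookup and database queries, and presents search-result lists, product details and recommendations.

In the crawler module, a Hadoop-based distributed web crawler was implemented in Java to crawl massive amounts of data efficiently. The thesis first sets up a distributed Hadoop environment on Ubuntu 9.10, then designs a distributed crawler program for Hadoop that crawls Taobao data. The remaining crawler work covers five points:
- A data storage strategy is designed to build the index files.
- The caching strategy is optimized to reduce the physical storage footprint.
- An information extraction method tailored to Taobao's data is designed so that detailed product information can be stored in the database.
- Logging rules are designed to record runtime exceptions caused by network conditions.
- A user interface lets operators configure the crawl rules.

The search module provides browser-based information search. Its core is a J2EE project that performs index-file and database queries. The system first provides runtime-environment configuration for setting its operating parameters. A front-end page serves as the user query interface. The keyword is looked up in the index files to obtain the set of matching products, and the database entry information in that set is used to query the database for the result set. Because the target users are price-sensitive, the result set can be sorted by price. Product details can also be queried, showing the price, title, description and price-history curve, together with recommendations of products in a similar price range.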
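The abstract does not say how the generic product model is laid out, only that new products must not force large table updates. One common design with that property is entity-attribute-value. Every product shares a few typed core fields, and category-specific properties are stored as name/value rows. The Java sketch below assumes that design. `GenericProduct`, `with` and `attribute` are hypothetical names, and in a database the attribute map would correspond to a separate (product ID, attribute name, attribute value) table.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical generic product record (entity-attribute-value style, an assumption).
 * Shared fields are typed; category-specific properties live in a name/value map,
 * so a new product category needs no new columns.
 */
final class GenericProduct {
    private final long id;
    private final String title;
    private final double price;
    // Stand-in for a (product_id, attr_name, attr_value) table; keeps insertion order
    private final Map<String, String> attributes = new LinkedHashMap<>();

    GenericProduct(long id, String title, double price) {
        this.id = id;
        this.title = title;
        this.price = price;
    }

    long id() { return id; }
    String title() { return title; }
    double price() { return price; }

    /** Adds or overwrites one category-specific attribute; returns this for chaining. */
    GenericProduct with(String name, String value) {
        attributes.put(name, value);
        return this;
    }

    /** Returns the attribute value, or null if this product does not define it. */
    String attribute(String name) {
        return attributes.get(name);
    }

    /** Read-only view of all category-specific attributes. */
    Map<String, String> attributes() {
        return Collections.unmodifiableMap(attributes);
    }
}
```

A phone and a shirt then share one class and one table layout. For example, `new GenericProduct(1L, "Phone", 1999.0).with("screen", "5.0in")` and `new GenericProduct(2L, "Shirt", 59.0).with("size", "XL")` differ only in their attribute rows.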
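The abstract does not describe how crawl work is divided among the Hadoop nodes. One plausible scheme follows Hadoop's default `HashPartitioner`, which sends a key to partition `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. Keying on the URL's host keeps every page of one site on the same node, which makes per-site request throttling easy to enforce. The dependency-free Java sketch below shows only that assignment rule; `CrawlPartitioner` and its methods are hypothetical names, not the thesis' code.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper that spreads crawl URLs across N crawler nodes. */
final class CrawlPartitioner {
    private CrawlPartitioner() {}

    /**
     * Same arithmetic as Hadoop's default HashPartitioner, keyed on the URL's host
     * so that every page of one site lands on the same node.
     */
    static int partitionFor(String url, int numPartitions) {
        String host = URI.create(url).getHost();
        // Fall back to the whole URL if it has no parsable host
        String key = (host == null) ? url : host.toLowerCase(Locale.ROOT);
        // Masking the sign bit keeps the result non-negative for any hashCode
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    /** Splits a URL list into one bucket per crawler node. */
    static List<List<String>> split(List<String> urls, int numPartitions) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String u : urls) {
            buckets.get(partitionFor(u, numPartitions)).add(u);
        }
        return buckets;
    }
}
```

In an actual Hadoop job, the same rule would usually go in a custom `org.apache.hadoop.mapreduce.Partitioner`, or it would follow from emitting the host as the map output key.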
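The abstract mentions logging rules for runtime exceptions caused by network conditions but does not state them. A minimal sketch under stated assumptions: each failed attempt is logged at WARNING level with `java.util.logging`, a page is retried up to a fixed number of times, and URLs that still fail are kept for a later re-crawl. `RetryingFetcher`, `PageSource` and `maxAttempts` are hypothetical names.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical crawler fetch wrapper: retry, log each network failure, remember URLs that never succeeded. */
final class RetryingFetcher {
    /** Hypothetical page-download abstraction (could be backed by HttpURLConnection). */
    interface PageSource {
        String fetch(String url) throws IOException;
    }

    private static final Logger LOG = Logger.getLogger(RetryingFetcher.class.getName());

    private final PageSource source;
    private final int maxAttempts;
    private final List<String> failedUrls = new ArrayList<>();

    RetryingFetcher(PageSource source, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.source = source;
        this.maxAttempts = maxAttempts;
    }

    /** Returns the page body, or null if every attempt failed (the URL is then queued for re-crawl). */
    String fetch(String url) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return source.fetch(url);
            } catch (IOException e) {
                // One log record per failed attempt, including the exception
                LOG.log(Level.WARNING, "fetch failed (attempt " + attempt + "/" + maxAttempts + "): " + url, e);
            }
        }
        failedUrls.add(url);
        return null;
    }

    /** Copy of the URLs that exhausted all attempts. */
    List<String> failedUrls() {
        return new ArrayList<>(failedUrls);
    }
}
```

Keeping the failed URLs separate from the log lets a later crawl pass retry them without parsing log files.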
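The search path described above maps a keyword to a set of products through the index files, then uses each product's database entry to fetch the full record. The abstract does not name the indexing library or the index layout. The dependency-free sketch below therefore only illustrates the underlying inverted-index idea, assuming whitespace tokenization and AND semantics across query words. `KeywordIndex` is a hypothetical name, and Chinese product titles would need a word segmenter instead of whitespace splitting.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical in-memory inverted index: keyword -> IDs of products whose titles contain it. */
final class KeywordIndex {
    private final Map<String, Set<Long>> postings = new HashMap<>();

    /** Whitespace tokenizer (an assumption; Chinese text needs real word segmentation). */
    private static String[] tokens(String text) {
        String trimmed = text.toLowerCase(Locale.ROOT).trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    /** Adds one product's title to the index under its database ID. */
    void add(long productId, String title) {
        for (String t : tokens(title)) {
            postings.computeIfAbsent(t, k -> new HashSet<>()).add(productId);
        }
    }

    /** Returns IDs of products containing every query keyword (AND), in ascending order. */
    Set<Long> search(String query) {
        String[] terms = tokens(query);
        if (terms.length == 0) {
            return Collections.emptySet();
        }
        Set<Long> result = null;
        for (String t : terms) {
            Set<Long> ids = postings.getOrDefault(t, Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(ids);
            } else {
                result.retainAll(ids); // intersect posting lists
            }
            if (result.isEmpty()) {
                break; // no product can match the remaining terms
            }
        }
        return result;
    }
}
```

The returned IDs play the role of the "database entry information": a single database query keyed on those IDs then loads the full product rows for the result list.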
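The search module sorts results by price and recommends products in a similar price range, but the abstract does not define that range. The sketch below assumes a symmetric band of plus or minus `tolerance` around the viewed product's price, where `tolerance` is a hypothetical parameter such as 0.2 for 20 %. Candidates are ranked by absolute price difference. `PriceRanking` and its minimal `Item` row are hypothetical stand-ins for the thesis' product records.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical price ordering and similar-price recommendation for search results. */
final class PriceRanking {
    /** Minimal hypothetical product row: ID and price only. */
    static final class Item {
        final long id;
        final double price;

        Item(long id, double price) {
            this.id = id;
            this.price = price;
        }
    }

    private PriceRanking() {}

    /** Returns a copy of the results sorted by price, cheapest first unless descending is true. */
    static List<Item> sortByPrice(List<Item> results, boolean descending) {
        List<Item> sorted = new ArrayList<>(results);
        Comparator<Item> byPrice = Comparator.comparingDouble(i -> i.price);
        sorted.sort(descending ? byPrice.reversed() : byPrice);
        return sorted;
    }

    /**
     * Recommends up to {@code limit} other items priced within
     * +/- tolerance (a fraction, e.g. 0.2) of the viewed item, closest price first.
     */
    static List<Item> similarlyPriced(Item viewed, List<Item> catalog, double tolerance, int limit) {
        double low = viewed.price * (1 - tolerance);
        double high = viewed.price * (1 + tolerance);
        List<Item> candidates = new ArrayList<>();
        for (Item c : catalog) {
            // Skip the viewed item itself and anything outside the price band
            if (c.id != viewed.id && c.price >= low && c.price <= high) {
                candidates.add(c);
            }
        }
        candidates.sort(Comparator.comparingDouble(x -> Math.abs(x.price - viewed.price)));
        return new ArrayList<>(candidates.subList(0, Math.min(limit, candidates.size())));
    }
}
```

A relative band keeps recommendations meaningful for both cheap and expensive goods. A fixed band in currency units would be simpler but either too wide for low-priced items or too narrow for high-priced ones.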
[Degree-granting institution]: Shandong University
[Degree level]: Master's
[Year awarded]: 2013
[Classification number]: TP391.3
