Design and Implementation of a Product Ranking Module for a Commodity Search Engine
Published: 2018-10-13 07:43
[Abstract]: With the growth of the Internet and e-commerce, C2C and B2C websites offer users an ever larger number and variety of commodities, and every online shopping site provides search over its own listings. A user who wants to search and compare commodities across sites needs a web-wide commodity search engine: a vertical search engine that indexes commodity information from all merchants and supports retrieval along multiple dimensions. If such an engine presents query results at the commodity level, users are overwhelmed by the volume of data. Several methods have been proposed to improve how results are displayed, and one of the better ones is to reduce the returned results upward from the commodity dimension to the product dimension. A product is a generalization of a commodity: users first locate a product and then compare and choose among the commodities listed under it. The authority of product-node ranking therefore reflects, to a certain extent, the authority of the commodity search engine's ranking as a whole. Against this background, and drawing on the author's work at an internship company, this thesis designs and implements a product ranking module for a commodity search engine. The static product scores produced by the module are already used in production as the basis for product ranking.
The product ranking module consists of two main sub-modules: an offline module that computes static product scores and a data monitoring module. The offline computation uses Hadoop so that it can process massive volumes of commodity data. The module is also designed to be extensible: in response to operational requirements, it can change the scoring criteria for products in different categories and apply special handling to product nodes with abnormal scores. The data monitoring module lets developers monitor fluctuations in product scores and the state of product features, and the reports it produces make it possible to observe and trace the causes of abnormal ranking scores directly. It is built on the Django framework and follows Django's MTV pattern, dividing the system from top to bottom into a template layer, a view layer, and a model layer. Separating the template layer from the logic-processing (view) layer and the model layer makes it easier for developers to build data-driven web applications.
The thesis first introduces the project background and then briefly describes the technologies and frameworks used. It next analyzes the requirements of the product ranking module and, based on that analysis, describes in detail the design and implementation of the static score generation module and the data monitoring module. The core of this work is the design and implementation of the MapReduce program that computes the static scores. The thesis closes with a summary and an outlook on future work.
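The abstract does not give the scoring formula, so the following is only a minimal in-memory sketch of the idea it describes: commodity records are mapped to their product, grouped, and reduced to one static score per product, with scoring weights that can be overridden per category. The feature names (`sales`, `offers`), the weights, the log damping, and all function names are assumptions made for illustration, not the thesis's actual Hadoop job.

```python
from collections import defaultdict
from math import log1p

# Hypothetical weights: the thesis does not publish its scoring criteria.
DEFAULT_WEIGHTS = {"sales": 1.0, "offers": 0.5}
# Per-category override, mirroring the "different criteria per category" requirement.
CATEGORY_WEIGHTS = {"phones": {"sales": 2.0, "offers": 0.5}}


def map_offer(offer):
    """Map step: emit (product key, features) for one commodity record."""
    key = (offer["category"], offer["product_id"])
    return key, {"sales": offer["sales"], "offers": 1}


def reduce_product(key, values):
    """Reduce step: fold all commodities of one product into a static score."""
    category, _ = key
    weights = CATEGORY_WEIGHTS.get(category, DEFAULT_WEIGHTS)
    totals = defaultdict(int)
    for features in values:
        for name, amount in features.items():
            totals[name] += amount
    # log1p dampens large counts so a single feature cannot dominate (assumed choice).
    return sum(weights[name] * log1p(totals[name]) for name in weights)


def static_scores(offers):
    """Run map, shuffle (group by key), and reduce in memory."""
    grouped = defaultdict(list)
    for offer in offers:
        key, features = map_offer(offer)
        grouped[key].append(features)
    return {key: reduce_product(key, values) for key, values in grouped.items()}
```

In a real Hadoop deployment the grouping in `static_scores` would be performed by the framework's shuffle phase, and only the map and reduce functions would be supplied by the job.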
[Degree-granting institution]: Nanjing University
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP311.52
Article ID: 2267817
Link to this article: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/2267817.html