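A Study on the Application of Sphinx On-Site Search Technology in Forum Products

Published: 2018-06-02 02:36

  Topics: on-site full-text search technology + phpwind forum; Source: Fudan University, master's thesis, 2012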


[Abstract]: With the rapid growth of the Internet, information services, social portals, and forums have become the mainstream carriers of online information. Market demand for portal and forum products keeps rising; popular products include Discuz! and phpwind. Phpwind is a software product that Alibaba has promoted heavily in recent years, combining applications, forums, social networking, and portal features. The author took part in the development and maintenance of several versions of this forum product. In older versions of phpwind, on-site search performed very poorly under high load and heavy concurrency: when a site built on phpwind met a sudden surge in daily posts, for example, the server frequently went down. These weaknesses became a major bottleneck for the product and for a time left it at a disadvantage in a fiercely competitive market.

One of the key problems this thesis addresses is how to provide fast, accurate search under large data volumes and heavy access load, so that users can quickly locate the content they need while the load generated by on-site search is fully relieved. A second key problem is that earlier versions of the forum product could not perform precisely targeted on-site queries.

1. Main research results: By integrating the Sphinx full-text search engine into phpwind and pairing it with a search architecture design, this thesis resolves the poor vertical-search performance of the old phpwind product. A built-in linked query algorithm over classified information with multiple conditions resolves the old product's inability to pinpoint results under the load of large volumes of varied commodity data.

2. Innovations: Integrating distributed Sphinx search into phpwind addresses both the difficulty of serving searches under high load and the risk of a single search server going down unexpectedly. The thesis also designs an in-memory index processing mechanism and a queue control technique for the forum, which contribute to improving index processing and response times.

3. Results achieved by the requirements analysis and design: With a single Sphinx node combined with the new search architecture, on-site queries over a forum with millions of records in MySQL respond and return results within milliseconds. With distributed Sphinx, the thesis reports that queries over the same million-record data volume respond and return results within microseconds. Finally, the new linked query algorithm for classified information enables precise queries under multiple conditions and also removes the old phpwind product's limitation to a single form of search.
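The abstract does not give implementation details, so the following is only a minimal Python sketch of the general idea behind distributed search with protection against a single node failing: a query is fanned out to every shard, each shard is served by the first replica that answers, and the partial results are merged by score. All names here (`make_node`, `search_shard`, `distributed_search`) and the toy term-count scoring are assumptions made for illustration. They are not taken from the thesis, from phpwind, or from the Sphinx API.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration only; not part of Sphinx or phpwind.
Hit = Tuple[int, int]  # (post_id, score)
SearchNode = Callable[[str], List[Hit]]


def make_node(posts: Dict[int, str], alive: bool = True) -> SearchNode:
    """Build a toy search node over an in-memory shard of posts."""
    def search(term: str) -> List[Hit]:
        if not alive:
            raise ConnectionError("node is down")
        term_lower = term.lower()
        hits = []
        for post_id, text in posts.items():
            # Toy relevance score: number of exact word matches in the post.
            score = text.lower().split().count(term_lower)
            if score > 0:
                hits.append((post_id, score))
        return hits
    return search


def search_shard(replicas: List[SearchNode], term: str) -> List[Hit]:
    """Try each replica of one shard in order; return the first successful answer."""
    for node in replicas:
        try:
            return node(term)
        except ConnectionError:
            continue  # this replica is down, try the next one
    return []  # every replica failed: degrade to an empty partial result


def distributed_search(shards: List[List[SearchNode]], term: str,
                       limit: int = 10) -> List[Hit]:
    """Query every shard (sequentially, for simplicity) and merge hits by score."""
    merged: List[Hit] = []
    for replicas in shards:
        merged.extend(search_shard(replicas, term))
    # Highest score first; ties are broken by post id for a stable order.
    merged.sort(key=lambda hit: (-hit[1], hit[0]))
    return merged[:limit]


shard_a = {1: "sphinx search in phpwind", 2: "forum upgrade notes"}
shard_b = {3: "search search tips", 4: "weekend photos"}

shards = [
    [make_node(shard_a, alive=False), make_node(shard_a)],  # primary down, replica up
    [make_node(shard_b)],
]
results = distributed_search(shards, "search")
print(results)  # [(3, 2), (1, 1)]
```

In this sketch the first shard's primary node is down, yet the query still returns hits from both shards because the replica answers in its place. A production setup would query shards in parallel and apply timeouts, which are omitted here to keep the example short.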
[Degree-granting institution]: Fudan University
[Degree level]: Master's
[Year degree conferred]: 2012
[Classification number]: TP391.3



