Query Processing Algorithms in Advertising Search

Published: 2018-04-27 16:40

Topics: advertising search + query processing; Source: Shanghai Jiao Tong University, master's thesis, 2011

[Abstract]: Text ad search supplies relevant, targeted text ads to accompany a search engine's web-wide search. To match a query with the most relevant ads, ad search engines apply many query processing techniques, including Boolean retrieval and the expansion of rare ad queries. The best existing ad query expansion algorithms all depend on the results returned by a web search engine, and this dependence is too strong to allow an independent ad search system. Looking for a reliable external resource, we chose Wikipedia. By running the web query against an existing Wikipedia retrieval system and taking the top-ranked results, we obtain information that re-expresses the query. With this information we rebuild the ad query and then use the new ad query to search the existing ad corpus. We designed a series of experiments showing that this method is highly effective.

Because the weak AND (WAND) relation can control the number of results retrieved, we use it in ad search queries to address the problems caused by the classical Boolean relations. Existing weak AND processing is not fast enough, however, so we propose a new and efficient weak AND processing framework. The framework exploits two properties of the weak AND relation: the weights of the terms produced by word segmentation, and the threshold parameter specific to weak AND. We first consider a special kind of query term. After segmentation, some terms in a query may carry weights so high that they must appear in every result document. We call these "mandatory terms". When a query contains such terms, a very fast weak AND algorithm based on mandatory terms is easy to construct. Not every query contains mandatory terms, however, so we also build an algorithm based on a loser tree. Together with the original weak AND algorithm, these three algorithms make up our weak AND processing framework. Experiments show that our method is more efficient than previous methods and is also robust.
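The idea of a mandatory term can be illustrated with a minimal sketch. It assumes the commonly used definition of weak AND (a document matches when the weights of the query terms it contains sum to at least the threshold) and infers from it that a term is mandatory when all the other terms together cannot reach the threshold. The function names, the in-memory posting sets, and the example weights are hypothetical and are not taken from the thesis.

```python
from typing import Dict, Set


def wand(present: Set[str], weights: Dict[str, float], threshold: float) -> bool:
    """Weak AND: true when the weights of the query terms found in a
    document sum to at least the threshold (assumed definition)."""
    return sum(w for t, w in weights.items() if t in present) >= threshold


def mandatory_terms(weights: Dict[str, float], threshold: float) -> Set[str]:
    """A term is mandatory if the remaining terms alone cannot reach the threshold."""
    total = sum(weights.values())
    return {t for t, w in weights.items() if total - w < threshold}


def candidates(postings: Dict[str, Set[int]],
               weights: Dict[str, float],
               threshold: float) -> Set[int]:
    """Return the ids of documents that satisfy the weak AND condition."""
    must = mandatory_terms(weights, threshold)
    if must:
        # Every match contains all mandatory terms, so only their
        # intersection needs to be checked.
        docs = set.intersection(*(postings.get(t, set()) for t in must))
    else:
        # No shortcut available: check every document containing any query term.
        docs = set().union(*(postings.get(t, set()) for t in weights))
    return {
        d for d in docs
        if wand({t for t in weights if d in postings.get(t, set())}, weights, threshold)
    }


# Hypothetical query weights and posting lists (term -> document ids).
weights = {"nike": 5.0, "running": 1.0, "shoes": 1.0}
postings = {"nike": {1, 2, 3}, "running": {2, 4}, "shoes": {3, 4}}
print(mandatory_terms(weights, 6.0))
print(candidates(postings, weights, 6.0))
```

With a threshold of 6.0, "nike" is the only mandatory term, so only the three documents on its posting list are scored. When no term is mandatory, the sketch falls back to scanning every document that contains a query term, which is the case the abstract handles with the loser-tree algorithm.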
[Degree-granting institution]: Shanghai Jiao Tong University
[Degree level]: Master's
[Year degree awarded]: 2011
[Classification number]: TP391.3
