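Research on the Application of the Probabilistic Latent Semantic Analysis Model in a Search Engine Commercial Text Classification System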

Published: 2018-07-27 20:21

[Abstract]: For a search engine's profitability, it matters a great deal whether the commercial advertisements it serves are relevant to the user's search intent. Traditional text classification methods solve part of the problem in a search engine's commercial text classification system. However, the abstractness, polysemy, and synonymy of meaning are pervasive, and how to define and compute semantics, and how to analyze them in context, remain major problems facing search engines today. To address the commercial needs of search engines, this thesis applies Probabilistic Latent Semantic Analysis (PLSA), a technique proposed in the academic literature in recent years, and, guided by software engineering principles, designs and implements the probabilistic latent semantic computation module of a search engine commercial text classification system. Finally, following the business requirements of a commercial search engine, the author tests the module against the relevant standards and demonstrates its effectiveness and practicality.
[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Master's
[Year degree conferred]: 2011
[Classification number]: TP391.1

[Similar literature]