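Research on a Time-Series-Based Evaluation Model Algorithm for Search Engines
Published: 2018-05-13 14:24
Topic: time series + entropy weight; Source: Northeast Normal University, master's thesis, 2015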
[Abstract]: With the arrival of the big data era, the search engine has become the primary tool for obtaining information, and its quality directly determines the accuracy, speed, and cost with which people obtain information. Analyzing user satisfaction with search engines has therefore long been an important research direction in information retrieval. Building on evaluation work on satisfaction and relevance in modern search technology, this thesis constructs an entropy weight model and a time series model to analyze the main factors that cause search engine satisfaction to change. The work is divided into three parts. First, focusing on the two most important components of current web search evaluation, relevance evaluation and Session satisfaction analysis, it reviews the state of both and their existing problems, and discusses in depth the information entropy theory and time series models relevant to this work. Second, user behavior logs are identified as the data source for satisfaction analysis. The logs undergo merging, object reconstruction, indicator selection, dimension flattening, expansion, and multi-dimensional splitting, and the processed data is stored in an Infobright data warehouse as the basis for subsequent analysis. On this basis, the judgment matrix of the entropy weight model is constructed, the entropy weights are solved in reverse from cases where the conclusion is already known, and experiments confirm the feasibility and soundness of the approach. After identifying the shortcomings of the entropy weight model, a time series model suited to the characteristics of the data is designed, the final satisfaction change analysis model is constructed, and experiments analyze the contribution of different indicators to changes in Session satisfaction.
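The abstract does not give the formulas behind its entropy weight model, and its reverse-solving step is not described in enough detail to reproduce. As a point of orientation only, the following is a minimal sketch of the standard (forward) entropy weight method, in which indicators whose values vary more across observations receive larger weights. The function name, the indicator columns, and the sample values are all hypothetical and are not taken from the thesis.

```python
import math


def entropy_weights(matrix):
    """Standard entropy weight method (illustrative, not the thesis's variant).

    matrix: list of rows (observations, e.g. days); each row is a list of
    non-negative indicator values (columns). Returns one weight per column.
    """
    m = len(matrix)
    if m < 2:
        raise ValueError("need at least two observations")
    n = len(matrix[0])

    divergences = []
    for j in range(n):
        column = [row[j] for row in matrix]
        total = sum(column)
        if total <= 0 or min(column) < 0:
            raise ValueError("indicator values must be non-negative with a positive sum")
        # Share of each observation in the column total.
        shares = [x / total for x in column]
        # Shannon entropy normalised to [0, 1]; 0 * ln(0) is treated as 0.
        entropy = -sum(p * math.log(p) for p in shares if p > 0) / math.log(m)
        # Degree of divergence: how far the column is from being uniform.
        divergences.append(max(0.0, 1.0 - entropy))

    spread = sum(divergences)
    if spread < 1e-12:
        # Every indicator is (numerically) constant: fall back to equal weights.
        return [1.0 / n] * n
    return [d / spread for d in divergences]


# Hypothetical daily indicators: click-through rate, reformulation rate, dwell time (s).
sample = [
    [0.60, 0.20, 35.0],
    [0.61, 0.35, 36.0],
    [0.59, 0.10, 34.0],
]
weights = entropy_weights(sample)
```

In this sample the second column fluctuates far more than the other two, so it receives the largest weight. Raw values are used directly here; in practice indicators are usually rescaled first, and that choice affects the resulting weights.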
Finally, a time-series-based satisfaction change analysis system is designed and developed, comprising five modules: log processing, entropy weight calculation, satisfaction contribution calculation, time series prediction, and conclusion storage. In practical analysis, the system's conclusions offer directional guidance for improving search engine features and services; on the theoretical side, the thesis proposes a simple, easy-to-implement algorithm, suited to cluster computing, for analyzing the factors behind satisfaction changes.
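The abstract names time series prediction and satisfaction contribution calculation as modules but does not say which forecasting model or contribution formula the thesis uses. The sketch below is therefore only one plausible reading: it uses simple exponential smoothing as a stand-in forecaster, then attributes a satisfaction change to indicators in proportion to their weighted relative deviation from the forecast. All function names, indicator names, weights, and values are assumptions made for illustration.

```python
def ses_forecast(series, alpha=0.5):
    """One-step-ahead forecast by simple exponential smoothing (stand-in model)."""
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for value in series[1:]:
        # Blend the newest observation with the running level.
        level = alpha * value + (1 - alpha) * level
    return level


def change_contributions(history, observed, weights, alpha=0.5):
    """Signed share of each indicator in the total weighted deviation.

    history:  dict of indicator name -> list of past values
    observed: dict of indicator name -> latest observed value
    weights:  dict of indicator name -> weight (e.g. an entropy weight)
    """
    deviations = {}
    for name, series in history.items():
        expected = ses_forecast(series, alpha)
        # Relative deviation of the observation from its forecast.
        relative = (observed[name] - expected) / expected if expected else 0.0
        deviations[name] = weights[name] * relative

    total = sum(abs(d) for d in deviations.values())
    if total == 0:
        return {name: 0.0 for name in deviations}
    # The absolute values of the shares sum to 1; the sign gives the direction.
    return {name: d / total for name, d in deviations.items()}


# Hypothetical indicators and weights, for illustration only.
history = {"click_rate": [0.60, 0.61, 0.59], "reformulation_rate": [0.20, 0.22, 0.21]}
observed = {"click_rate": 0.60, "reformulation_rate": 0.30}
weights = {"click_rate": 0.3, "reformulation_rate": 0.7}
shares = change_contributions(history, observed, weights)
```

Each indicator is forecast independently, which suits the cluster-friendly processing the abstract mentions because the per-indicator work can run in parallel, although the thesis's actual design may differ.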
[Degree-granting institution]: Northeast Normal University
[Degree level]: Master's
[Year degree awarded]: 2015
[Classification number]: TP391.3
Link to this article: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/1883575.html