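Research on Enterprise Vertical Search Optimization Based on Incremental Maintenance of Materialized Views
Topic: enterprise vertical search. Focus: incremental maintenance. Source: Jiangsu University of Science and Technology, master's thesis, 2013. Document type: degree thesis.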
[Abstract]: Technical improvement and optimization of vertical search engines has long attracted attention in the information industry. Research in this area, however, has concentrated on web data crawling, retrieval efficiency, and personalized services, and has rarely addressed the performance of the vertical search engine's overall architecture. At the enterprise level, related work has also focused mostly on processing web page data, with little in-depth vertical mining of a single enterprise's information, so the overall quality of returned results is low. How to improve the service performance of enterprise vertical search engines, and so provide more valuable decision-support information for day-to-day work, has become a focus of both academia and industry. Starting from practical needs and an extensive literature review, this thesis studies the overall architecture of a vertical search engine for a specific enterprise, concentrates on optimizing the algorithms for incremental maintenance of materialized views and for presentation of search results, and validates and evaluates them with an application case from a marine equipment manufacturing enterprise.
First, for the business search entity layer of the enterprise vertical search engine, views are created and stored in materialized form in the data warehouse, using a "cache" mechanism to improve retrieval efficiency. Because the content stored in a materialized view is static, the view must be kept up to date as the underlying data changes. By introducing the concept of semantic constraints, the thesis optimizes the incremental update and maintenance algorithm for materialized views, which improves the system's retrieval efficiency.
Second, the presentation of retrieval results in the presentation layer is implemented in a metadata-driven way: "personalized" feature information, derived from the characteristics of user requirements, is stored as metadata, and "second-level views" are maintained to optimize system performance. When personalized search results are presented, the ranking of results is optimized by borrowing the data-integration idea of topic-oriented meta-search.
Finally, these improvements are unified within the overall architecture of the enterprise vertical search system. The thesis describes the concrete implementation mechanism in detail and uses the application case from the marine equipment manufacturing enterprise to verify the correctness and effectiveness of the work.
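The abstract does not give the thesis's algorithm, so the following is only a minimal sketch of the general idea of incremental maintenance: a materialized aggregate view is updated from the inserted and deleted base rows instead of being recomputed from scratch. The row layout, the class name `MaterializedSumView`, and the `predicate` filter (used here as a simple stand-in for a semantic constraint that lets irrelevant changes be skipped) are all assumptions made for illustration, not the author's design.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical base-table row: (part_id, category, quantity).
Row = Tuple[str, str, int]


@dataclass
class MaterializedSumView:
    """Illustrative view: total quantity per category over rows passing `predicate`."""

    # Assumed stand-in for a semantic constraint: rows failing it cannot
    # affect the view, so their changes are ignored during maintenance.
    predicate: Callable[[Row], bool]
    totals: Dict[str, int] = field(default_factory=lambda: defaultdict(int))
    counts: Dict[str, int] = field(default_factory=lambda: defaultdict(int))

    def full_refresh(self, base_rows: Iterable[Row]) -> None:
        """Recompute the view from the whole base table (the expensive path)."""
        self.totals.clear()
        self.counts.clear()
        self.apply_delta(inserted=base_rows, deleted=())

    def apply_delta(self, inserted: Iterable[Row], deleted: Iterable[Row]) -> int:
        """Apply only the changed rows; return how many rows touched the view."""
        touched = 0
        for sign, rows in ((1, inserted), (-1, deleted)):
            for row in rows:
                if not self.predicate(row):
                    continue  # change is irrelevant to this view
                _, category, quantity = row
                self.totals[category] += sign * quantity
                self.counts[category] += sign
                if self.counts[category] == 0:
                    # No base rows left in this group: drop it from the view.
                    del self.totals[category]
                    del self.counts[category]
                touched += 1
        return touched


# Usage with made-up data.
base = [("p1", "valve", 5), ("p2", "pump", 3), ("p3", "valve", 2)]
view = MaterializedSumView(predicate=lambda r: r[1] != "obsolete")
view.full_refresh(base)

inserted = [("p4", "pump", 4), ("p5", "obsolete", 9)]
deleted = [("p3", "valve", 2)]
touched = view.apply_delta(inserted=inserted, deleted=deleted)
print(dict(view.totals), touched)
```

Keeping a per-group row count alongside the sum is what allows deletions to be handled incrementally: a group disappears from the view exactly when its last contributing row is removed.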
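[Degree-granting institution]: Jiangsu University of Science and Technology
[Degree level]: Master's
[Year degree awarded]: 2013
[Classification number]: TP391.3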