基于多核并行的全文检索动态后继树模型相关算法研究
发布时间:2018-04-29 08:31
本文选题:多核处理器 + 并行编程 ; 参考:《广西大学》2013年硕士论文
【摘要】:随着多核处理器的快速普及,广大的程序开发者们面临着极大的机遇与挑战。在过去,由于缺乏并行开发的环境,大多数的开发者们通常只能在单核单线程的平台上进行程序开发,所以在大多数情况下是不会考虑程序的并发执行问题。但是,随着时间的推移,可以预见单核处理器在不久即将淡出人们的生活,多核处理器在近年来取得极大发展和普及,大多数开发者们都拥有了多核开发环境,在进行程序开发的时候,如果再不考虑程序的并行执行相关问题,将不能发挥多核平台拥有多线程并行执行能力的特点,不能进一步的提高程序的运行效率。 除此之外,伴随着互联网普及速度的进一步加快,搜索引擎成为了互联网上最受欢迎的应用之一,而搜索引擎就是全文检索系统的一个常见应用。伴随着多核平台的普及以及硬件价格的下降,建立适用于行业特色的检索系统将不再是一件遥不可及的事情。而过去对于全文检索索引模型的研究主要是通过对于索引结构的修改来提高索引的性能,而本文则希望通过另一种思路,通过对全文检索系统中动态后继树索引模型相关算法进行基于多核平台的并行化改进,从而提高索引生成以及检索的效率,本文主要从事了以下工作: 1、研究多核处理器的主要特点以及在进行多核编程的时候会遇到的问题;比较了多核平台并行运算与传统的多机分布式并行运算的不同,以此来作为进行算法并行化研究的指导。 2、分析研究全文检索系统中不同的索引结构及特点,并与动态后继树索引进行比较,从生成速度、检索效率以及更新的复杂度等角度出发,选择了动态后继树索引进行算法的相关研究。 3、结合多核编程的特点,研究了动态后继树索引创建算法的任务分配方法,在每个线程拥有各自私有内存空间的基础上,通过任务分配管理分配不同的对象,让每个并行线程都能以最合理的工作方式工作,最大程度发挥程序的并行性能。通过在原始索引结构创建算法的基础上,加入OpenMP指导语句以实现程序的并行化 4、分析了动态后继树的检索算法的特点,在多分词查找算法中加入OpenMP指导语句,研究了更适合多核平台的检索算法。 5、通过对动态后继树原始创建算法与修改后的并行创建算法以及动态后继树多分词查找原始算法与多分词并行查找算法的实验结果进行比较,证明经过了并行化处理的算法,在多核平台上的运行效率有显著提升。
[Abstract]:With the rapid popularity of multi-core processors, the vast number of program developers are facing great opportunities and challenges. In the past, because of the lack of parallel development environment, most developers can only develop programs on the platform of single core and single thread, so in most cases, the concurrent execution of programs is not considered. However, with the passage of time, it can be predicted that single-core processors will soon fade out of people's lives, and multicore processors have made great progress and popularity in recent years, and most developers have a multi-core development environment. In the process of program development, if we do not consider the parallel execution of the program, we will not be able to play the multi-core platform has the ability of multi-thread parallel execution, and can not further improve the efficiency of the program. In addition, with the further acceleration of the speed of Internet popularization, search engine has become one of the most popular applications on the Internet, and search engine is a common application of full-text retrieval system. With the popularization of multi-core platform and the decline of hardware price, it is no longer a distant thing to build a retrieval system suitable for industry characteristics. In the past, the research on index model of full-text retrieval is mainly to improve the performance of index by modifying the index structure, but this paper hopes to use another way of thinking. By improving the parallel algorithm of dynamic successor tree index model in the full-text retrieval system based on multi-core platform, this paper improves the efficiency of index generation and retrieval. The main work of this paper is as follows: 1. The main characteristics of multi-core processor and the problems encountered in the process of multi-core programming are studied, and the differences between parallel operation of multi-core platform and traditional distributed parallel computing of multi-computer are compared. This is used as a guide for the research of parallelization of algorithms. 2. The different index structures and characteristics in the full-text retrieval system are analyzed and compared with the dynamic successor tree index. From the point of view of generation speed, retrieval efficiency and update complexity, etc. The dynamic successor tree index is selected to study the algorithm. 3. According to the characteristics of multi-core programming, the task allocation method of dynamic successor tree index creation algorithm is studied. On the basis of each thread having its own private memory space, different objects are allocated through task allocation management. Each parallel thread can work in the most reasonable way to maximize the parallel performance of the program. In order to realize the program parallelization by adding OpenMP instruction on the basis of the original index structure creation algorithm. 4. The characteristics of the search algorithm of dynamic successor tree are analyzed, and the OpenMP instruction sentence is added to the search algorithm of multi-participle, and the retrieval algorithm which is more suitable for multi-core platform is studied. 5. By comparing the experimental results of the original creation algorithm of dynamic successor tree with the modified parallel creation algorithm, the original algorithm of multi-word search of dynamic successor tree and the parallel search algorithm of multi-participle, it is proved that the algorithm has been parallelized. The running efficiency on multi-core platform is improved significantly.
【学位授予单位】:广西大学
【学位级别】:硕士
【学位授予年份】:2013
【分类号】:TP391.3
【参考文献】
相关期刊论文 前10条
1 Herb Sutter;罗小平;;免费午餐已经结束——软件历史性地向并发靠拢[J];程序员;2006年11期
2 马科,胡运发;一个改进的互关联后继树数据模型[J];计算机工程;2003年21期
3 申展,江宝林,唐磊,胡运发;基于互关联后继树的频繁模式挖掘研究[J];计算机工程;2004年21期
4 王政华;胡运发;;基于后继区间的互关联后继树搜索算法[J];计算机工程;2007年09期
5 龚磊;武友新;;Lucene全文检索系统的研究与实现[J];计算机与数字工程;2010年05期
6 沈羽佳;韩松峰;刁海南;刘勇;;基于数据依赖图的主域变量识别方法[J];计算机应用研究;2009年01期
7 霍林;黄俊文;潘英花;王力;;大规模分布式资源搜索技术研究进展[J];计算机应用研究;2010年11期
8 申展,江宝林,张谧,唐磊,胡运发;互关联后继树模型及其实现[J];计算机应用与软件;2005年03期
9 郁卫江,朱根江,谢立;一个过程间数据流分析的框架[J];软件学报;1997年09期
10 阳雪林,于勐,陈道蓄,谢立;自动并行编译新技术[J];软件学报;2000年09期
相关博士学位论文 前1条
1 闫昭;程序并行识别方法及应用研究[D];吉林大学;2009年
,本文编号:1819177
本文链接:https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/1819177.html