
Optimization of Crowd Computing Task Allocation and Association Analysis Algorithms in the Big Data Environment

Published: 2019-05-19 08:41
[Abstract]: With the arrival of the big data era, data volumes have grown sharply. Big data carries rich information and knowledge, but its scale and complexity, rapid growth, variety of forms, and low value density pose serious challenges to traditional data processing techniques. Processing techniques suited to the big data environment are therefore urgently needed. Such techniques can be divided into human-machine collaborative crowd computing and data processing algorithms. This paper carries out research on both and reports the following two results.

(1) Human-machine collaborative crowd computing. Big data tasks depend on complex cognitive reasoning, and this paper addresses that dependence mainly by optimizing crowd computing. A sound allocation strategy is an important stage of the computation, so the paper proposes a big data crowd task allocation algorithm based on accurate perception of user topics. To improve accuracy, the topics of published crowd tasks are first extracted by combining adaptive fuzzy clustering with a topic extraction model. Task-specific models and user models are then built, and the degree of association between them is computed. Next, historical tasks with submitted high-quality answers are used to iteratively detect each new user's real topics and to compute an initial accuracy. After that, logistic regression (LR) predicts how likely a user is to participate in a given class of tasks, which yields a candidate sequence of participating users; tasks are then assigned precisely, given each user's real topics, accuracy on those topics, and credibility. Experiments show that the proposed algorithm allocates tasks more accurately, is particularly suited to the big data environment, and to some extent saves the cost that random allocation incurs by repeating assignments to ensure accuracy.

(2) Data processing algorithms. To meet the efficiency requirements of massive data processing, where traditional algorithms no longer suffice, the paper proposes a cloud-computing-based method for parallelizing an improved algorithm. Association analysis is one of the research hotspots in data processing, and the improvements to the Apriori association analysis algorithm consist of two parts. First, a matrix-based improvement of Apriori (M_Apriori) is proposed. Its novelty lies in how the matrix is constructed and in the changed computation steps: the algorithm stores and processes data in a matrix-based structure and scans the database only once, which reduces database I/O overhead. By constructing a support-count matrix, it uses the logical AND operation to improve the core steps of the algorithm (join and prune); the method is verified and analyzed theoretically. Second, a Spark-based parallelization of M_Apriori (SPM_Apriori) is proposed. It adopts data parallelism and a local-instead-of-global strategy, and exploits Spark's in-memory computing and its storage of data items in RDDs. M_Apriori is given a parallel design and ported to the Spark platform for parallel implementation, enriching Spark MLlib. Finally, experiments verify that the proposed algorithms achieve good results.
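The abstract gives no implementation details for M_Apriori, so the following is only a minimal sketch of the general idea it names: scan the transactions once into a Boolean item-by-transaction matrix, then count support with logical AND instead of rescanning the data. Storing each matrix row as a Python integer bitmask, and the function names `build_item_rows` and `frequent_itemsets`, are assumptions made for illustration; this is not the paper's algorithm and it does not cover the Spark parallelization.

```python
from itertools import combinations


def build_item_rows(transactions):
    """Scan the transactions once; return {item: bitmask of transaction indices}."""
    rows = {}
    for tid, transaction in enumerate(transactions):
        for item in set(transaction):
            # Bit `tid` of an item's row is 1 if transaction `tid` contains the item.
            rows[item] = rows.get(item, 0) | (1 << tid)
    return rows


def support(mask):
    """Number of transactions in a row, i.e. the count of set bits."""
    return bin(mask).count("1")


def frequent_itemsets(transactions, min_support):
    """Return {itemset as sorted tuple: support count} for all frequent itemsets."""
    rows = build_item_rows(transactions)
    # Level 1: keep single items that meet the minimum support count.
    current = {
        (item,): mask for item, mask in rows.items() if support(mask) >= min_support
    }
    result = {}
    while current:
        result.update({itemset: support(mask) for itemset, mask in current.items()})
        next_level = {}
        for a, b in combinations(sorted(current), 2):
            # Join step: merge two k-itemsets that share their first k-1 items.
            if a[:-1] != b[:-1]:
                continue
            candidate = a + (b[-1],)
            # Prune step: every k-item subset of the candidate must be frequent.
            if any(sub not in current for sub in combinations(candidate, len(a))):
                continue
            # Support counting: AND the two rows; no second pass over the data.
            mask = current[a] & current[b]
            if support(mask) >= min_support:
                next_level[candidate] = mask
        current = next_level
    return result
```

For example, with the five transactions `[["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]` and `min_support=3`, each single item has support 4 and each pair has support 3, while `("a", "b", "c")` occurs only twice and is dropped. Items need to be mutually comparable (for example, all strings) because the join step relies on sorted tuples.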

Article ID: 2480574


Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/2480574.html

