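Research on Web Log User Interest Paths Based on the Ant Colony Algorithm

Published: 2018-06-04 20:34

Topics: web log mining + ant colony algorithm; Reference: Jiangxi Normal University, master's thesis, 2015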

[Abstract]: With the rapid development of information technology, the volume of information on the Internet is growing exponentially, and websites of every kind have sprung up in daily life. For network service providers, mining users' interest patterns from massive server log files has become a research focus at many companies. When the ant colony algorithm is used to mine the access data of a web user population, each web user is treated as an artificial ant, and the path the ants prefer in the algorithm model is the users' interest path. Knowing this path makes it possible to adapt to different customer needs, optimize the site topology, improve system efficiency, adjust marketing strategy, and put decisions on a sounder footing, thereby creating greater commercial value for the enterprise. Practice and analysis show that the optimized ant colony algorithm mines web user interest paths with higher accuracy than traditional algorithms such as MF, and so has better prospects for wider use in web log data mining. This thesis studies web log user interest paths based on the ant colony algorithm. The main contributions are as follows.
(1) Drawing on the similarity between web user interest paths and ant colony foraging behavior, the thesis proposes a new concept, "browsing interest degree", which reflects how interested a web user is in the pages visited, and uses it to define an ant-colony-based method for mining user interest paths from web logs. The method obtains the interest path by updating the pheromone globally, computing transition probabilities, and applying a transition probability threshold. Experiments show that the method is feasible and reflects user interest paths accurately.
(2) By preprocessing client-side log data, the thesis extracts three key signals of user interest: the relative number of mouse scrolls, the user's relative browsing time, and the page click rate. The relative number of mouse scrolls is the ratio of the scroll count on a page to the scroll count over all pages. The relative browsing time is the ratio of a page's total browsing time to the average visit time over all pages. The page click rate is the ratio of a page's visit count to the visit count over all pages. Experiments show that a pheromone built from these three signals represents the user's interest in the site more accurately.
(3) In the ant colony algorithm, the parameters α and β determine the relative influence of pheromone concentration and intention preference. The thesis redefines α and β and adjusts the heuristic function through their adaptive tuning. The new definition speeds up convergence while avoiding local convergence and premature convergence, and experiments show that it improves the algorithm's search efficiency.
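The pipeline described in the abstract (interest signals, global pheromone update, transition probability, threshold) can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything concrete here is an assumption made for illustration: the equal weighting of the three signals, the evaporation rate `rho`, the use of edge traversal frequency as the heuristic `eta`, fixed rather than adaptive `alpha` and `beta`, the greedy path extraction, and all function names and sample data. It should not be read as the thesis's actual method.

```python
from collections import defaultdict


def browsing_interest(scrolls, dwell, visits):
    """Combine the three signals named in the abstract into one score per page.

    Equal weighting is an assumption; the thesis's formula is not given.
    """
    total_scrolls = sum(scrolls.values())
    mean_dwell = sum(dwell.values()) / len(dwell)
    total_visits = sum(visits.values())
    interest = {}
    for page in visits:
        rel_scroll = scrolls[page] / total_scrolls  # share of all scrolling
        rel_time = dwell[page] / mean_dwell         # page time vs. average page time
        click_rate = visits[page] / total_visits    # share of all visits
        interest[page] = (rel_scroll + rel_time + click_rate) / 3
    return interest


def edge_frequency(sessions):
    """Assumed heuristic eta: fraction of exits from page i that go to page j."""
    counts = defaultdict(lambda: defaultdict(int))
    for path in sessions:
        for i, j in zip(path, path[1:]):
            counts[i][j] += 1
    return {i: {j: c / sum(out.values()) for j, c in out.items()}
            for i, out in counts.items()}


def init_pheromone(sessions, tau0=1.0):
    """Give every observed link the same initial pheromone tau0."""
    tau = defaultdict(dict)
    for path in sessions:
        for i, j in zip(path, path[1:]):
            tau[i][j] = tau0
    return tau


def update_pheromone(tau, sessions, interest, rho=0.1):
    """Global update: evaporate every edge, then deposit the target page's interest."""
    for out in tau.values():
        for j in out:
            out[j] *= (1 - rho)
    for path in sessions:
        for i, j in zip(path, path[1:]):
            tau[i][j] += interest[j]


def transition_probs(tau, eta, alpha=1.0, beta=2.0):
    """Standard ACO rule: p(i -> j) is proportional to tau^alpha * eta^beta."""
    probs = {}
    for i, out in tau.items():
        weights = {j: (t ** alpha) * (eta[i][j] ** beta) for j, t in out.items()}
        total = sum(weights.values())
        probs[i] = {j: w / total for j, w in weights.items()} if total > 0 else {}
    return probs


def interest_path(probs, start, threshold=0.3):
    """Follow the most probable unvisited link while it clears the threshold."""
    path, current = [start], start
    while True:
        candidates = {j: p for j, p in probs.get(current, {}).items()
                      if p >= threshold and j not in path}
        if not candidates:
            return path
        current = max(candidates, key=candidates.get)
        path.append(current)


# Hypothetical sample data: three sessions over four pages.
sessions = [["home", "list", "item"], ["home", "list", "item"], ["home", "about"]]
scrolls = {"home": 3, "list": 6, "item": 10, "about": 1}
dwell = {"home": 10, "list": 30, "item": 60, "about": 20}   # seconds
visits = {"home": 3, "list": 2, "item": 2, "about": 1}

interest = browsing_interest(scrolls, dwell, visits)
tau = init_pheromone(sessions)
update_pheromone(tau, sessions, interest)
probs = transition_probs(tau, edge_frequency(sessions))
path = interest_path(probs, "home")
print(path)
```

In this sketch the threshold prunes weakly supported links, so the returned path stops as soon as no outgoing link is both unvisited and sufficiently probable. An adaptive scheme for `alpha` and `beta`, as the abstract describes, would replace the fixed defaults in `transition_probs`.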
[Degree-granting institution]: Jiangxi Normal University
[Degree level]: Master's
[Year degree conferred]: 2015
[Classification number]: TP18;TP311.13

[Similar literature]