
Research on Web Text Clustering Based on the Shuffled Frog Leaping Algorithm

Published: 2018-05-07 11:54

  Topics: Web text clustering + shuffled frog leaping algorithm; Source: master's thesis, Jiangnan University, 2013


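[Abstract]: With the rapid spread and continuing development of Internet technology, the amount of text on web pages is growing explosively. Mining this information effectively has become a major challenge for computer science. There is a pressing need to extract knowledge of interest from large volumes of Web resources quickly, accurately, and effectively.

Text clustering offers an effective way to organize and visualize massive text collections. It is a foundational technology for information filtering, information retrieval, search engines, text databases, and digital libraries, and it has been widely applied and developed. Web text data are massive, high-dimensional, dynamic, and unpredictable, so clustering of Web data has become a new research focus.

This thesis focuses on Web text clustering algorithms. K-means and FCM (fuzzy C-means) are partition-based clustering algorithms. They are widely used for Web text clustering because they are simple, fast, and effective. In practice, however, both often become trapped in local minima and are sensitive to their initial values.

The thesis studies how the shuffled frog leaping algorithm (SFLA) can be applied to Web text clustering. It combines SFLA with K-means and, separately, with FCM. These combinations partly alleviate the two algorithms' tendency to fall into local minima and their sensitivity to initial values, and they improve convergence accuracy.

The thesis first introduces the concept, characteristics, and application areas of text clustering. It then describes how several classical clustering methods are implemented and analyzes their strengths and weaknesses.

Second, it describes SFLA in detail and proposes an improved SFLA to address shortcomings of the standard algorithm. The improved SFLA optimizes the initial solutions through chaotic search and generates new individuals through a mutation operation. It also uses a newly designed search strategy. Together, these changes effectively improve the algorithm's optimization ability.

Finally, the improved SFLA is combined with K-means and with FCM separately:

- In the SFLA-based K-means algorithm, the variance of the frog population's fitness determines when the K-means operation is applied, which suppresses premature convergence. Its effectiveness is verified on UCI data sets and on randomly generated data.
- In the SFLA-based FCM algorithm, the SFLA optimization process replaces FCM's gradient-descent-based iterative process, which improves global search ability. Comparative tests on a real corpus show that the improved algorithm achieves higher clustering accuracy and has an advantage in global optimization.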
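The abstract names the ingredients of the improved SFLA but not their exact form. The sketch below shows a basic shuffled frog leaping loop that minimizes a function over a box. Two pieces in it are assumptions, not the thesis's definitions:

- A logistic-map chaotic initialization stands in for the thesis's "chaotic search".
- A Gaussian mutation of the global best stands in for its mutation operator.

The thesis's new search strategy is not described in the abstract and is not reproduced. Parameter names such as `n_memeplexes`, `local_steps`, `oversample` and `d_max` are illustrative.

```python
import numpy as np


def chaotic_init(f, n_frogs, dim, low, high, rng, oversample=3):
    """Assumed chaotic initialization: iterate the logistic map z <- 4z(1 - z),
    scale each iterate into [low, high], and keep the n_frogs fittest points."""
    z = rng.uniform(0.1, 0.9, size=dim)
    pool = np.empty((n_frogs * oversample, dim))
    for i in range(len(pool)):
        z = 4.0 * z * (1.0 - z)               # chaotic iterate, stays in [0, 1]
        pool[i] = low + z * (high - low)      # map onto the search box
    fit = np.array([f(x) for x in pool])
    keep = np.argsort(fit)[:n_frogs]
    return pool[keep], fit[keep]


def sfla_minimize(f, dim, low, high, n_memeplexes=5, frogs_per_memeplex=8,
                  local_steps=8, generations=100, seed=0):
    """Minimize f over [low, high]^dim with a basic shuffled frog leaping loop."""
    rng = np.random.default_rng(seed)
    n = n_memeplexes * frogs_per_memeplex
    frogs, fit = chaotic_init(f, n, dim, low, high, rng)
    d_max = 0.5 * (high - low)                # cap on a single leap (assumed value)
    for g in range(generations):
        # Shuffle: rank all frogs, then deal them into memeplexes round-robin.
        order = np.argsort(fit)
        frogs, fit = frogs[order], fit[order]
        g_best, g_fit = frogs[0].copy(), fit[0]
        for m in range(n_memeplexes):
            members = np.arange(m, n, n_memeplexes)
            for _ in range(local_steps):
                ranked = members[np.argsort(fit[members])]
                b, w = ranked[0], ranked[-1]
                moved = False
                # The worst frog leaps toward the memeplex best, then the global best.
                for leader in (frogs[b].copy(), g_best):
                    step = np.clip(rng.random(dim) * (leader - frogs[w]), -d_max, d_max)
                    cand = np.clip(frogs[w] + step, low, high)
                    fc = f(cand)
                    if fc < fit[w]:
                        frogs[w], fit[w], moved = cand, fc, True
                        break
                if not moved:                 # neither leap helped: random restart
                    frogs[w] = rng.uniform(low, high, dim)
                    fit[w] = f(frogs[w])
                if fit[w] < g_fit:
                    g_best, g_fit = frogs[w].copy(), fit[w]
        # Assumed mutation operator: Gaussian mutants of the global best with a
        # shrinking radius; a mutant replaces the worst frog when it is fitter.
        sigma = 0.1 * (high - low) * 0.95 ** g
        for _ in range(n_memeplexes):
            mutant = np.clip(g_best + rng.normal(0.0, sigma, dim), low, high)
            fm = f(mutant)
            w = int(np.argmax(fit))
            if fm < fit[w]:
                frogs[w], fit[w] = mutant, fm
                if fm < g_fit:
                    g_best, g_fit = mutant.copy(), fm
    return g_best, g_fit
```

For example, `sfla_minimize(lambda v: float(np.sum(v ** 2)), dim=2, low=-5.0, high=5.0)` searches for the minimum of the sphere function. Sorting the population before dealing frogs round-robin into memeplexes follows the usual SFLA partitioning: every memeplex receives a mix of good and poor frogs, and the shuffle at the start of each generation lets information pass between memeplexes.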
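Both hybrids in the abstract can be sketched by treating each frog as a flat vector of cluster centers. The sketch below assumes that encoding and shows three pieces:

- **FCM objective.** J_m serves as the fitness in an SFLA-FCM variant, so the frog-leaping search moves the centers directly instead of running FCM's alternating update.
- **Sum of squared errors.** This serves as a K-means fitness.
- **Normalized population fitness variance.** This decides when a K-means (Lloyd) step is applied.

The variance formula, the threshold value and the helper names (`decode`, `maybe_refine`, and so on) are assumptions for illustration, not the thesis's definitions. Any SFLA loop that minimizes a function of a flat vector could use `fcm_objective` or `sse` as its objective.

```python
import numpy as np


def decode(frog, c, d):
    """Assumed encoding: a frog is a flat vector holding c cluster centers in R^d."""
    return np.asarray(frog, dtype=float).reshape(c, d)


def fcm_memberships(X, centers, m=2.0, eps=1e-12):
    """Standard FCM memberships u[i, j] = 1 / sum_k (d_ij / d_kj)^(2 / (m - 1))."""
    dist = np.linalg.norm(centers[:, None, :] - X[None, :, :], axis=2) + eps  # (c, n)
    ratio = (dist[:, None, :] / dist[None, :, :]) ** (2.0 / (m - 1.0))      # (c, c, n)
    return 1.0 / ratio.sum(axis=1)                                           # (c, n)


def fcm_objective(frog, X, c, m=2.0):
    """SFLA-FCM fitness: J_m = sum_ij u_ij^m ||x_j - v_i||^2 for the decoded centers."""
    centers = decode(frog, c, X.shape[1])
    u = fcm_memberships(X, centers, m)
    sq = np.sum((centers[:, None, :] - X[None, :, :]) ** 2, axis=2)
    return float(np.sum(u ** m * sq))


def sse(frog, X, c):
    """K-means fitness: sum of squared distances from each sample to its nearest center."""
    centers = decode(frog, c, X.shape[1])
    sq = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=2)
    return float(sq.min(axis=1).sum())


def kmeans_step(frog, X, c):
    """One Lloyd iteration: assign samples to the nearest center, move centers to the means."""
    centers = decode(frog, c, X.shape[1]).copy()
    labels = np.argmin(np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=2), axis=1)
    for i in range(c):
        if np.any(labels == i):               # leave empty clusters where they are
            centers[i] = X[labels == i].mean(axis=0)
    return centers.ravel()


def fitness_variance(fit):
    """Normalized population fitness variance; near zero means the frogs have converged."""
    dev = np.asarray(fit, dtype=float) - np.mean(fit)
    scale = max(1.0, float(np.max(np.abs(dev))))
    return float(np.sum((dev / scale) ** 2))


def maybe_refine(frogs, fit, X, c, objective=sse, threshold=1e-3):
    """Hypothetical trigger: once the fitness variance drops below `threshold`,
    apply one K-means step to every frog and re-evaluate it."""
    if fitness_variance(fit) >= threshold:
        return frogs, fit
    frogs = np.array([kmeans_step(fr, X, c) for fr in frogs])
    fit = np.array([objective(fr, X, c) for fr in frogs])
    return frogs, fit
```

A Lloyd step never increases the sum of squared errors, so running it only when the population has clustered tightly gives a cheap local refinement. The frog-leaping search keeps responsibility for global exploration.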
[Degree-granting institution]: Jiangnan University
[Degree level]: Master's
[Year conferred]: 2013
[CLC classification number]: TP391.1





Permalink: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/1856817.html

