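Research on Tibetan Web Community Discovery Algorithms Based on Link Analysis
Published: 2018-05-19 04:28
Topics: Web community + link analysis; Source: Northwest Minzu University (西北民族大学), master's thesis, 2012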
[Abstract]: With the continued development of Internet technology, a growing amount of Tibetan cultural information is published as web pages, and the number of Tibetan websites has grown rapidly in recent years. Tibetan Web data is large in volume and unorganized, so finding useful information in it quickly and effectively has become an active research topic. Studies have found that this large and complex Web contains many communities, which play an important role in research on topics such as trending social issues. Communities can provide users with timely, valuable information and reflect the complex clustering and hierarchical relationships that are common in the Web. Sustained study of Tibetan-language communities also makes it possible to follow cultural development in Tibetan areas and to keep track of social trends there. Applying Web community discovery algorithms to search engine development helps improve the precision of Web search and provides a basis for building better search engines.

The link relationships among Web pages provide rich clues for Web community discovery, and link analysis is one of its key techniques.

After examining the characteristics of the current Tibetan Web and its link data, this thesis reviews the underlying theory of Web communities and link analysis and surveys two families of link-based community discovery techniques: agglomerative algorithms and divisive algorithms. It focuses on the divisive community discovery algorithm based on extremal optimization and identifies its shortcomings: its dependence on the initial partition of the network, the uncertainty about whether modularity has reached its maximum, and the fact that a local optimum is not necessarily the global optimum. The thesis proposes an improved extremal optimization algorithm based on divergence points. According to the thesis, the algorithm further separates web pages whose community membership is disputed between communities, removes the uncertainty about the modularity maximum, and can partition a network whose number of communities is unknown in advance; the algorithm is then applied to Tibetan websites. The reported experiments show that the improved algorithm further raises the quality of the discovered Web communities and has both theoretical and practical value.
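The abstract refers to modularity and to extremal optimization without giving formulas. The following is a minimal sketch of the two quantities such algorithms usually work with, assuming the standard Newman modularity Q and the per-node fitness used in the commonly cited extremal optimization formulation (fraction of a node's links inside its own community, minus the community's share of all edge ends). It is not the thesis's implementation; the function name `modularity_and_fitness`, the toy graph, and the community labels are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def modularity_and_fitness(edges, community):
    """Return (Q, fitness) for an undirected, unweighted graph.

    edges:     list of (u, v) pairs; assumed to contain no self-loops
               and no duplicate edges
    community: dict mapping each node to a community label
    """
    degree = defaultdict(int)  # k_i: degree of node i
    inside = defaultdict(int)  # kappa_i: links from i into its own community
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            inside[u] += 1
            inside[v] += 1

    two_m = 2 * len(edges)

    # a_r: fraction of all edge ends attached to community r
    share = defaultdict(float)
    for node, k in degree.items():
        share[community[node]] += k / two_m

    # Node fitness: lambda_i = kappa_i / k_i - a_{r(i)}
    fitness = {n: inside[n] / degree[n] - share[community[n]] for n in degree}

    # Modularity as a degree-weighted sum of node fitness:
    # Q = (1 / 2m) * sum_i k_i * lambda_i
    q = sum(degree[n] * fitness[n] for n in degree) / two_m
    return q, fitness


# Toy example (hypothetical data): two triangles joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
q, fitness = modularity_and_fitness(edges, community)
worst = min(fitness, key=fitness.get)  # lowest-fitness node, a candidate to move
```

In this toy graph the two bridge nodes have the lowest fitness, which is the kind of node an extremal optimization step would consider moving to another community. How the thesis defines and handles its divergence points is not specified in the abstract, so that part is not sketched here.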
[Degree-granting institution]: Northwest Minzu University (西北民族大学)
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP391.3
Article ID: 1908723
Article link: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/1908723.html