
Link Prediction in Multilayer Collaboration Networks

Published: 2018-05-29 03:50

  Topic: social network analysis; Source: master's thesis, Beijing Jiaotong University, 2017


[Abstract]: The study of complex networks and their properties has attracted wide attention from scholars in many fields. Complex networks are abstract representations of real-world networks. They are highly dynamic by nature and evolve continuously. Complex networks were initially small, but in the era of big data their scale is growing at an astonishing rate. Studying and analyzing large dynamic networks is a major challenge for network scientists. Many real-world systems can be modeled as collaboration networks among actors (the actors represented by nodes may be users, authors, papers, projects, proteins, and so on), and such networks change dynamically. A complex network can also be an online social network, which describes social relationships between people, such as interactions among friends. Collaboration networks capture certain business relationships (for example, academic co-authorship or co-publication networks and product co-purchasing networks). Biological systems (such as protein interaction networks) and computer science networks (such as the Internet and peer-to-peer networks) are also complex networks. These systems represent actors as nodes in a graph, and links between nodes represent the various interactions, collaborations, or influences between actors. All types of complex networks share some common topological properties, such as a small diameter or average distance, a power-law node degree distribution, a high clustering coefficient, and community structure.

In recent years, a large body of research has focused on the properties of single-layer networks in complex systems. However, studies have shown that for large complex systems such as transportation networks and social networks, which are very large and structurally diverse, studying only one of their layers makes it hard to capture the link patterns of the system. A natural idea is to fuse the multiple layers of a complex system and study them together. But simply superimposing the layers and performing link prediction on the merged network loses much useful information. For complex systems it is therefore necessary to take the multilayer structure into account rather than study link characteristics from a single-layer perspective alone. We need to consider the layers jointly and discover the regularities in their link patterns. In this study, to better capture the characteristics of complex networks, we construct a multilayer network model that accounts for the characteristics of two networks simultaneously, and on this basis we propose a link prediction method for the multilayer network model.

The theory of multiplexity originated in social networks. In social network analysis, people are connected by many kinds of relationships, and their interactions differ according to the type of relationship. For example, two people in a social network may be close friends, neighbors, or colleagues. In social network analysis, multiple relationships between two people are referred to as the multiplexity of relationships. In fact, multiplexity is not limited to social network analysis: entities in other real-world complex systems exhibit it as well.

The core problem of this study is link prediction, which means predicting whether a link exists between two nodes. It is of great importance in complex network analysis and is an important direction in graph mining. Researchers had already made many contributions in this area, but it was not until the early 21st century, when machine learning and data mining were applied to it, that methods emerged for extracting knowledge by jointly exploring the structural properties of highly connected data. The drawback of traditional machine learning methods is that they cannot accurately understand and exploit relational information between entities, whereas this newer approach makes full use of data about the relationships between entities. When mining data held in a special structure such as a graph, the edges supply the relationships between entities, so graph mining methods are effective when those relationships are what is being mined. Predicting new links in a network, the so-called "future link prediction" problem, means predicting the links that will appear in the future by studying the record of links appearing or disappearing in the network over a period of time.

Link prediction has a wide range of applications in different fields, such as recommending friends on social networking sites and uncovering hidden criminal relationships. In medicine and biology, a suitable method with sufficiently high prediction accuracy can guide experiments, which not only narrows the scope of experiments but also raises their success rate, greatly reducing experimental cost and saving considerable time and labor. In addition, link prediction research is closely related to the question of network evolution mechanisms. Understanding how a network evolves reveals the regularities of its change, and these regularities are likely to be the main driving force behind link formation in complex networks.

In recent years scholars have proposed a variety of link prediction models. Most existing methods consider only simple networks consisting of a single layer, in which all links are of the same type. In reality, however, many networks are heterogeneous and involve different types of links and nodes. For example, when focusing on interactions among scientists, different types of links can be defined: two scientists may be linked if they have co-published research papers, if they have published at the same conference, or if they work in the same research field; they may also be linked if they cite other scientists' work in their own articles. Co-authorship networks, or networks of scientists, can be modeled better as multilayer networks, and heterogeneous link information can be used to improve link prediction results. Collaboration networks, especially scientific collaboration networks, contain rich information that can be used for various network analysis tasks, such as link prediction, community detection, and node identification. Because they contain different kinds of link information, these networks have also been used to study the heterogeneous nature of complex networks. In the large, complex system of scientific collaboration networks, it is therefore very meaningful to study how to use multilayer network information to predict potential future links.

In this work we therefore study how to use a multilayer network to predict the links that may form in the future in one of its layers, making full use of the rich features and information in the multilayer network. In the scientific collaboration network we construct, a link in a layer represents a collaboration between two authors in a particular journal; such predictions are potentially valuable for discovering potential collaborations and recommending collaborators. In terms of multilayer link prediction in general, our method also generalizes well, because it is conceptually and structurally related to many practical network analysis problems.

We evaluate our link prediction method mainly on a scientific collaboration network, with data from the APS (American Physical Society) dataset. Link prediction is a binary classification problem. Machine learning offers many classification algorithms; based on our data and experimental results, we chose logistic regression for link prediction. Logistic regression is a supervised learning algorithm. In supervised learning, each example consists of a feature vector and a corresponding output label. In our problem, the input is a set of features for a pair of nodes, including single-layer network features and multilayer network features, and the output is whether a link exists between that pair of nodes within a future period. Our contribution is that, unlike traditional single-layer link prediction, we construct a feature set based on the multilayer network, that is, we make full use of the information about the multiple relationships between two nodes in different networks.

Specifically, we extract the relevant information from the dataset and use paper data from one specific field to construct three single-layer networks. Two of them are generated from 2000-2004 data from two journals, and the third is generated from 2005-2009 data from one of those journals. Our goal is to predict the collaborations between scientists that may appear in the journal "PHYSICAL REVIEW LETTERS" during 2005-2009. Unlike ordinary single-layer link prediction, and in order to make full use of the interactions between scientists, we construct a multilayer network whose first layer contains the collaborations between scientists in "PHYSICAL REVIEW LETTERS" during 2000-2004 and whose second layer contains the collaborations between scientists in "PHYSICAL REVIEW E" during the same period, 2000-2004. We use the data in this multilayer network to predict the collaborations between scientists in "PHYSICAL REVIEW LETTERS" during 2005-2009.

In the proposed multilayer link prediction model, we construct two groups of link features. The first group consists of basic single-layer network features, and the second consists of more complex features based on the multilayer network. We apply the method to the dataset described above. To demonstrate the effectiveness of the proposed model, we designed a comparison experiment in which only the link information from the 2000-2004 collaboration network of "PHYSICAL REVIEW LETTERS" is used to predict the links in that journal's 2005-2009 collaboration network, that is, the traditional single-layer link prediction approach. The experimental results show that the proposed multilayer model, which uses 2000-2004 information from both "PHYSICAL REVIEW LETTERS" and "PHYSICAL REVIEW E" and thereby enriches the information about each pair of collaborators, achieves higher accuracy than the traditional single-layer method. On the AUC metric in particular, the mean of our method improves substantially over the mean of the baseline. Besides AUC, our method also improves on F1, accuracy, and recall.
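The setup described in the abstract (two 2000-2004 layers as input, logistic regression as the classifier, AUC as the headline metric, and a single-layer baseline for comparison) can be illustrated with a small Python sketch. The abstract does not list the concrete features, so the three used here (common neighbours in each layer and a flag for an existing link in the second layer) are assumptions chosen only for illustration, as are the toy edge lists, the function names, and the choice of candidate pairs. The classifier is a plain gradient-descent logistic regression written with the standard library, not the thesis implementation.

```python
import math
from itertools import combinations


def adjacency(edges):
    """Build {node: set(neighbours)} from a list of undirected edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def pair_features(u, v, layer_a, layer_b):
    """Illustrative (assumed) features for a candidate author pair (u, v).

    [0] common neighbours in layer A (single-layer feature)
    [1] common neighbours in layer B (multilayer feature)
    [2] 1.0 if u and v already collaborate in layer B (multilayer feature)
    """
    cn_a = len(layer_a.get(u, set()) & layer_a.get(v, set()))
    cn_b = len(layer_b.get(u, set()) & layer_b.get(v, set()))
    linked_b = 1.0 if v in layer_b.get(u, set()) else 0.0
    return [float(cn_a), float(cn_b), linked_b]


def predict(w, b, x):
    """Predicted probability that the pair with features x forms a link."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def auc(scores, labels):
    """AUC: chance that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data (made up): layer A stands in for the older target-journal network,
# layer B for the older second-journal network, `future` for the later links.
layer_a = adjacency([("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")])
layer_b = adjacency([("a", "c"), ("c", "e")])
future = {frozenset(("a", "c")), frozenset(("c", "e"))}

# Candidate pairs (an assumption): authors not yet linked in layer A.
nodes = sorted(layer_a)
pairs = [(u, v) for u, v in combinations(nodes, 2) if v not in layer_a[u]]
X_multi = [pair_features(u, v, layer_a, layer_b) for u, v in pairs]
X_single = [x[:1] for x in X_multi]  # baseline keeps only the layer-A feature
y = [1 if frozenset(p) in future else 0 for p in pairs]

# Scored on the training pairs only to keep the sketch short;
# a real evaluation would use held-out pairs.
for name, X in (("single-layer", X_single), ("multilayer", X_multi)):
    w, b = train_logistic(X, y)
    print(name, "AUC:", auc([predict(w, b, x) for x in X], y))
```

The baseline and the multilayer model share the same labels and candidate pairs and differ only in the feature columns they see, which mirrors the comparison experiment the abstract describes. With real data, a library classifier and a proper train/test split would be the more usual choice.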
[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: O157.5


Related master's theses (top 1)

1 WALEED JAMIL; Link Prediction in Multilayer Collaboration Networks [D]; Beijing Jiaotong University; 2017






