Research on Emergence Mechanisms of Social Norms in Networked Multi-Agent Systems

Published: 2018-06-19 02:39

Topic: complex networks + evolutionary games; Reference: Dalian University of Technology, 2016 doctoral dissertation

[Abstract]: Social norms fall into two classes, coordination norms and cooperation norms, and they play an extremely important role in maintaining the order and operating efficiency of networked multi-agent systems. The open challenge is how to establish social norms quickly and effectively in a given scenario. In a networked multi-agent system, agents are selfish and capable of reasoning and learning: they continually adjust their behavior to improve their own payoff, based on the payoff they obtain in the system and on other local information (such as neighbors' actions and payoffs). Bottom-up emergence is therefore an effective way to establish social norms in multi-agent systems. However, under different network structures and game conflict models, different emergence mechanisms supply agents with different guiding information, which strongly affects the outcome of norm emergence. Designing suitable emergence mechanisms for different network structures and game scenarios is thus the core scientific problem of this dissertation. Building on a summary of existing work, the dissertation addresses four problem scenarios: coordination games on ring networks, the prisoner's dilemma on static networks, the prisoner's dilemma on mobile networks, and the networked repeated prisoner's dilemma. It proposes corresponding emergence mechanisms along three dimensions (strategy update rules, payoff matrix adjustment mechanisms, and dynamic network construction mechanisms) and analyzes the microscopic mechanism of each emergence process. The main contributions are as follows.

1). Because a ring network has the largest diameter, it is the most prone to forming local coordination norms. Existing mechanisms such as the myopic best response rule (MBR), the highest cumulative reward rule, and Q-learning cannot effectively promote the emergence of a global norm on ring networks, so the dissertation proposes a best response rule with a frozen period (FBR). After adopting a new action, an individual enters a frozen period, during which it keeps its previous action with high probability and applies the traditional MBR rule to update its strategy only with a very small probability. Simulation results and microscopic analysis show that FBR turns the motion of the interfaces between local norms from an unbiased random walk into a biased random walk, which raises the interface diffusion rate; with a moderate frozen period length, a global coordination norm emerges faster on ring networks.

2). Previous work has proposed many strategy update rules and payoff matrix adjustment mechanisms to promote cooperation norms on static networks. These mechanisms let cooperators survive over a fairly large region of parameter space, but they usually cannot guarantee the emergence of a global cooperation norm. To address this, the dissertation proposes a spatially extended Fermi update rule (N-FUR) and a multi-game payoff matrix adjustment mechanism (MG). Under N-FUR, an individual takes as the fitness of its learning object the weighted sum of the learning object's payoff and the average payoff of the learning object's neighbors. This allows very small cooperator clusters to survive and expand in a sea of defectors, which raises the critical threshold for the emergence of a global cooperation norm. The MG mechanism assigns payoff matrices with positive and negative sucker's payoffs (S) to equal proportions of the individuals in the system. It first raises the cooperation level in the subpopulation that uses the positive-S matrix, and then raises the population's cooperation fraction and total social payoff through an asymmetric flow of strategy imitation from the positive-S subpopulation to the negative-S subpopulation.

3). In mobile networks, poorly designed movement rules easily break up cooperator clusters, so defectors can readily invade them, which hinders the emergence of cooperation norms. To address this, the dissertation proposes a degree-dependent vector-average movement rule (DVAM). Under this rule, an individual updates its movement direction to the weighted average of its neighbors' movement directions, with higher-degree neighbors given larger weights. Simulation and analysis show that the mechanism helps the system form large cooperator clusters more quickly and thereby resist invasion by defectors. DVAM promotes the emergence of cooperation norms on mobile networks more effectively than the traditional random movement and vector-average movement rules.

4). In networked repeated games, the number of strategies available to an individual grows exponentially. Existing imitation-based strategy update rules are unsuitable in this setting because they would require individuals to have unrealistically strong observation and reasoning abilities, while aspiration-based learning rules and the global extremal update rule fail to promote cooperation norms there. To address this, the dissertation proposes a localized extremal update rule (LEUR). Under LEUR, an individual compares its payoff only with those of its neighbors, and when its own payoff is the lowest in its neighborhood it replaces its strategy with a randomly chosen new one. Simulation results and microscopic analysis show that when the neighborhood radius is 2, the number of active individuals in the system and the number of small clusters they form are both at their maximum, so the system can evolve to the tit-for-tat (TFT) state, which has the highest average payoff.
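The frozen-period idea in contribution 1 can be illustrated for a single individual. This is only a sketch under stated assumptions, not the dissertation's implementation: the abstract gives no formulas, so the pure coordination payoff (one unit per matching neighbor), the tie-breaking choice, and the names `best_response`, `fbr_update`, `freeze_len`, and `eps` are all hypothetical.

```python
import random

def best_response(neighbor_actions, current):
    """Myopic best response in a pure coordination game.

    Assumption: the payoff is 1 per neighbor playing the same action, so the
    best response is the most common neighbor action. Ties keep `current`
    when it is among the winners (an illustrative choice).
    """
    counts = {}
    for a in neighbor_actions:
        counts[a] = counts.get(a, 0) + 1
    if not counts:
        return current
    best = max(counts.values())
    winners = [a for a, c in counts.items() if c == best]
    return current if current in winners else winners[0]

def fbr_update(action, frozen_left, neighbor_actions,
               freeze_len=5, eps=0.01, rng=random):
    """One FBR step for one individual; returns (new_action, new_frozen_left).

    While frozen, the individual keeps its action with probability 1 - eps
    and falls back to the myopic best response with probability eps.
    Adopting a different action starts a new frozen period of `freeze_len`.
    """
    if frozen_left > 0 and rng.random() >= eps:
        return action, frozen_left - 1
    new_action = best_response(neighbor_actions, action)
    if new_action != action:
        return new_action, freeze_len
    return new_action, max(frozen_left - 1, 0)
```

With `eps=0.0` a frozen individual never updates until the period expires, and with `freeze_len=0` the rule reduces to plain MBR, which makes the two limits easy to compare in a simulation.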
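The N-FUR fitness in contribution 2 is described as a weighted sum of the learning object's payoff and the mean payoff of that object's neighbors. A minimal sketch follows. The weight `alpha`, the noise parameter `noise`, and the choice to compare the focal individual's raw payoff against the extended fitness are assumptions; only the standard Fermi imitation probability is taken as given.

```python
import math

def nfur_fitness(payoff, neighbors, j, alpha=0.5):
    """Spatially extended fitness of learning object j.

    (1 - alpha) * payoff of j + alpha * mean payoff of j's neighbors.
    `alpha` is a hypothetical weight; the abstract does not state its value.
    """
    nbrs = neighbors[j]
    mean_nbr = sum(payoff[k] for k in nbrs) / len(nbrs) if nbrs else 0.0
    return (1.0 - alpha) * payoff[j] + alpha * mean_nbr

def fermi_probability(own_payoff, target_fitness, noise=0.1):
    """Standard Fermi rule: probability of imitating the learning object."""
    x = (own_payoff - target_fitness) / noise
    x = max(-700.0, min(700.0, x))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(x))

# Usage: node 1 earns 4.0 and its neighbors 2 and 3 earn 0.0 and 2.0.
payoff = {0: 1.0, 1: 4.0, 2: 0.0, 3: 2.0}
neighbors = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
fitness_1 = nfur_fitness(payoff, {1: [2, 3]}, 1, alpha=0.5)
```

Setting `alpha=0` recovers the ordinary Fermi rule, so the effect of the neighborhood term can be isolated by sweeping `alpha`.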
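The DVAM direction update in contribution 3 can be sketched as a degree-weighted average of neighbors' unit heading vectors. The abstract says only that higher-degree neighbors receive larger weights, so using the degree itself as the weight is an assumption, as are the function name `dvam_direction` and the fallback of keeping the current heading when there are no neighbors or the weighted vectors cancel.

```python
import math

def dvam_direction(own_angle, neighbor_angles, neighbor_degrees):
    """Return the new heading angle (radians) of one individual.

    Each neighbor contributes its unit heading vector scaled by its degree
    (assumed weighting). The new heading is the angle of the summed vector.
    """
    if not neighbor_angles:
        return own_angle
    x = sum(d * math.cos(a) for a, d in zip(neighbor_angles, neighbor_degrees))
    y = sum(d * math.sin(a) for a, d in zip(neighbor_angles, neighbor_degrees))
    if math.hypot(x, y) < 1e-12:  # directions cancel out: keep current heading
        return own_angle
    return math.atan2(y, x)
```

Averaging vectors rather than raw angles avoids the wrap-around problem at ±π. Passing equal degrees for all neighbors gives the plain vector-average rule that the abstract uses as a baseline.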
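The LEUR rule in contribution 4 replaces an individual's strategy with a random one when its payoff is the lowest within its neighborhood. The sketch below assumes a hop-distance neighborhood, a strict minimum, and synchronous updating, none of which the abstract specifies; `neighborhood`, `leur_step`, and `strategy_pool` are hypothetical names.

```python
import random
from collections import deque

def neighborhood(adj, i, radius):
    """Nodes within `radius` hops of i, excluding i (breadth-first search)."""
    seen = {i}
    frontier = deque([(i, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == radius:
            continue
        for nb in adj[node]:
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, dist + 1))
    seen.discard(i)
    return seen

def leur_step(adj, payoff, strategy, strategy_pool, radius=2, rng=random):
    """One synchronous LEUR step; returns a new strategy mapping.

    An individual whose payoff is strictly lower than that of everyone in
    its radius-`radius` neighborhood draws a new strategy at random.
    """
    new_strategy = dict(strategy)
    for i in adj:
        nbrs = neighborhood(adj, i, radius)
        if nbrs and all(payoff[i] < payoff[j] for j in nbrs):
            new_strategy[i] = rng.choice(strategy_pool)
    return new_strategy
```

The `radius` argument corresponds to the neighborhood radius discussed in the abstract. A radius large enough to cover the whole network turns this into a global extremal update.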
[Degree-granting institution]: Dalian University of Technology
[Degree level]: Doctoral
[Year degree conferred]: 2016
[Classification number]: O157.5;O225

[Similar literature]