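Research on Entity Relationship Mining and Analysis in E-mail Communication
Topic: community detection; Entry point: entity profiling; Source: University of Electronic Science and Technology of China, doctoral dissertation, 2014; Type: dissertation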
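[Abstract]: Driven by the need to mine entity relationships from rapidly growing network data, research on the social network analysis of e-mail networks has become increasingly active. E-mail is one of the most widely used communication networks: it has a pronounced social character and an enormous user base, and its data implicitly encode real-world relationship structures. Partitioning and presenting the social structure of e-mail network data and predicting unknown links are central tasks of social network analysis for entity relationship mining, with broad application prospects in commercial settings such as e-commerce and social recommendation and in operational areas such as counter-terrorism and criminal investigation. Community detection and link prediction, in particular, have long been research hotspots. When entity relationships are mined from large volumes of e-mail communication data, the efficiency and accuracy of community detection and the recall and precision of link prediction become obstacles to practical use. Starting from known social network analysis algorithms, this dissertation studies in depth the accuracy and computational efficiency of community structure detection algorithms, and the recall and precision of link prediction algorithms, for entity relationship mining in e-mail communication networks. The main contributions are as follows:
(1) A new measurement model for community structure detection. Partitions produced by modularity-based methods are not sufficiently stable. Building on the idea of information centrality, a new measurement model is proposed that weights the correlation between nodes together with node degree; it not only identifies cluster centers accurately but also provides a basis for computing the similarity between nodes in the network. On this basis a new community partitioning algorithm, BSM, is proposed. Experiments on simulated networks and real network datasets show that BSM is more stable and more accurate than the modularity method, which also confirms the validity of the measurement model.
(2) A fast algorithmic model for community detection in large-scale complex networks. The work proceeds in two steps. First, to address the low efficiency of the first iteration of the Louvain fast algorithm, a pruning strategy is introduced, giving an improved algorithm, FLA. Second, because the Louvain algorithm optimizes modularity and therefore tends to converge to local optima, the optimization objective function is improved by incorporating information such as node degree and edge weight, and a new algorithm, CDDW, is built on top of FLA. Experiments on simulated and real network datasets show that the new model substantially reduces computational cost and also improves the accuracy of the overall community partition.
(3) A new ensemble learning model for link prediction. Mainstream link prediction algorithms commonly suffer from low recall and precision. To address this, link prediction is treated as a binary classification problem, and a new link prediction model, AdaPred, is designed and implemented using the error-feedback mechanism of the Boosting framework. To further improve precision and recall, a new link prediction algorithm is proposed and integrated into AdaPred. Empirical studies on real data, including a paper collaboration network and an e-mail network, show that AdaPred achieves markedly higher prediction precision and recall than the other algorithms compared.
(4) A visual analytics system for entity relationships in e-mail communication networks. Visualization helps carry social network analysis into practical use and will strongly influence the adoption of the technique. Taking entity relationship mining in e-mail networks as the entry point, an application-oriented visual analytics platform was developed. Its data analysis capabilities are on a par with the international state of the art, and it offers good generality and extensibility. The prototype system has passed third-party testing and the acceptance review of a National 863 Program project, with an appraisal result of Excellent.
In summary, this dissertation addresses several important challenges that social network analysis faces when moving into practical application, and on that basis designs and implements a prototype visual analytics system. The results offer an efficient and feasible solution for the wider application of social network analysis. Because the techniques rely only on network topology and not on additional contextual information, they scale well and can be extended to a wider range of social network data analysis scenarios.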
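Contributions (1) and (2) both take modularity as their baseline, and the Louvain algorithm greedily maximizes it. As a point of reference only, the sketch below computes Newman-Girvan modularity, Q = sum over communities c of [L_c/m - (d_c/2m)^2], where L_c is the internal edge weight, d_c the total degree of c and m the total edge weight, and runs one Louvain local-moving phase (no graph aggregation). The queue-based pruning rule used here (after a node moves, only its neighbors outside the new community are re-examined) is an assumption chosen for illustration; the abstract does not specify FLA's pruning strategy, and neither the BSM measure nor the CDDW objective is reproduced. All function names are hypothetical.

```python
from collections import defaultdict, deque

def modularity(adj, comm):
    """Newman-Girvan modularity of partition `comm` on an undirected weighted graph.

    adj:  node -> {neighbour: weight}, stored symmetrically, no self-loops (assumed)
    comm: node -> community label
    """
    two_m = sum(w for nbrs in adj.values() for w in nbrs.values())  # each edge counted twice
    internal = defaultdict(float)    # twice the internal edge weight of each community
    degree_sum = defaultdict(float)  # total degree of each community
    for u, nbrs in adj.items():
        degree_sum[comm[u]] += sum(nbrs.values())
        for v, w in nbrs.items():
            if comm[u] == comm[v]:
                internal[comm[u]] += w
    return sum(internal[c] / two_m - (degree_sum[c] / two_m) ** 2 for c in degree_sum)

def louvain_local_moving(adj):
    """One Louvain local-moving phase with a (hypothetical) queue-based pruning rule."""
    two_m = sum(w for nbrs in adj.values() for w in nbrs.values())
    k = {u: sum(nbrs.values()) for u, nbrs in adj.items()}  # node degrees
    comm = {u: u for u in adj}       # start from singleton communities
    tot = {u: k[u] for u in adj}     # total degree of each community
    queue, queued = deque(adj), set(adj)
    while queue:
        u = queue.popleft()
        queued.discard(u)
        old = comm[u]
        links = defaultdict(float)   # weight from u to each neighbouring community
        for v, w in adj[u].items():
            links[comm[v]] += w
        tot[old] -= k[u]             # take u out of its current community
        # Modularity gain of inserting u into community c, up to the constant factor 1/m.
        best, best_gain = old, links.get(old, 0.0) - tot[old] * k[u] / two_m
        for c, w_uc in links.items():
            g = w_uc - tot[c] * k[u] / two_m
            if g > best_gain:        # strict: u stays put on ties
                best, best_gain = c, g
        tot[best] += k[u]
        if best != old:
            comm[u] = best
            # Pruning heuristic: re-examine only neighbours left outside u's new community;
            # all other nodes are assumed stable and are not revisited.
            for v in adj[u]:
                if comm[v] != best and v not in queued:
                    queue.append(v)
                    queued.add(v)
    return comm
```

On two triangles joined by a single edge, this phase recovers the two triangles, with Q = 5/14. Every accepted move strictly increases Q, so the loop terminates; a full Louvain run would then collapse each community into a super-node and repeat.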
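Contribution (3) treats link prediction as binary classification and relies on the error-feedback mechanism of Boosting. The abstract does not describe AdaPred's features or its added base algorithm, so the following is only a generic sketch under stated assumptions: discrete AdaBoost with decision stumps over three standard topological similarity scores (common neighbors, Jaccard coefficient, Adamic-Adar index). The feature set and every function name are illustrative assumptions, not the thesis's design.

```python
import math

def pair_features(adj, u, v):
    """Topological similarity features for a candidate pair (u, v).

    adj maps each node to the set of its neighbours in the observed graph.
    Returns [common neighbours, Jaccard coefficient, Adamic-Adar index].
    """
    common = adj[u] & adj[v]
    union = adj[u] | adj[v]
    jaccard = len(common) / len(union) if union else 0.0
    # A common neighbour is adjacent to both u and v, so its degree is >= 2 and log(deg) > 0.
    adamic_adar = sum(1.0 / math.log(len(adj[z])) for z in common)
    return [float(len(common)), jaccard, adamic_adar]

def _stump(x, j, thr, pol):
    """Decision stump: predict pol if feature j >= thr, otherwise -pol."""
    return pol if x[j] >= thr else -pol

def _best_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted training error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for pol in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w) if _stump(xi, j, thr, pol) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, pol)
    return best

def adaboost(X, y, rounds=20):
    """Discrete AdaBoost over decision stumps; labels are +1 (link) or -1 (no link)."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, j, thr, pol = _best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep log() finite for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, thr, pol))
        # Error feedback: misclassified pairs gain weight, correctly classified pairs lose it.
        w = [wi * math.exp(-alpha * yi * _stump(xi, j, thr, pol)) for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
        if err <= 1e-10:  # a perfect stump cannot be improved on this training set
            break
    return model

def score(model, x):
    """Weighted vote of all stumps; a positive score means 'predict a link'."""
    return sum(alpha * _stump(x, j, thr, pol) for alpha, j, thr, pol in model)
```

In an e-mail setting, one plausible (assumed) protocol is to compute features on an earlier snapshot of the communication graph, label each candidate pair by whether the two addresses exchange mail in a later window, and report recall and precision on held-out pairs rather than on the training pairs.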
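[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Doctoral
[Year of award]: 2014
[Classification number]: TP393.098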