Research on a Feature-Extraction-Based Method for Constructing Network Measurement Datasets

Published: 2018-01-13 01:31

  Source: Xinjiang University, 2017 master's thesis. Document type: degree thesis.


  Keywords: rogue certificate; dataset construction; feature extraction; Isomap


[Abstract]: The emergence and spread of the Internet have brought people great convenience, but they also expose us to online threats and the risk of fraud. In recent years, incidents in which rogue certificates were maliciously issued have occurred repeatedly. If criminals obtain a rogue certificate and deploy it on a phishing or fraud website, the risk that users' personal information will be stolen rises sharply, causing financial losses for individuals and damaging the reputation of the companies involved. At present, rogue certificates are identified mainly by hand, so automatic identification is clearly needed. Because rogue certificates are hard to identify and no effective rogue-certificate dataset is yet available, this thesis takes rogue certificates as its research object and makes the following three contributions:

(1) Collaborative research on, and construction of, a raw rogue-certificate dataset. With the goal of building a rogue-certificate dataset, real digital certificates obtained through network measurement were combined with simulated rogue certificates generated by the Frankencert tool. Through group discussion, the fields of digital certificates and the characteristics of rogue certificates were surveyed and analyzed, and on that basis the feature fields of rogue certificates were determined. After preprocessing such as removing abnormal certificates, and combining these fields with basic indicators, a 37-dimensional raw rogue-certificate dataset of 730,000 samples was built.

(2) An improved feature-extraction algorithm and a new indicator model. To address shortcomings of the traditional Isomap algorithm, an improved algorithm, MM-Isomap, is proposed. It explicitly takes the class of each sample point into account, shrinking within-class distances and enlarging between-class distances to improve classification. Accuracy, together with the precision, recall, and F-measure of rogue-certificate identification, serve as the evaluation metrics for selecting the algorithm's optimal parameters and assessing its effectiveness. Applying the algorithm to the raw rogue-certificate dataset yields an 18-dimensional rogue-certificate indicator model after feature extraction.

(3) Validation of the indicator model and collaborative construction of an open dataset. Validation consisted of two experiments: first, three classifiers (a support vector machine (SVM), a J4.8 decision tree, and a BP neural network) were used to evaluate the validity of the raw rogue-certificate dataset; second, the effectiveness of the new indicator model obtained by feature extraction was evaluated. In addition, combined with another group member's work on feature selection, an open rogue-certificate dataset built from "feature selection (22 dimensions) + feature extraction (18 dimensions)" was constructed jointly, providing a base dataset for further research on rogue certificates.
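The abstract describes MM-Isomap only as shrinking within-class distances and enlarging between-class distances; its exact modification is not given here. The sketch below shows one way such a supervised Isomap could be assembled, assuming a simple hypothetical rule: Euclidean distances are multiplied by `alpha < 1` for same-class pairs and by `beta > 1` for cross-class pairs, after which the standard Isomap steps run (k-nearest-neighbour graph, Floyd-Warshall geodesic distances, classical MDS). The function names and the `alpha`/`beta` rule are illustrative assumptions, not the thesis's method.

```python
import numpy as np


def supervised_distances(X, y, alpha=0.5, beta=2.0):
    """Return (weighted, euclidean) pairwise distance matrices.

    HYPOTHETICAL stand-in for MM-Isomap's class-aware step: same-class
    distances are scaled by alpha (< 1, shrink) and cross-class distances
    by beta (> 1, stretch).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    diff = X[:, None, :] - X[None, :, :]
    euclid = np.sqrt((diff ** 2).sum(axis=-1))
    same_class = y[:, None] == y[None, :]
    return np.where(same_class, alpha * euclid, beta * euclid), euclid


def geodesic_distances(weighted, euclid, k):
    """Shortest-path distances over a symmetric k-nearest-neighbour graph."""
    n = weighted.shape[0]
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of samples")
    graph = np.full((n, n), np.inf)
    np.fill_diagonal(graph, 0.0)
    # Neighbours are picked on raw Euclidean distance (column 0 is the point
    # itself, assuming no duplicate points); edges carry label-adjusted weights.
    nbrs = np.argsort(euclid, axis=1)[:, 1:k + 1]
    rows = np.repeat(np.arange(n), k)
    cols = nbrs.ravel()
    graph[rows, cols] = weighted[rows, cols]
    graph = np.minimum(graph, graph.T)  # keep an edge if either end chose it
    # Floyd-Warshall: allow paths through each intermediate node m in turn.
    for m in range(n):
        graph = np.minimum(graph, graph[:, m:m + 1] + graph[m:m + 1, :])
    if np.isinf(graph).any():
        raise ValueError("neighbourhood graph is disconnected; increase k")
    return graph


def classical_mds(dist, n_components):
    """Embed a distance matrix with classical (Torgerson) MDS."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)          # ascending order
    top = np.argsort(eigvals)[::-1][:n_components]   # largest first
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale


def mm_isomap_sketch(X, y, k=8, n_components=2, alpha=0.5, beta=2.0):
    """Supervised Isomap sketch: label-adjusted distances -> geodesics -> MDS."""
    weighted, euclid = supervised_distances(X, y, alpha, beta)
    geo = geodesic_distances(weighted, euclid, k)
    return classical_mds(geo, n_components)
```

Neighbours are chosen on the raw Euclidean distances, so the graph is connected whenever plain Isomap's would be; only the edge weights carry label information. Because the class labels are required, this sketch embeds labelled training data only; mapping unseen certificates would need an out-of-sample extension, which it does not include.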
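Contribution (2) scores the algorithm by accuracy and by the precision, recall, and F-measure of rogue-certificate identification. With "rogue" as the positive class, the standard definitions are precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2PR / (P + R). A minimal sketch follows; the label encoding (1 = rogue) and reading the abstract's "F value" as the balanced F1 score are assumptions.

```python
from dataclasses import dataclass


@dataclass
class DetectionMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def detection_metrics(y_true, y_pred, positive=1):
    """Accuracy plus precision/recall/F1 for the rogue-certificate class.

    Assumes the label `positive` marks a rogue certificate and that the
    'F value' is the balanced F1 score.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # rogue, flagged
    fp = sum(t != positive and p == positive for t, p in pairs)  # benign, flagged
    fn = sum(t == positive and p != positive for t, p in pairs)  # rogue, missed
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return DetectionMetrics(correct / len(pairs), precision, recall, f1)
```

For example, `detection_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0])` gives an accuracy of 0.75 and a precision, recall, and F1 of 2/3 each: two rogue certificates are caught, one is missed, and one benign certificate is wrongly flagged.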
[Degree-granting institution]: Xinjiang University
[Degree level]: Master's
[Year degree conferred]: 2017
[CLC number]: TP309


Link: https://www.wllwen.com/shoufeilunwen/xixikjs/1416895.html

