A Sample-Based Algorithm for Generating Simulated Password Sets

Published: 2018-11-09 14:56

[Abstract]: Large-scale user password sets are widely valued in system security research because they can be used to evaluate the efficiency of password guessing algorithms and to expose weaknesses in existing password protection mechanisms. However, although some channels exist, such as website password leaks, voluntary collection from users, or sharing by individual websites for research purposes, obtaining real large-scale plaintext user passwords remains very difficult for researchers. To address this problem, this paper proposes a sample-based algorithm for generating simulated password sets, Sample Perturbation Based Password Generation (SPPG). The algorithm takes a small-scale sample of real passwords, which is comparatively easy to obtain, learns a generative probability model from it, and produces a large-scale user password set. To evaluate the algorithm, the paper proposes a set of quality metrics for simulated password sets, including real-password coverage and goodness of fit to a Zipf distribution. Finally, the paper compares SPPG with common probabilistic password guessing models, including probabilistic context-free grammars and several Markov models, on the task of generating user password sets. The results show that the simulated password sets produced by SPPG perform better on every metric. On average, real-password coverage improves by 9.58% and 72.79% relative to the context-free grammar and the fourth-order Markov model respectively, and by factors of 10.34 and 13.41 relative to the third-order and first-order Markov models respectively, while the Zipf goodness of fit stays at 0.9 or above. In addition, the distribution of password structures and the use of special patterns are closer to those of passwords created by real users.
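The abstract names two of the quality metrics, real-password coverage and Zipf goodness of fit, but does not give their formulas. The following is a minimal sketch of how such metrics could be computed, under stated assumptions: coverage is taken to be the fraction of distinct real passwords that also occur in the simulated set, and Zipf fit is taken to be the R² of a least-squares line through the log-log rank-frequency points. The function names `coverage` and `zipf_fit` are hypothetical, and the paper's own definitions may differ.

```python
import math
from collections import Counter


def coverage(simulated, real):
    """Fraction of distinct real passwords that also occur in the simulated set.

    Assumed definition; the abstract does not state the paper's formula.
    """
    real_set = set(real)
    if not real_set:
        return 0.0
    return len(real_set & set(simulated)) / len(real_set)


def zipf_fit(passwords):
    """Fit log10(frequency) against log10(rank) by ordinary least squares.

    Returns (slope, r_squared). An r_squared close to 1 means the
    rank-frequency curve is close to a power law (Zipf-like).
    If every password has the same frequency the fit is degenerate
    and r_squared is reported as 0.0.
    """
    freqs = sorted(Counter(passwords).values(), reverse=True)
    n = len(freqs)
    if n < 2:
        raise ValueError("need at least two distinct passwords")
    xs = [math.log10(rank) for rank in range(1, n + 1)]
    ys = [math.log10(f) for f in freqs]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r_squared = 0.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return slope, r_squared
```

With these assumed definitions, a simulated set would be scored by calling `coverage(simulated, real_test_set)` against a held-out real set and `zipf_fit(simulated)` on the simulated set itself; the abstract's reported fit of 0.9 or above would correspond to the second return value if the paper uses a comparable R² measure.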
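[Author affiliations]: School of Software, Fudan University; Shanghai Key Laboratory of Data Science
[Funding]: Supported by the Science and Technology Commission of Shanghai Municipality "Innovation Action Plan" project (16DZ1100200) and the National Natural Science Foundation of China (61572136, 61370080)
[Classification number]: TP393.092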
