A Password Guessing Method Based on a Variable-Order Markov Model

Published: 2018-06-17 22:29

Topics: password guessing + Markov model; Source: Wuhan University, 2017 master's thesis

[Abstract]: Username-password authentication is currently the most popular form of user identity authentication. It is easy to understand, simple to implement, and convenient to use, but the passwords involved are often not strong enough. Password security has therefore remained an active research topic, and the main way to study it is to attack password sets with different password guessing techniques. Common techniques include brute-force attacks, dictionary attacks, and attacks based on probabilistic password models. The last of these has been a research focus in recent years and is also the most effective guessing approach.

Probabilistic password models fall into two categories. Template-based models split a password into segments according to a given structure, first compute the probability of each template structure, and from that derive the probability of a concrete password. Whole-string models treat the password as a single unit and compute its probability directly. Template-based guessing methods have been studied extensively, whereas whole-string methods have received less attention; their main direction is to bring the Markov model from natural language processing into password probability calculation. Most researchers use a Markov model of fixed order. When the order is too low, too little character history is used to compute the probability of the character at each position, and the result is inaccurate. When the order is too high, data sparsity in the training set causes the high-order Markov model to overfit.

To address these problems, this thesis proposes BackOff, a method for realizing a variable-order Markov model. When computing the probability of a whole password, it adaptively selects the order of the Markov model, that is, it chooses how much character history to use at each position. It does so by setting a threshold on occurrence counts, starting from the highest-order Markov model, and lowering the order step by step until the n-gram's occurrence count exceeds the threshold. The overall guessing procedure is as follows. First, training on a real password set yields a set of n-grams and their frequencies. Then, in the guess generation stage, n-grams are concatenated with characters from the character space, and the password probability is computed with the Markov chain. Finally, pushing to and popping from a priority queue yields the guesses in descending order of probability, and these guesses are matched against the test set to obtain the cracking rate at different numbers of guesses.

In four groups of comparative experiments, the proposed method performed well: after 20 million guesses, its cracking rate was clearly higher than those of the traditional JTR tool, the PCFG method, and the password guessing method based on a fixed-order Markov model.
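
The procedure described in the abstract (n-gram training, threshold-based back-off, priority-queue enumeration, cracking rate) can be illustrated with a minimal Python sketch. This is not the thesis's implementation. The function names, the `START`/`END` marker characters, the `max_len` cap, and the choice to compare the context's total count against the threshold are all assumptions made for illustration; the abstract does not specify these details.

```python
import heapq
from collections import Counter, defaultdict
from itertools import islice

START, END = "\x02", "\x03"  # hypothetical start/end marker characters


def train(passwords, max_order):
    """For every context of length 0..max_order, count which character follows it."""
    counts = defaultdict(Counter)
    for pw in passwords:
        seq = START * max_order + pw + END
        for i in range(max_order, len(seq)):
            for k in range(max_order + 1):
                counts[seq[i - k:i]][seq[i]] += 1
    return counts


def next_char_probs(counts, history, max_order, threshold):
    """Lower the order until the context was seen more than `threshold` times.

    Assumption: the threshold is applied to the context's total count.
    Order 0 (no history) is always accepted as the last resort.
    """
    for k in range(max_order, -1, -1):
        followers = counts.get(history[len(history) - k:], {})
        total = sum(followers.values())
        if total > threshold or k == 0:
            return {c: n / total for c, n in followers.items()}


def guesses(counts, max_order, threshold, max_len=16):
    """Yield (password, probability) in non-increasing order of probability.

    Best-first search over prefixes: extending a prefix can only lower its
    probability, so a complete string popped from the heap is never beaten
    by anything still in the queue.
    """
    heap = [(-1.0, START * max_order)]  # negated probability: heapq is a min-heap
    while heap:
        neg_p, seq = heapq.heappop(heap)
        if seq.endswith(END):
            yield seq[max_order:-1], -neg_p
            continue
        pw_len = len(seq) - max_order
        probs = next_char_probs(counts, seq, max_order, threshold)
        for c, p in probs.items():
            if c == END or pw_len < max_len:  # cap length so the search terminates
                heapq.heappush(heap, (neg_p * p, seq + c))


def cracking_rate(guess_iter, test_set, num_guesses):
    """Fraction of test-set passwords matched by the first `num_guesses` guesses."""
    remaining = Counter(test_set)
    cracked = 0
    for pw, _ in islice(guess_iter, num_guesses):
        cracked += remaining.pop(pw, 0)
    return cracked / len(test_set)


if __name__ == "__main__":
    toy_training = ["abc", "abc", "abd", "xyz"]  # toy data, not from the thesis
    model = train(toy_training, max_order=2)
    for pw, p in islice(guesses(model, max_order=2, threshold=0), 3):
        print(pw, round(p, 3))
```

Because the order is chosen from the context alone, each per-position distribution sums to one without extra normalization. The queue in this sketch grows with the number of open prefixes, so a run of tens of millions of guesses would need some form of pruning or batching that is not shown here.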
[Degree-granting institution]: Wuhan University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP309

[References]