Research and Implementation of an Early Internet Addiction Behavior Detection Model Based on Emerging Patterns
Published: 2018-07-24 14:49
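【Abstract】:With the growing popularity of the Internet, social networking sites, online shopping sites, and instant messaging software have flourished. However, some users come to depend on the Internet to the point of marked psychological abnormality and physiological harm; this is the problem of Internet addiction, also called "pathological Internet use" (Pathological Internet Use, PIU). How to detect and treat Internet addiction early is a frontier problem for both academia and industry. Most existing research on the problem comes from psychology, sociology, and medicine, while the field of computer technology has not yet addressed it. This thesis therefore studies Internet addiction from the perspective of data mining and proposes a PIU pattern mining and detection model based on emerging patterns (Emerging Pattern, EP), offering a theoretical reference for more effective treatment. An emerging pattern is a newer kind of contrast pattern: an itemset whose support changes significantly from one dataset to another. EPs capture the features that distinguish a target class from a non-target class, and classifiers with good accuracy can be built from them. The mining and detection model first collects simple events from users' online behavior, then uses generation rules to infer complex events that carry higher-level semantic information, and finally mines generators (Generator) from behavior equivalence classes (Behavior Equivalence Class), because generators represent the attribute characteristics of a dataset well and have a simple form. The thesis proposes two PIU detection algorithms: the Generator-based PIU Detecting Algorithm (GBPDA) and the EP-based PIU Detecting Algorithm (EPBPDA). GBPDA works from generators: it selects the generators that most clearly represent addictive behavior, then scores the match between the generators of the Internet addiction dataset and those of the test dataset to reach a final diagnosis. EPBPDA works from EPs: it mines jumping emerging patterns (JEPs) and basic emerging patterns (eEPs), proposes a scoring mechanism that jointly considers growth rate, support, JEPs, and EPs, and uses that mechanism to detect Internet addiction. Experiments on a real dataset and on simulated data evaluate both algorithms on performance metrics (running time, memory usage) and on effectiveness metrics such as accuracy and misdiagnosis rate. When the data volume is small, both methods detect Internet addiction well, and EPBPDA is more effective than GBPDA because EPs discriminate better than generators; on performance, however, GBPDA is better, because mining EPs takes more processing time and space than mining generators. When the data volume is large, EPBPDA's effectiveness advantage over GBPDA becomes more pronounced, and because the number of EPs grows more slowly than the number of generators, EPBPDA also runs faster than GBPDA.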
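The emerging-pattern notions used above (support, growth rate, EP versus jumping EP) can be illustrated with a minimal Python sketch. It assumes the common textbook definitions: the growth rate of an itemset from a background dataset D1 to a target dataset D2 is supp_D2 / supp_D1; the itemset is an EP when that ratio reaches a threshold rho, and a JEP when its support is zero in D1 but nonzero in D2. The function names, the threshold rho = 2, and the behavior events (night_gaming, long_session, chat, study_site) are hypothetical illustrations, not taken from the thesis.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]


def support(itemset: Itemset, dataset: List[Itemset]) -> float:
    """Fraction of transactions in `dataset` that contain every item in `itemset`."""
    if not dataset:
        return 0.0
    return sum(1 for t in dataset if itemset <= t) / len(dataset)


def growth_rate(itemset: Itemset, background: List[Itemset], target: List[Itemset]) -> float:
    """supp_target / supp_background, using the conventions 0/0 -> 0 and x/0 -> inf for x > 0."""
    s_bg = support(itemset, background)
    s_tg = support(itemset, target)
    if s_bg == 0.0:
        return float("inf") if s_tg > 0.0 else 0.0
    return s_tg / s_bg


def classify_pattern(itemset: Itemset, background: List[Itemset],
                     target: List[Itemset], rho: float = 2.0) -> str:
    """Label an itemset as a jumping EP, an EP (growth rate >= rho, hypothetical default), or neither."""
    gr = growth_rate(itemset, background, target)
    if gr == float("inf"):
        return "JEP"
    if gr >= rho:
        return "EP"
    return "none"


# Hypothetical per-user behavior transactions: each is the set of behavior events observed for one user.
addicted = [frozenset({"night_gaming", "long_session"}),
            frozenset({"night_gaming", "long_session", "chat"}),
            frozenset({"long_session", "chat"}),
            frozenset({"night_gaming"})]
normal = [frozenset({"study_site", "chat"}),
          frozenset({"long_session", "study_site"}),
          frozenset({"chat"}),
          frozenset({"study_site"})]

for events in ({"night_gaming"}, {"long_session"}, {"chat"}):
    p = frozenset(events)
    print(sorted(p), growth_rate(p, normal, addicted), classify_pattern(p, normal, addicted))
```

With this toy data, night_gaming never occurs among the normal users and is a JEP, long_session has growth rate 3.0 and is an EP, and chat (growth rate 1.0) is neither. Real EP miners usually also impose a minimum support in the target class, which this sketch leaves out.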
[Abstract]:With the increasing popularity of the Internet, social networks, online shopping sites, and instant messaging software have flourished. However, some users become so dependent on the Internet that they develop marked psychological abnormalities and physical harm; this is the problem of Internet addiction, also called "pathological Internet use" (Pathological Internet Use, PIU). How to detect and treat Internet addiction early is a frontier problem for academia and industry. At present, most research on this problem comes from psychology, sociology, and medicine, while computer science has not yet addressed it. This thesis therefore studies Internet addiction from the perspective of data mining and proposes a model for mining and detecting Internet addiction patterns based on emerging patterns (Emerging Pattern, EP), which provides a theoretical basis for more effective treatment. An emerging pattern is a new kind of contrast pattern: an itemset whose support changes significantly from one dataset to another, so it can capture the differences between a target class and a non-target class. Classifiers built on emerging patterns can achieve good classification performance. In the model, simple events of users' online behavior are first collected, complex events with higher-level semantic information are then inferred through generation rules, and finally generators (Generator) are mined from behavior equivalence classes (Behavior Equivalence Class), because generators represent the attribute characteristics of a dataset well and have a simple form. Two PIU pattern detection algorithms are proposed: the Generator-based PIU Detecting Algorithm (GBPDA) and the EP-based PIU Detecting Algorithm (EPBPDA).
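Generators can be illustrated with a brute-force Python sketch. It assumes that a behavior equivalence class corresponds to the standard support equivalence class, i.e. all itemsets contained in exactly the same set of transactions, and that a generator is a minimal member of such a class (no proper subset has the same supporting transactions); the thesis may define these terms differently. Enumerating every itemset is exponential in the number of distinct items, so this only suits toy data. The function names and sample events are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, List

Itemset = FrozenSet[str]
TidSet = FrozenSet[int]


def tidset(itemset: Itemset, dataset: List[Itemset]) -> TidSet:
    """Indices of the transactions that contain `itemset`."""
    return frozenset(i for i, t in enumerate(dataset) if itemset <= t)


def mine_generators(dataset: List[Itemset], min_count: int = 1) -> Dict[TidSet, List[Itemset]]:
    """Group itemsets into equivalence classes by tidset and keep each class's minimal members."""
    items = sorted(set().union(*dataset))
    classes: Dict[TidSet, List[Itemset]] = defaultdict(list)
    # k = 0 includes the empty itemset, the generator of the class supported by every transaction.
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            tids = tidset(itemset, dataset)
            if len(tids) >= min_count:
                classes[tids].append(itemset)
    # A generator has no proper subset inside its own equivalence class.
    return {tids: [m for m in members if not any(other < m for other in members)]
            for tids, members in classes.items()}


# Hypothetical behavior transactions, one per session.
sessions = [frozenset({"night_gaming", "long_session", "chat"}),
            frozenset({"night_gaming", "long_session"}),
            frozenset({"chat"})]

for tids, gens in sorted(mine_generators(sessions).items(), key=lambda kv: sorted(kv[0])):
    print(sorted(tids), [sorted(g) for g in gens])
```

On this toy data, night_gaming and long_session always co-occur, so they fall into one equivalence class (transactions 0 and 1) and are both generators of it, while {night_gaming, long_session} is that class's closed pattern but not a generator. This is why generators give a compact, simple representation of each class.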
GBPDA takes the generator perspective: it selects the generators that most clearly represent Internet addiction behavior and reaches a final diagnosis by scoring the match between the generators of the Internet addiction dataset and those of the test dataset. EPBPDA takes the EP perspective: it mines jumping emerging patterns (JEPs) and basic emerging patterns (eEPs), proposes an effective scoring mechanism that jointly considers growth rate, support, JEPs, and EPs, and uses that mechanism to detect Internet addiction. Both algorithms were tested on a real dataset and on simulated data, measuring performance metrics such as running time and memory usage, and effectiveness metrics such as accuracy and misdiagnosis rate. The results show that when the data size is small, both methods detect Internet addiction well, and EPBPDA is more effective than GBPDA because EPs have stronger discriminating power than generators. On performance, however, GBPDA is better than EPBPDA, because mining EPs requires more processing time and space than mining generators. When the data size is large, EPBPDA's effectiveness advantage over GBPDA becomes more pronounced, and because the number of EPs grows more slowly than the number of generators, EPBPDA also runs faster than GBPDA.
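The abstract does not give EPBPDA's scoring formula, so the Python sketch below substitutes the aggregate score of CAEP (classification by aggregating emerging patterns), a well-known EP-based classifier that weights each matched pattern by growth_rate / (growth_rate + 1) times its support; a JEP, whose growth rate is infinite, gets weight 1. It is a stand-in showing how growth rate, support, and JEPs can be folded into one score, not the thesis's mechanism. The Pattern record, the comparison rule in detect, and all numbers are hypothetical, and CAEP's per-class score normalization is omitted for brevity.

```python
import math
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Pattern:
    items: FrozenSet[str]
    support: float       # support of the pattern in the class it was mined for
    growth_rate: float   # math.inf for a jumping emerging pattern (JEP)


def aggregate_score(instance: FrozenSet[str], patterns: List[Pattern]) -> float:
    """CAEP-style score: sum of gr/(gr+1) * support over the patterns contained in `instance`."""
    score = 0.0
    for p in patterns:
        if p.items <= instance:
            # The limit of gr/(gr+1) as gr -> inf is 1, so a JEP contributes its full support.
            weight = 1.0 if math.isinf(p.growth_rate) else p.growth_rate / (p.growth_rate + 1.0)
            score += weight * p.support
    return score


def detect(instance: FrozenSet[str], piu_patterns: List[Pattern],
           normal_patterns: List[Pattern]) -> bool:
    """Hypothetical decision rule: flag PIU when the PIU-class score exceeds the normal-class score."""
    return aggregate_score(instance, piu_patterns) > aggregate_score(instance, normal_patterns)


# Hypothetical mined patterns for each class.
piu_patterns = [Pattern(frozenset({"night_gaming"}), 0.75, math.inf),
                Pattern(frozenset({"long_session"}), 0.75, 3.0)]
normal_patterns = [Pattern(frozenset({"study_site"}), 0.75, math.inf)]

heavy_user = frozenset({"night_gaming", "long_session", "chat"})
light_user = frozenset({"study_site", "chat"})
print(aggregate_score(heavy_user, piu_patterns), detect(heavy_user, piu_patterns, normal_patterns))
print(aggregate_score(light_user, normal_patterns), detect(light_user, piu_patterns, normal_patterns))
```

Here heavy_user matches both PIU patterns and scores 1.0 * 0.75 + 0.75 * 0.75 = 1.3125 against 0 for the normal class, so it is flagged; light_user matches only the normal-class JEP and is not. Summing over many weak patterns is what lets an EP-based score stay robust when no single pattern is decisive.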
【Degree-granting institution】:Northeastern University
【Degree level】:Master's
【Year of degree】:2013
【Classification number】:R749.99; TP311.13
Article No.: 2141720
Article link: https://www.wllwen.com/yixuelunwen/jsb/2141720.html