基于约束的频繁模式挖掘方法以及应用研究
[Abstract]:Frequent pattern mining based on constraints is one of the most basic problems in the research of data mining and has a wide range of practical applications. However, there are still three challenges in this field: (1) how to expand the new application? Specifically, in addition to the "support" of the model, how to design some new model indicators to better measure it Mode of interest to meet the needs of the new application; (2) the anti mononality of the model support is different, and the properties of the proposed new model are usually more complex, such as it does not satisfy monotonicity, anti mono tonal, conversion, simplicity, etc. then, for a pattern, for example, how to quickly calculate all the upper / lower bounds of all its parent patterns on the index, And using the characteristics of this new model to design an efficient algorithm; (3) usually, different applications, with different new model indicators, and then put forward different model / lower bound calculation method. Then, is there a general method to calculate the upper / lower bounds of any pattern index? For the above problems and challenges, this paper develops The method and application of constraint based frequent pattern mining are summarized. The main achievements and contributions are as follows:
First, a web content recommendation method based on pattern mining is proposed. The recommendation of web content is to find important content block combinations from web pages to recommend users, and there are many applications (such as web page intelligent printing, electronic reading on mobile devices, etc.). There are many ways to solve this problem at present, but in these parties, there are many ways to solve this problem. In the law, either is for a specific web page (such as a web page for news, bloggers) or semi automated (users need additional operations to select the content blocks of a web page). For any type of web page, how to automatically extract effective content from a web page has not been well solved. The method of selecting the similar web page by the former user, makes the problem form a pattern mining recommendation problem, and proposes a web content recommendation method based on pattern mining, which can provide more accurate web content recommendation for any type of web page. Specifically, the content block combination (pattern) recommended to the user is not only frequent. Other users choose, and the more complete, the better. In view of this, this paper presents a new pattern of interest index, that is, the degree of possession, to measure the integrity of the pattern on its support database. Experimental results on real data sets show that the proposed method can achieve more satisfactory recommendation results and operational efficiency.
Secondly, a general efficient algorithm for mining frequent pattern mining based on occupancy is proposed. This chapter extends the definition of occupancy, the method of boundary estimation and the application of three levels. Specifically, two different definitions of occupancy are proposed based on the different weighted mean (arithmetic mean and harmonic mean), that is, the arithmetic occupancy. And harmonic possession. Unlike the anti mononality of the pattern support, the nature of possession is not satisfied with monotonicity, anti mononality, and is not satisfied with the convertability and simplicity. Then, how to quickly calculate the upper bound of all the parent patterns about the degree of possession for a pattern? For this, for each definition, three The upper bound is efficient, the most 'tight' and the upper bound. The high efficient upper bound is more efficient for single node computing, but it is looser, it needs to search a lot of nodes; the most tight upper bound is compact, so it searches for a few nodes, but the calculation of a single node is more time-consuming; for this reason, this paper puts forward a middle upper bound, A balance between the tightness and computational complexity makes the overall performance of the algorithm optimal. The concept of occupancy is not only important for the application on the transaction database (such as web page content printing recommendation), but also is very important for the application of the sequence database (such as a tourist attraction recommendation). For this reason, this paper proposes A universal algorithm DOFRA can process applications on different types of databases at the same time. Finally, the validity of DOFRA is verified in two practical applications, and the efficiency of the DOFRA algorithm is verified in a large number of synthetic data.
Finally, a general model is proposed to efficiently estimate the upper / lower bounds of any pattern index. Constraint based mining is not only helpful to capture more semantic information of the pattern, but also can further improve the mining efficiency by using the nature of constraints. The interest degree of the metric pattern is labeled, then the upper / lower bounds of the model indexes are estimated, and a unified framework suitable for any pattern index is lacking. Therefore, this paper formally considers the boundary estimation problem of only item markers, and proposes a general model to efficiently solve the problem. For the effectiveness of the framework, this paper gives two typical model indexes as learning cases, namely, pattern utility and pattern occupancy. In addition, in order to meet different application requirements, this paper extends the traditional SQL based pattern indicators, such as min, Max, AVG, VaR, and so on. The experimental analysis on the data shows the versatility and effectiveness of the proposed scheme.
【学位授予单位】:中国科学技术大学
【学位级别】:博士
【学位授予年份】:2014
【分类号】:TP311.13;TP393.092
【共引文献】
相关期刊论文 前10条
1 朱君;曲超;汤庸;;利用单词超团的二分图文本聚类算法[J];电子科技大学学报;2008年03期
2 张乐君;国林;张健沛;杨静;夏磊;;测度属性关系分析的分布式系统异常检测[J];北京邮电大学学报;2013年06期
3 马丽生;姚光顺;杨传健;;基于FP-tree的极大超团模式挖掘算法[J];计算机工程与应用;2011年12期
4 卓鹏;肖波;蔺志青;;基于事务拆分的超团挖掘算法[J];计算机工程;2009年20期
5 曲超;潘晓衡;朱君;蔡少仲;胡天明;;基于单词超团的文本聚类方法[J];计算机工程;2011年11期
6 黄崇争;李海峰;陈红;;数据流上近似非可导项集的挖掘算法[J];计算机学报;2010年08期
7 Daniel Kunkle;张冬晖;Gene Cooperman;;Mining Frequent Generalized Itemsets and Generalized Association Rules Without Redundancy[J];Journal of Computer Science & Technology;2008年01期
8 ;Mining item-item and between-set correlated association rules[J];Journal of Zhejiang University-Science C(Computers & Electronics);2011年02期
9 高恩阳;刘伟军;王天然;;一种基于线性规划的孤立点检测方法[J];控制工程;2013年06期
10 高峗;周薇;韩冀中;孟丹;;一种基于文法压缩的日志异常检测算法[J];计算机学报;2014年01期
相关会议论文 前1条
1 黄崇争;李海峰;陈红;;数据流上近似非可导项集的挖掘算法[A];NDBC2010第27届中国数据库学术会议论文集A辑一[C];2010年
相关博士学位论文 前10条
1 李强;数据挖掘中关联分析算法研究[D];哈尔滨工程大学;2010年
2 沈斌;关联规则相关技术研究[D];浙江大学;2007年
3 沙朝锋;基于信息论的数据挖掘算法[D];复旦大学;2008年
4 耿汝年;加权频繁模式挖掘算法研究[D];江南大学;2008年
5 肖波;可信关联规则挖掘算法研究[D];北京邮电大学;2009年
6 贺惠新;燃机异常检测系统的关键技术研究[D];哈尔滨工业大学;2013年
7 任维武;用于分布式入侵检测系统的合作式本体模型[D];吉林大学;2013年
8 陈斌;异常检测方法及其关键技术研究[D];南京航空航天大学;2013年
9 黄垂碧;应用层网关攻击检测和性能优化策略研究[D];中国科学技术大学;2014年
10 何晓旭;时间序列数据挖掘若干关键问题研究[D];中国科学技术大学;2014年
相关硕士学位论文 前10条
1 余强;基于语义的设计知识个性化检索技术研究及应用[D];南京航空航天大学;2010年
2 李世松;基于闭模式的关联规则产生算法研究[D];江苏大学;2007年
3 卓鹏;关联规则与超团挖掘算法研究[D];北京邮电大学;2009年
4 孟静;异常数据挖掘算法研究与应用[D];江南大学;2013年
5 庞景月;滑动窗口模型下的数据流自适应异常检测方法研究[D];哈尔滨工业大学;2013年
6 肖托;一种改进的支持向量数据描述算法[D];哈尔滨工程大学;2013年
7 仲莉;基于隐马尔科夫模型的低碳异常检测方法研究及应用[D];华南理工大学;2013年
8 沈耀东;基于压缩融合的无线传感网事件检测算法研究[D];中国地质大学;2013年
9 吴龙常;基于聚类分析的入侵检测算法研究[D];东北大学;2011年
10 刘彬彬;Android平台的安全技术研究与实现[D];江苏科技大学;2013年
,本文编号:2150063
本文链接:https://www.wllwen.com/guanlilunwen/ydhl/2150063.html