Research on Relational Domain Knowledge Fusion Methods for Data Mining

Posted: 2018-12-10 12:21
[Abstract]: Most existing data mining techniques operate on data at the raw level. The corresponding mining methods either integrate no domain knowledge at all, or integrate it manually with user participation in order to carry out knowledge discovery. In real application domains, however, data differ in level: some data are raw, while other data are closely related to further data, and a suitable combination or generalization granularity of these related data may better reveal their underlying regularities. It has therefore become an important research topic to make full use of the domain knowledge associated with raw data to guide data mining. The aim is to "observe and analyze the same problem at very different granularities", acquire knowledge at an appropriate data level, and move freely back and forth between levels. In practice, a large amount of data carries domain knowledge in the form of attribute extensions, and this kind of domain knowledge mostly takes the form of relational tables. This dissertation therefore focuses on the representation of relational domain knowledge and on methods for fusing it with data mining, so that knowledge discovery can be carried out automatically and effectively. The main contributions are as follows:
(1) A structured representation model for relational domain knowledge, DKMRM (Domain Knowledge of Multi-Relations Model), is proposed. The model uses the relational model to map or project the domain knowledge of related attributes in a data table. This forms context tables of domain knowledge and, from them, a complex multi-relational representation model. When mining relational database systems, this model and the necessary transformation strategies allow some raw data to be generalized or specialized to an appropriate level, so that the discovered knowledge better matches individual users' needs.
(2) Data mining based on DKMRM. A relational domain knowledge fusion method for data mining is proposed. Using classification as a case study, a framework for classification mining that integrates relational domain knowledge is established. To overcome the limitations of traditional mining methods, the framework addresses four key issues: the transfer source, the transfer path, the termination strategy, and statistics on the deviation introduced by transfer.
(3) Two multi-relational classification mining algorithms are proposed. CC-DKMR (Classification of Characters based on Domain Knowledge of Multi-Relations) is based on attribute selection. CS-DKMR (Classification of Sheets based on Domain Knowledge of Multi-Relations) is based on relational table selection. Both aim to mine patterns at different data granularity levels with a flexible mechanism for switching between levels, and so extract more valuable knowledge from domain knowledge. Experiments show that the approach is effective.
(4) A method is proposed for evaluating mining algorithms that integrates domain knowledge at the evaluation stage of data mining. It targets the "oracle" problem of data mining algorithms (programs): the correct output for a given input is often unknown, so traditional evaluation methods are hard to adapt. The method is based on metamorphic testing and makes effective use of domain knowledge. Case studies of specific classification, association and clustering mining algorithms are analyzed, and metamorphic relations are constructed for each algorithm. Experimental results show that the method achieves its evaluation goal and can feasibly be extended to other domains.
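Item (1) describes generalizing or specializing raw attribute values through a relational context table of domain knowledge. The sketch below is minimal and rests on several assumptions. The knowledge table `LOCATION_KNOWLEDGE`, its (city, province, region) schema, the toy `sales` rows and the helpers `generalize` and `specialize` are all hypothetical. The sketch is not the DKMRM model itself. It only shows the basic idea that a join with a knowledge table lifts a raw value to a coarser concept.

```python
# Hypothetical domain-knowledge context table (illustrative values only):
# each row links a raw "city" value to its province and region.
LOCATION_KNOWLEDGE = [
    ("Hefei", "Anhui", "East"),
    ("Wuhu", "Anhui", "East"),
    ("Nanjing", "Jiangsu", "East"),
    ("Chengdu", "Sichuan", "West"),
]
LEVELS = ("city", "province", "region")  # knowledge-table columns, fine to coarse


def generalize(rows, attr, knowledge, level):
    """Replace each row's raw `attr` value by its concept at `level`.

    Acts like joining the data table with the knowledge table on the city
    column and projecting onto the requested level. Values the knowledge
    table does not cover are kept unchanged; input rows are not modified.
    """
    col = LEVELS.index(level)
    lookup = {entry[0]: entry[col] for entry in knowledge}
    return [{**row, attr: lookup.get(row[attr], row[attr])} for row in rows]


def specialize(value, knowledge, level):
    """Return the raw city values that fall under `value` at `level`."""
    col = LEVELS.index(level)
    return {entry[0] for entry in knowledge if entry[col] == value}


# Toy data table (hypothetical).
sales = [
    {"city": "Hefei", "amount": 120},
    {"city": "Wuhu", "amount": 80},
    {"city": "Chengdu", "amount": 50},
]
by_province = generalize(sales, "city", LOCATION_KNOWLEDGE, "province")
print([row["city"] for row in by_province])  # ['Anhui', 'Anhui', 'Sichuan']
print(sorted(specialize("Anhui", LOCATION_KNOWLEDGE, "province")))  # ['Hefei', 'Wuhu']
```

Keeping the hierarchy in a separate table means a mining step can switch granularity by changing the `level` argument without touching the raw data.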
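Item (3) says that CC-DKMR selects attributes in order to mine at different granularity levels. The dissertation's algorithm is not reproduced here. As a hedged illustration of the general idea only, the sketch scores one attribute at each level of a hypothetical city/province/region hierarchy and keeps the best level. It uses the C4.5 gain ratio, a standard criterion swapped in as an assumption. `HIERARCHY`, `lift`, `gain_ratio`, `best_level` and the toy labels are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Illustration only: not the CC-DKMR algorithm from the dissertation.
# Hypothetical concept hierarchy taken from a knowledge table: city -> (province, region).
HIERARCHY = {
    "Hefei": ("Anhui", "East"),
    "Wuhu": ("Anhui", "East"),
    "Nanjing": ("Jiangsu", "East"),
    "Suzhou": ("Jiangsu", "East"),
    "Chengdu": ("Sichuan", "West"),
    "Mianyang": ("Sichuan", "West"),
}
LEVELS = ("city", "province", "region")


def lift(city, level):
    """Map a raw city value to its concept at the requested hierarchy level."""
    return city if level == "city" else HIERARCHY[city][LEVELS.index(level) - 1]


def entropy(items):
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels):
    """C4.5-style gain ratio: information gain divided by split information."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)  # entropy of the attribute's own value distribution
    return gain / split_info if split_info > 0 else 0.0


def best_level(cities, labels):
    """Pick the hierarchy level whose generalized attribute scores highest."""
    scores = {lvl: gain_ratio([lift(c, lvl) for c in cities], labels) for lvl in LEVELS}
    return max(scores, key=scores.get), scores


cities = ["Hefei", "Wuhu", "Nanjing", "Suzhou", "Chengdu", "Mianyang"]
labels = ["high", "high", "low", "low", "low", "low"]
level, scores = best_level(cities, labels)
print(level)  # province
```

Gain ratio is used instead of plain information gain for a specific reason. A coarser level is a function of a finer one, so plain information gain can never rank the coarser level higher and would always pick the raw level. Dividing by the split information penalizes the many-valued raw attribute.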
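Item (4) relies on metamorphic testing to work around the oracle problem. A test does not need the "correct" prediction. Instead, it checks whether the outputs of related runs satisfy a relation that must hold. The sketch below is minimal and uses a toy k-nearest-neighbour classifier with two generic relations. These are order invariance and translation invariance, chosen here as assumptions. They are not the domain-knowledge-based relations constructed in the dissertation. `knn_predict`, `buggy_knn_predict`, `mr_order_invariance` and `mr_translation_invariance` are hypothetical names.

```python
from collections import Counter


def knn_predict(train, test, k=3):
    """Toy k-NN classifier on integer feature vectors (hypothetical)."""
    preds = []
    for x in test:
        # Rank rows by (squared Euclidean distance, label); integer arithmetic
        # keeps distances exact and the ranking independent of row order.
        ranked = sorted(
            (sum((a - b) ** 2 for a, b in zip(x, xi)), yi) for xi, yi in train
        )
        votes = Counter(label for _, label in ranked[:k])
        top = max(votes.values())
        preds.append(min(label for label, c in votes.items() if c == top))
    return preds


def buggy_knn_predict(train, test, k=3):
    """Deliberately faulty: votes over the first k rows, not the k nearest."""
    preds = []
    for _x in test:
        votes = Counter(label for _, label in train[:k])
        top = max(votes.values())
        preds.append(min(label for label, c in votes.items() if c == top))
    return preds


def mr_order_invariance(predict, train, test, k=3):
    """MR1: reversing the training rows must not change the predictions."""
    return predict(train, test, k) == predict(train[::-1], test, k)


def mr_translation_invariance(predict, train, test, k=3, c=7):
    """MR2: adding the same constant to every feature must not change the predictions."""
    def shift(v):
        return tuple(a + c for a in v)
    return predict(train, test, k) == predict(
        [(shift(x), y) for x, y in train], [shift(x) for x in test], k
    )


train = [((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"),
         ((8, 8), "b"), ((8, 9), "b"), ((9, 8), "b")]
test = [(0, 0), (9, 9), (2, 2)]

for name, fn in [("knn", knn_predict), ("buggy", buggy_knn_predict)]:
    print(name, mr_order_invariance(fn, train, test), mr_translation_invariance(fn, train, test))
# knn True True
# buggy False True
```

The faulty variant passes MR2 but violates MR1, which is why a test suite normally combines several metamorphic relations rather than relying on one.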
[Degree-granting institution]: Hefei University of Technology
[Degree level]: Doctoral
[Year degree awarded]: 2016
[CLC classification number]: TP311.13
