Research on Ontology-Based Feature Extraction from Money Laundering Cases

Published: 2018-11-09 10:39

[Abstract]: The feature values of money laundering cases are an important reference for identifying money laundering activity in the financial sector. In monitoring and screening based on case-based reasoning (CBR), the first task is to enter the feature values of case reports into the case base. Because money laundering case reports are unstructured and the relevant information in them is implicit, this work is still done manually, and it is difficult to meet requirements for efficiency and accuracy. To address this, an ontology-based feature extraction method is proposed, and automatic acquisition of knowledge from text is designed and implemented. An ontology, as an explicit specification of a conceptualization, describes the concepts and relationships that exist in a domain. Ontology construction is usually carried out under the guidance of domain experts. In this work, a large number of money laundering case reports were analyzed and abstracted into a conceptual model, and the keywords that characterize money laundering were extracted as the classes of the ontology. Following the same principle, the subclasses of each class and the property relationships between subclasses and parent classes were defined, and finally instances were defined and constraints were added. Feature extraction combines pattern matching with grammar definitions. Pattern matching determines where an index keyword occurs in the text vector; the grammar definition specifies the form in which the data to be extracted appears; and the data definition provides the reference standard for normalizing the data. In addition, pattern matching algorithms were studied in depth: the advantages, disadvantages, and complexity of each algorithm were analyzed, and an existing algorithm was improved. Finally, a prototype system was designed. The system is implemented in Java and runs in a browser/server (B/S) architecture. It uses the open-source tools Protégé 3.1 for ontology editing and Jena 2.4 for ontology parsing. The experimental input consists of sample money laundering case reports provided by official sources, and the output is structured data that can be stored in a relational database.
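The extraction pipeline described in the abstract (locate an index keyword, apply a grammar to the text that follows, then normalize the value) can be sketched roughly as follows. This is only an illustrative sketch and not the thesis implementation: the prototype was written in Java, whereas Python is used here for brevity, and the keyword table, regular expressions, window size, and all function names are hypothetical. Plain substring search stands in for the improved pattern matching algorithm, which the abstract does not specify.

```python
import re

# Hypothetical keyword index: feature name -> trigger keywords in the report text.
KEYWORDS = {
    "amount": ["transferred", "deposited"],
    "date": ["on"],
}

# Hypothetical grammar: the surface form each extracted value must take.
GRAMMAR = {
    "amount": re.compile(r"(?:USD|\$)\s?([\d,]+(?:\.\d+)?)"),
    "date": re.compile(r"(\d{4})-(\d{1,2})-(\d{1,2})"),
}


def find_keyword(text, keyword, start=0):
    """Pattern matching step: index of keyword at or after start, or -1."""
    return text.find(keyword, start)


def normalize(feature, match):
    """Data definition step: convert the matched text to a standard form."""
    if feature == "amount":
        return float(match.group(1).replace(",", ""))
    if feature == "date":
        year, month, day = (int(g) for g in match.groups())
        return f"{year:04d}-{month:02d}-{day:02d}"
    return match.group(0)


def extract_features(text, window=40):
    """Return a flat record of the first value found for each feature."""
    record = {}
    for feature, keywords in KEYWORDS.items():
        for kw in keywords:
            pos = find_keyword(text, kw)
            # Try every occurrence of the keyword until the grammar matches.
            while pos != -1 and feature not in record:
                start = pos + len(kw)
                segment = text[start:start + window]
                m = GRAMMAR[feature].search(segment)
                if m:
                    record[feature] = normalize(feature, m)
                pos = find_keyword(text, kw, pos + 1)
            if feature in record:
                break
    return record


report = "The suspect deposited USD 1,250,000 into the account on 2006-3-7."
print(extract_features(report))
```

Each key of the returned record corresponds to one column of a relational table, which matches the structured output the abstract describes. In the actual system the keywords and value formats would presumably come from the ontology rather than from hard-coded tables.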
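[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Master's
[Year degree awarded]: 2007
[Classification number]: TP399-C1;D917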

[References]