Research on Patent Ontology Construction Methods for Complex Semantics

Published: 2018-04-06 07:39

  Topic: patent structure  Focus: entity relations  Source: Wuhan University, doctoral dissertation, 2014


[Abstract]: Patent data cover 95% of the world's latest technologies and inventions, and the quantity and quality of patents have become a marker of the economic competitiveness of an enterprise, an industry, and even a country. Effective use of patent data can support enterprise R&D decisions and help enterprises avoid the cost of duplicated development. As awareness of intellectual property rights grows, patent data are increasing rapidly. A single technology is often associated with a large number of patents, which form a patent group whose members resemble one another to varying degrees in their working principles or technical details. In a modern society where data grow explosively, people expect to obtain the information they need in the most convenient way.
Existing patent analysis methods do not consider the semantic relations between keywords. These methods rely mainly on statistical analysis of technical keywords: a patent document is modeled as a vector of keyword weights, and the similarity between patent documents is computed with the vector space model. However, this model cannot recognize that different patents in a patent group may use different technical keywords with the same or similar meaning. Taking the semantic information in patents into account during analysis should therefore yield better results.
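The limitation described above can be illustrated with a minimal sketch. It assumes a plain term-frequency vector space model with cosine similarity; the keyword lists are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter


def cosine_similarity(doc_a, doc_b):
    """Cosine similarity of two keyword lists under a term-frequency model."""
    vec_a, vec_b = Counter(doc_a), Counter(doc_b)
    # Only keywords present in both documents contribute to the dot product.
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical keyword lists: patent_b describes the same structure as
# patent_a but uses near-synonyms ("bolt" for "fastener", "casing" for "housing").
patent_a = ["fastener", "housing", "spring"]
patent_b = ["bolt", "casing", "spring"]
patent_c = ["fastener", "housing", "spring"]

sim_ab = cosine_similarity(patent_a, patent_b)  # only "spring" overlaps
sim_ac = cosine_similarity(patent_a, patent_c)  # identical wording
```

Because the model compares surface keywords only, the synonym pair scores far lower than the identically worded pair, even though both pairs describe the same structure.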
This project studies the extraction of patent composition information that contains rich semantic relations, organizes and manages these structural data with an ontology, and attempts in-depth patent analysis based on semantic knowledge about patent structure.
Extracting information from patent documents depends on text understanding. Some prior work, supported by information processing techniques, has extracted data in fields such as economics, biology, and chemistry and managed it with ontologies, but these methods are not suited to extracting entity relations from Chinese patent documents or to organizing and managing those relations. This is because acquiring patent structure information and applying it to patent analysis raise some specific problems:
(1) Patent documents contain abundant structure-related entity relation data. When modeling a patent structure ontology, the concepts and relations contained in patents must be classified so as to reflect, as comprehensively and effectively as possible, the differences and characteristics of the semantic relations among entities in the patent structure;
(2) The text describes both the physical position relations and the dynamic relations among the components of a patent. Expression is flexible and sentence structure is complex, and the text contains many new technical terms, unique to a single patent, that describe entity names and relation features. An entity relation may be contained in a phrase, in a sentence, or across several sentences. Entity relation extraction from Chinese patent documents should take all of these factors into account;
(3) When the patent structure ontology is used for patent analysis, the influence of each patent's entity semantic relations on the analysis results must be considered. This process will be very complex.
On the other hand, every patent is repeatedly examined and revised between application and grant, so patent data follow writing conventions and are of high quality. Although the new technologies described by patent documents in different technical fields vary widely, their descriptions share many features: ① patent documents introduce many new compound technical terms whose head words are basic terms; ② the composition of the patented technology is described in a certain spatial and temporal order; ③ when describing how the new technology is implemented, the entity relations involved in processing are described.
Exploiting these favorable characteristics of the documents to solve the problem of extracting entity semantic relations from patent documents is worthwhile, because it provides high-quality data for in-depth semantic analysis and for mining knowledge about patented technologies in a field. Based on this idea, we study effective methods for patent technology ontology modeling and data acquisition, and apply the ontology knowledge to patent analysis.
In view of the high writing quality of patent documents and the novelty of the technologies they describe, this thesis studies methods for constructing a patent structure ontology and their applications. The main work is as follows:
(1) Modeling of technical-structure concepts and their semantic relations
Based on the idea that relation instances are the most direct manifestation of ontology concepts and relations, a method is given for analyzing and mining relation instances. Basic categories of semantic relations are obtained through hierarchical clustering. The classification results are used to label the semantic relations in the patent structure graph with relation types, and frequent patterns are mined from the relation structure graph. Based on the frequent patterns, the co-occurrence of different types of relations associated with entities in patents is analyzed to determine the schema of the patent ontology classes and their relations. Finally, inference rules based on the existing classes and relations in the ontology are given, through which implicit entity semantic relations in patents can be derived from existing relation instances. Experiments show that the proposed modeling method reduces the time spent on patent ontology modeling, and that the resulting model covers the entity relation types of the domain well and supports effective organization and management of patent-structure knowledge.
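The last step, deriving implicit relations from existing relation instances, can be sketched as simple forward chaining. The abstract does not list the thesis's actual rules, so the single transitivity rule, the relation names `part_of` and `connected_to`, and the component names below are all illustrative assumptions.

```python
def infer_transitive(triples, relation):
    """Forward-chain one transitivity rule until no new triples appear.

    Rule (assumed for illustration):
    (a, relation, b) and (b, relation, c)  =>  (a, relation, c)
    Returns only the newly inferred triples.
    """
    known = set(triples)
    changed = True
    while changed:
        changed = False
        # Snapshot the current edges of the given relation type.
        edges = [(s, o) for s, r, o in known if r == relation]
        for s1, o1 in edges:
            for s2, o2 in edges:
                if o1 == s2 and (s1, relation, o2) not in known:
                    known.add((s1, relation, o2))
                    changed = True
    return known - set(triples)


# Hypothetical relation instances extracted from one patent.
instances = {
    ("blade", "part_of", "rotor"),
    ("rotor", "part_of", "motor"),
    ("motor", "connected_to", "housing"),
}

implied = infer_transitive(instances, "part_of")
```

Here the only implicit relation recovered is that the blade is part of the motor; the `connected_to` instance is untouched because the rule applies to one relation type at a time.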
(2) A self-learning method for acquiring patent structure data
The study makes full use of text pattern features at multiple levels, which reflect the writing conventions that patent documents follow, and proposes a method for extracting relation feature words and entity relations from patent documents. In the text preprocessing stage, statistical learning is used to obtain multi-level pattern rules formed from the pattern features that relation instances exhibit in the corresponding text segments, such as word collocations, phrase composition, and inter-sentence relations. Then, given a small number of entity relation instances as seeds, initial relation extraction templates are constructed from the semantic features of the seed instances, and multi-ary entity relations are extracted through self-learning. Finally, implicit entity relations between sentences are obtained through text segment parsing.
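The seed-driven self-learning loop can be sketched as a small bootstrapping procedure. This is a toy under strong assumptions: it handles binary relations only, uses the literal text between two entities as the "template", and runs on hypothetical English sentences, whereas the thesis works on Chinese patent text with multi-level pattern rules.

```python
import re


def learn_patterns(sentences, pairs):
    """Collect the connecting text between known entity pairs as templates."""
    patterns = set()
    for sent in sentences:
        for head, tail in pairs:
            m = re.search(re.escape(head) + r"(.+?)" + re.escape(tail), sent)
            if m:
                patterns.add(m.group(1))
    return patterns


def apply_patterns(sentences, patterns):
    """Extract (head, tail) pairs from sentences that match any template."""
    found = set()
    for sent in sentences:
        for pat in patterns:
            m = re.search(r"(\w+)" + re.escape(pat) + r"(\w+)", sent)
            if m:
                found.add((m.group(1), m.group(2)))
    return found


def bootstrap(sentences, seeds, max_rounds=5):
    """Alternate template learning and extraction until nothing new is found."""
    known = set(seeds)
    for _ in range(max_rounds):
        patterns = learn_patterns(sentences, known)
        found = apply_patterns(sentences, patterns)
        if found <= known:
            break
        known |= found
    return known


# Hypothetical sentences and a single seed relation instance.
sentences = [
    "the gear is fixed to the shaft.",
    "the lever is fixed to the frame.",
    "the lever is mounted on the frame.",
    "the sensor is mounted on the cover.",
]
relations = bootstrap(sentences, {("gear", "shaft")})
```

The seed yields the "is fixed to the" template, which finds the lever and frame pair; that pair in turn yields the "is mounted on the" template, which finds the sensor and cover pair. A practical system would also score templates to limit semantic drift, which this sketch omits.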
(3) Typical applications of patent knowledge data
For a typical demonstration application of patent analysis, a greedy-algorithm-based method for comparing patent technology structures is given, followed by a method that computes patent similarity bottom-up from similar substructures. Patents are then clustered on the basis of patent structure similarity, and the technical similarity of patentees is analyzed. Experimental results show that patent structure knowledge can improve the accuracy of patent analysis results.
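One way to picture greedy structure comparison with bottom-up similarity is the sketch below. The abstract does not specify the thesis's algorithm, so the tree representation, the `name_weight` parameter, the exact-match `name_sim` function, and the example structures are all assumptions made for illustration.

```python
def name_sim(a, b):
    """Placeholder label similarity; an ontology-based measure could go here."""
    return 1.0 if a == b else 0.0


def structure_similarity(node_a, node_b, name_weight=0.5):
    """Similarity of two (name, children) trees, computed bottom-up.

    Child subtrees are scored pairwise first, then matched greedily:
    the highest-scoring pair is taken, both nodes are marked as used,
    and the process repeats on the remaining pairs.
    """
    name_a, children_a = node_a
    name_b, children_b = node_b
    label_score = name_sim(name_a, name_b)
    if not children_a and not children_b:
        return label_score
    scored = sorted(
        (
            (structure_similarity(ca, cb, name_weight), i, j)
            for i, ca in enumerate(children_a)
            for j, cb in enumerate(children_b)
        ),
        reverse=True,
    )
    used_a, used_b, total = set(), set(), 0.0
    for score, i, j in scored:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            total += score
    # Unmatched children lower the score through the larger child count.
    child_score = total / max(len(children_a), len(children_b))
    return name_weight * label_score + (1 - name_weight) * child_score


# Hypothetical component structures of two patents.
pump_a = ("pump", [("motor", [("rotor", []), ("stator", [])]), ("housing", [])])
pump_b = ("pump", [("motor", [("rotor", []), ("coil", [])]), ("casing", [])])

sim_ab = structure_similarity(pump_a, pump_b)
sim_aa = structure_similarity(pump_a, pump_a)
```

Greedy matching is not guaranteed to find the best overall assignment of substructures, but it is cheap and simple to implement. The pairwise scores produced this way could then feed a standard clustering step.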
(4) Construction of the patent ontology and implementation of its application
The ontology construction process is implemented, including: using an ontology tool to build the patent schema obtained through instance mining; extracting relation feature words, sentence composition patterns, and other information from documents; and extracting relation instances from documents. Finally, a new patent analysis method is proposed that uses patent knowledge based on patentee similarity to recommend collaboration partners to users.

[Degree-granting institution]: Wuhan University
[Degree level]: Doctorate
[Year degree granted]: 2014
[Classification number]: TP391.1




Article ID: 1718599


Article link: https://www.wllwen.com/falvlunwen/zhishichanquanfa/1718599.html

