Research on Intelligent Retrieval of Suspected Infringing Patents Based on Natural Language Processing

Published: 2018-01-04 13:08

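Keywords: Research on Intelligent Retrieval of Suspected Infringing Patents Based on Natural Language Processing　Source: Jiangsu University, 2017 master's thesis　Document type: Dissertation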

[Abstract]: Patent literature, the most effective carrier of technical information, covers more than 90% of the world's latest technological achievements and plays a vital role in protecting intellectual property. As the number of patents keeps growing and patent infringement litigation becomes more frequent, patent infringement retrieval has become one of the research hotspots in information science. Traditional patent infringement retrieval mainly constructs search queries to retrieve relevant patents from a patent retrieval system and then manually screens out the patents that carry infringement risk, which is time-consuming, labor-intensive, and easily affected by subjective factors. Studying intelligent retrieval algorithms that automatically retrieve suspected infringing patents is therefore of practical significance. After introducing the fundamentals involved in patent infringement retrieval, including infringement determination, text preprocessing, and similarity calculation, this thesis focuses on the core of a patent infringement retrieval system: the algorithm for detecting suspected infringing patents. It proposes solutions to problems in current patent infringement retrieval research such as unreasonable feature selection and insufficient use of the information in the claims. The main work is as follows:
(1) To address the weak expressive power of keyword features in Chinese patent infringement retrieval, an infringing-patent detection method based on triple-feature coverage calculation is proposed. The method extracts patent claims into sets of triple features and combines word-vector techniques with HowNet to compute the semantic similarity between triple features. By improving the coverage algorithm for sets of patent technical features, it effectively improves the ability to identify suspected infringing patents. Experimental results show that the method achieves good retrieval performance and accuracy.
(2) To address the poor stability of dependency parsers, which affects triple-feature extraction, and the low retrieval accuracy for method patents, an infringing-patent detection algorithm based on sentence similarity calculation is proposed. The algorithm takes the sentence as the smallest unit of computation, builds the claims into a tree structure, and designs a tree matching algorithm that incorporates infringement determination rules to compute the degree of infringement over the tree-structured claims. Experimental comparison with existing infringement retrieval algorithms shows that the algorithm has certain advantages.
(3) On the Java platform, following an object-oriented approach, an intelligent retrieval system for suspected infringing Chinese patents is designed and implemented, with database update, preprocessing, preliminary retrieval, and infringement detection functions. The infringement detection module implements the two detection methods proposed in this thesis, and the other modules also improve on traditional methods.
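
The coverage idea behind contribution (1) can be illustrated roughly as follows. This is a minimal Java sketch under stated assumptions, not the thesis's implementation: the `Triple` class, the `coverage` method, and the `threshold` parameter are hypothetical names, and the exact-match `wordSimilarity` is only a placeholder for the word-vector and HowNet similarity that the abstract mentions. Coverage is taken here to be the share of a protected claim's triple features that find a sufficiently similar triple in the candidate patent.

```java
import java.util.List;

public class TripleCoverageSketch {

    /** A technical feature as a (subject, relation, object) triple. Hypothetical representation. */
    static final class Triple {
        final String subject;
        final String relation;
        final String object;

        Triple(String subject, String relation, String object) {
            this.subject = subject;
            this.relation = relation;
            this.object = object;
        }
    }

    /**
     * Placeholder word similarity: 1.0 for identical strings, 0.0 otherwise.
     * ASSUMPTION: stands in for the word-vector + HowNet similarity, whose
     * details are not given in the abstract.
     */
    static double wordSimilarity(String a, String b) {
        return a.equals(b) ? 1.0 : 0.0;
    }

    /** Triple similarity as the unweighted mean of the three component similarities (assumed). */
    static double tripleSimilarity(Triple a, Triple b) {
        return (wordSimilarity(a.subject, b.subject)
                + wordSimilarity(a.relation, b.relation)
                + wordSimilarity(a.object, b.object)) / 3.0;
    }

    /**
     * Fraction of the protected claim's triples whose best match in the
     * candidate reaches the threshold. Returns 0.0 for an empty claim.
     */
    static double coverage(List<Triple> protectedClaim, List<Triple> candidate, double threshold) {
        if (protectedClaim.isEmpty()) {
            return 0.0;
        }
        int covered = 0;
        for (Triple p : protectedClaim) {
            double best = 0.0;
            for (Triple c : candidate) {
                best = Math.max(best, tripleSimilarity(p, c));
            }
            if (best >= threshold) {
                covered++;
            }
        }
        return (double) covered / protectedClaim.size();
    }
}
```

In this sketch, a candidate whose coverage of a claim approaches 1.0 would be ranked as more likely to infringe. How the thesis actually weights the components, sets any threshold, or improves the coverage algorithm is not stated in the abstract.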

[Degree-granting institution]: Jiangsu University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP391.1

[References]