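Research on a Similarity Detection System for Science and Technology Projects
Published: 2018-06-15 08:17
Topics: science and technology projects + similarity calculation; Source: Hangzhou Dianzi University, 2011 master's thesis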
[Abstract]: As government funding for science and technology projects grows year by year, the number and scope of project applications keep expanding. This has sharply increased the workload of project screening and review, created a range of management difficulties, and led to many cases of "duplicate project approval". Document copy detection technology is widely used for intellectual property protection and search engine optimization, but has seen little use in science and technology project management. This thesis studies a field-based similarity calculation method built on a knowledge representation model for science and technology projects, together with a system that implements it. The goal is to find similar projects efficiently and accurately, give project reviewers an early warning, and prevent the same kind of project from being approved more than once. The main work is as follows:

1. Knowledge representation. The thesis proposes a knowledge representation model for science and technology projects that combines the vector space model with the matter-element model. Keywords are obtained by segmenting the project text into words, keyword weights are computed with the TF (term frequency) method, and the project knowledge representation model is built from them.

2. Similarity calculation. Based on the project knowledge representation model, the thesis proposes a project similarity calculation method based on field structure. The method treats each individual field of a project as the unit for counting keyword frequencies, matches keywords by string hashing, computes the similarity between corresponding fields with the vector cosine formula, and then takes a weighted average of the field similarities as the similarity between two projects.

3. System implementation. Based on these results, a similarity detection system for science and technology projects was developed. It consists of a project knowledge base, a project knowledge construction module, a similarity calculation module, a judgment and explanation module, and a parallel computing task management module. First, the project knowledge construction module builds project knowledge models for the project under review and for the already-approved projects. Next, the similarity calculation module computes the similarity from these two models. Finally, the judgment and explanation module determines the similarity relationship between the projects from that similarity value. The parallel computing task management module runs the similarity calculation module and the judgment and explanation module in parallel.

The system developed in this thesis has been deployed in the Zhejiang Province science and technology project management system. This application verified the feasibility and effectiveness of the research results, and the system provides a good means of checking for and warning of duplicate project approval.
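The field-based calculation described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. It assumes the text has already been segmented into keywords, that the TF weight is the keyword count divided by the field's total keyword count, and that a Python `dict` stands in for the string hash matching step. The field names (`title`, `content`), the field weights, and the sample keywords are hypothetical.

```python
import math
from collections import Counter


def tf_vector(keywords):
    """TF weights for one field (assumed form: count / total keywords)."""
    counts = Counter(keywords)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()} if total else {}


def cosine(u, v):
    """Cosine of two sparse vectors stored as dicts.

    Terms are matched through dict (hash table) lookups, standing in
    for the string hash matching mentioned in the abstract.
    """
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty field has no similarity to anything
    return dot / (norm_u * norm_v)


def project_similarity(a, b, field_weights):
    """Weighted average of the per-field cosine similarities."""
    total_weight = sum(field_weights.values())
    score = sum(
        w * cosine(tf_vector(a.get(f, [])), tf_vector(b.get(f, [])))
        for f, w in field_weights.items()
    )
    return score / total_weight


# Hypothetical field weights and already-segmented projects.
FIELD_WEIGHTS = {"title": 0.4, "content": 0.6}
pending = {
    "title": ["similarity", "detection"],
    "content": ["keyword", "cosine", "keyword"],
}
approved = {
    "title": ["similarity", "detection"],
    "content": ["parallel", "task"],
}

score = project_similarity(pending, approved, FIELD_WEIGHTS)
print(round(score, 3))
```

In this example the two titles are identical (field similarity 1) and the content fields share no keywords (field similarity 0), so the weighted average equals the title weight, 0.4. A real system would compare the score against a threshold chosen by the reviewers. The abstract does not state one.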
[Degree-granting institution]: Hangzhou Dianzi University
[Degree level]: Master's
[Year degree conferred]: 2011
[Classification number]: G311;TP315