Research and Implementation of a Service Matching Algorithm Based on New Word Discovery

Published: 2018-07-21 12:31

[Abstract]: With the explosive growth in the number of Web services on the Internet in recent years, how to match the best service from a massive pool, and thereby enable Web service reuse and Web service composition, has become a hot research topic. Traditional solutions lack a semantic-level matching mechanism, so their results are unsatisfactory in both recall and precision. Other work, driven by Semantic Web technology, uses semantic descriptions of Web services to improve machine understanding, yet some Web services still cannot be found because they have no such semantic description. Search logs are data produced by large numbers of query-and-click actions, which means the latent semantic relationship between a query string and a target string can be mined through text processing and similar techniques. This thesis attempts to use search logs to address the problems above. Specifically: the CRF algorithm and related statistical methods are used to mine new words from search logs and build a new-word dictionary, and new-word recognition is then applied to query strings as a preprocessing step; a search-log-based algorithm for computing semantic similarity between new words is proposed, establishing a measure of semantic distance between new words and thereby enabling semantic expansion of service queries; and an algorithm for constructing a formal description model of Web services is proposed, which models Web services so that they can be matched against the processed query string, completing the final step of the workflow. New words can be used for query optimization and semantic expansion of query strings, so the matching algorithm, with semantic-level matching added, achieves significantly higher matching quality than traditional service matching. In addition, each type of Web service is processed in its own way to obtain a formal description model of the service, which hides the differences between Web service types from the matching system and simplifies subsequent service query matching. Finally, this thesis designs and implements a service matching algorithm based on new word discovery, which builds on traditional algorithms to achieve semantics-based service matching and improves the quality and effectiveness of service matching.
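The abstract does not give the similarity formula, so the following is only a minimal sketch of one common way to derive term similarity from a search log, not the thesis's actual algorithm. It assumes the log has been reduced to (query term, clicked target) pairs, represents each term by its click-count vector, and uses cosine similarity to pick expansion candidates. The names `build_click_vectors`, `cosine`, `expand_query`, the sample log, and the 0.5 threshold are all hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def build_click_vectors(log):
    """Map each query term to a Counter of the targets clicked for it.

    `log` is an iterable of (query_term, clicked_target) pairs; this
    flattened log shape is an assumption, not the thesis's data format.
    """
    vectors = defaultdict(Counter)
    for term, target in log:
        vectors[term][target] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse click-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def expand_query(term, vectors, threshold=0.5):
    """Return (other_term, score) pairs at or above `threshold`, best first.

    The threshold value is illustrative only.
    """
    base = vectors.get(term)
    if base is None:
        return []  # term never appeared in the log
    scored = [
        (other, cosine(base, vec))
        for other, vec in vectors.items()
        if other != term
    ]
    kept = [(other, score) for other, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample log: two weather-related terms share a clicked service.
SAMPLE_LOG = [
    ("weather forecast", "WeatherService"),
    ("weather forecast", "WeatherService"),
    ("check weather", "WeatherService"),
    ("check weather", "ForecastAPI"),
    ("book flight", "FlightBooking"),
]
VECTORS = build_click_vectors(SAMPLE_LOG)

if __name__ == "__main__":
    for other, score in expand_query("weather forecast", VECTORS):
        print(f"{other}: {score:.3f}")
```

In this sketch, terms that lead users to the same services end up close together, which is the intuition behind using query-click behavior as a source of semantic relationships. A real system would also need the new-word segmentation step and a richer service model, both of which are outside this example.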
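[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP391.1;TP393.09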

[References]