
Research on Computing the Semantic Relatedness of Words

Published: 2018-03-26 12:04

  Topic: semantic relatedness  Approach: kernel function  Source: Central China Normal University, master's thesis, 2013


[Abstract]: The semantic relatedness of two words expresses how strongly they are associated, that is, whether seeing one word brings the other to mind. It can be measured by the likelihood that the two words occur together in the same context. Semantic similarity and semantic relatedness are easily confused. Semantic similarity refers to how alike two words are. The two concepts are connected: if two words are semantically similar, they are necessarily semantically related, but the converse does not hold, since related words need not be similar. Semantic similarity can therefore be treated as one component of semantic relatedness computation.

Semantic relatedness computation is a foundational research task of considerable importance to natural language processing tasks such as machine translation, information retrieval, and text analysis. This thesis surveys existing methods for computing semantic relatedness and then proposes a method based on search engines. The specific work is as follows.

First, existing methods for computing the semantic relatedness of words can be broadly divided into traditional methods and methods based on online encyclopedias. Traditional methods can be further divided into two classes: those based on semantic lexicons (WordNet, HowNet) and those based on corpora. The thesis describes in detail the semantic resources these methods rely on, then presents several representative methods from each class and analyzes their theoretical foundations and characteristics.

Second, the thesis proposes a semantic relatedness method that combines a kernel function with Page Counts, where Page Counts is the number of pages a search engine reports for a query. This offers a new direction for semantic relatedness research, one that takes advantage of rapidly developing web technology. The effectiveness of the method is verified in three ways: (1) analysis of its theoretical basis; (2) experiments on a standard test set, with the results compared against human judgments; (3) evaluation in a specific setting. The experiments show that, compared with using the kernel function or Page Counts alone, the proposed method yields results closer to human judgments, so the method is effective.

Third, the thesis introduces one application of semantic relatedness computation, text clustering. Computing the semantic relatedness of texts on the basis of word-level relatedness scores can improve the accuracy of text clustering.
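
The abstract does not give the thesis's formula for combining a kernel function with Page Counts, so the sketch below does not attempt to reproduce it. It only illustrates, under stated assumptions, how page counts alone can be turned into a relatedness score and how scores can be compared with human judgments. The two measures shown (a Jaccard coefficient and pointwise mutual information over page counts) are commonly used page-count measures chosen here for illustration, Spearman rank correlation is an assumed evaluation metric, and every name and number is hypothetical.

```python
import math


def web_jaccard(count_p, count_q, count_pq):
    """Jaccard coefficient over page counts: hits(p AND q) / hits(p OR q)."""
    union = count_p + count_q - count_pq
    if count_pq <= 0 or union <= 0:
        return 0.0
    return count_pq / union


def web_pmi(count_p, count_q, count_pq, total_pages):
    """Pointwise mutual information from page counts.

    total_pages is an assumed estimate of the number of indexed pages.
    """
    if min(count_p, count_q, count_pq) <= 0:
        return 0.0
    joint = count_pq / total_pages
    independent = (count_p / total_pages) * (count_q / total_pages)
    return math.log2(joint / independent)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the two rank lists."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


if __name__ == "__main__":
    # Hypothetical page counts: (hits for p, hits for q, hits for "p AND q").
    pairs = {
        ("car", "automobile"): (1000, 400, 300),
        ("car", "road"): (1000, 900, 200),
        ("car", "banana"): (1000, 500, 5),
    }
    human_scores = [3.9, 2.8, 0.4]  # hypothetical human ratings, same order
    system_scores = [web_jaccard(*counts) for counts in pairs.values()]
    print("WebJaccard scores:", system_scores)
    print("Spearman vs. human:", spearman(system_scores, human_scores))
```

In practice the counts would come from search-engine queries for each word and for the conjunction of both words. Page counts are noisy estimates, so a real system would likely need smoothing or a minimum-count threshold.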
[Degree-granting institution]: Central China Normal University
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP391.1





