Research and Implementation of a Method for Extracting Cross-Language Semantically Similar Words
[Abstract]: As computer applications continue to advance, people increasingly want computers to think about and handle all kinds of problems the way humans do. Semantic similarity is one of the most important topics in natural language processing, and its applications help computers better understand human needs. Semantic similarity measures how close in meaning a group of documents, phrases, or words are. The semantic similarity of cross-language words measures the degree to which two (or more) words in two (or more) different languages express the same meaning, that is, point to similar semantic concepts. As an important part of information processing, and with the spread and development of big data technology, cross-language word semantic similarity has great application and research value in artificial intelligence, natural language processing, information retrieval, and other fields. Chinese and English, two of the most important languages, are widely used in economics, culture, trade, education, and other fields, which reflects their important position and role. This thesis therefore takes these two languages as its main research objects. Current methods for computing the semantic similarity of cross-language words fall into three main categories: methods based on rules over a semantic knowledge base, methods based on corpus statistics, and hybrid methods. A semantic knowledge base contains many kinds of manually specified semantic knowledge that together form a complex semantic network, so knowledge-base rule methods make full use of the semantic knowledge it contains. Methods based on corpus statistics obtain sufficient data through manual input, web crawling, crowdsourcing, and similar means.
These data are then processed with probability statistics, machine learning, and other techniques to compute the semantic similarity between words in two different languages. However, words are unevenly distributed in the corpus, and this biases the similarity results. Combining a semantic knowledge base with corpus statistics can compensate for these shortcomings, and such hybrid methods are attracting growing attention from researchers. First, using the Chinese Concept Dictionary (CCD) and the WordNet semantic knowledge base, we construct the CSWE (Chinese Semantic Similar Words Extraction) and ESWE (English Semantic Similar Words Extraction) models. We apply them to the extraction of semantically similar words in Chinese and English, respectively. Experimental results show that the CSWE and ESWE models extract semantically similar words correctly and also need less time to extract the top k similar words on larger datasets. We then extend the CSWE and ESWE models into the CLSWE (Cross-language Semantic Similar Words Extraction) model for cross-language semantically similar word extraction. To demonstrate the model's performance, we evaluate it on the WordSim353 and RW datasets, which differ in size and vocabulary. To build the evaluation data, the 77 words that occur in both WordSim353 and RW are extracted as the English correctness-verification dataset. Each of these 77 English words is translated into several candidate Chinese words, and the best-matching Chinese word is selected, yielding the Chinese correctness-verification dataset. Experiments show that, compared with the baseline model, the CLSWE model proposed in this thesis extracts cross-language semantically similar words correctly.
Moreover, on larger datasets it extracts the top k words most similar to a query word in less time. Overall, compared with the baseline model, the proposed cross-language semantic similar word extraction model achieves good experimental performance.
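The pipeline outlined above has three steps: a knowledge-base similarity measure, a bridge across languages, and ranking of the top k candidates. The sketch below is a rough, hypothetical illustration of those steps, not the CSWE, ESWE, or CLSWE algorithm itself. Everything in it is an assumption made for illustration:

- the toy taxonomy `HYPERNYMS`, which stands in for WordNet or CCD;
- the bilingual dictionary `ZH_TO_EN`;
- every function name.

The similarity is a plain shortest-path measure, 1 / (1 + d), in the spirit of WordNet-style path similarity. It is not the formula used in the thesis.

```python
import heapq
from collections import deque

# Hypothetical toy is-a taxonomy (child -> parents). A real system would load
# WordNet (English) or the Chinese Concept Dictionary (CCD) instead.
HYPERNYMS = {
    "car": ["motor_vehicle"],
    "truck": ["motor_vehicle"],
    "motor_vehicle": ["vehicle"],
    "bicycle": ["vehicle"],
    "vehicle": ["entity"],
    "dog": ["animal"],
    "animal": ["entity"],
}

# Hypothetical Chinese-to-English dictionary used to bridge the two languages.
# "automobile" is deliberately absent from the taxonomy, so the best-scoring
# translation ("car") is the one that counts.
ZH_TO_EN = {"汽车": ["car", "automobile"], "狗": ["dog"]}


def build_graph(hypernyms):
    """Turn child -> parents edges into an undirected adjacency map."""
    graph = {}
    for child, parents in hypernyms.items():
        for parent in parents:
            graph.setdefault(child, set()).add(parent)
            graph.setdefault(parent, set()).add(child)
    return graph


def shortest_path_length(graph, src, dst):
    """Breadth-first search over the taxonomy; None if not connected."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def path_similarity(graph, w1, w2):
    """Knowledge-base similarity 1 / (1 + d), d = shortest is-a path length."""
    d = shortest_path_length(graph, w1, w2)
    return 0.0 if d is None else 1.0 / (1.0 + d)


def cross_lingual_similarity(graph, zh_word, en_word):
    """Score a Chinese word against an English word via its best translation."""
    return max(
        (path_similarity(graph, t, en_word) for t in ZH_TO_EN.get(zh_word, [])),
        default=0.0,
    )


def top_k_similar(graph, zh_word, en_vocabulary, k):
    """Return the k English words most similar to zh_word, best first."""
    return heapq.nlargest(
        k, en_vocabulary, key=lambda w: cross_lingual_similarity(graph, zh_word, w)
    )


GRAPH = build_graph(HYPERNYMS)
print(top_k_similar(GRAPH, "汽车", ["dog", "bicycle", "truck"], 2))
# prints ['truck', 'bicycle']
```

`heapq.nlargest` keeps the ranking step at O(n log k) for a vocabulary of n words. This brute-force scan still scores every candidate, however. It makes no attempt to reproduce the reduced extraction time reported for the models on larger datasets.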
[Degree-granting institution]: Nanjing Normal University
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP391.1
Article No.: 2503476
Article link: https://www.wllwen.com/jingjilunwen/jiliangjingjilunwen/2503476.html