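Research on Automatic Paraphrasing of Chinese Noun-Noun Compounds for Semantic Search
Published: 2018-05-15 19:36
Topics: noun-noun compounds + automatic paraphrasing; Source: Peking University, master's thesis, 2012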
[Abstract]: This thesis studies the semantic relations within noun-noun compounds in modern Chinese, especially those that occur in web search queries. The semantic relations inside a noun-noun compound are complex and often involve an implicit predicate. The main purpose of paraphrasing a noun-noun compound is to recover the predicate implied between the two nouns and thereby reveal the semantic relation between them. Guided by Generative Lexicon Theory and related theories, the thesis proposes a method that combines top-down and bottom-up approaches, and designs and implements a program that automatically generates paraphrase phrases for compounds made up of two nouns.
The thesis first collects and analyzes Google hot-list terms and Baidu News hot search terms, and finds that noun-noun compounds hold an important place among web search queries, so research on paraphrasing them automatically is helpful for natural language processing applications such as information retrieval. Next, drawing on Generative Lexicon Theory and the Modern Chinese Semantic Dictionary (《现代汉语语义词典》), it analyzes 850 noun-noun compounds taken from Baidu News hot search terms, earlier literature, and various novels and essays, and derives 356 semantic-class combination patterns together with their corresponding paraphrase templates. On this basis it builds the noun-noun collocation database Noun_Noun. It then uses HowNet (《知网》) resources to build the noun knowledge base Noun_Verb. Finally, on top of Noun_Noun and Noun_Verb, it develops an automatic paraphrasing program for Chinese noun-noun compounds.
The program we designed to generate paraphrase phrases for noun-noun compounds has five main steps: (1) segment the input and tag parts of speech, obtaining the word string N1+N2 and confirming that it is a noun-noun compound; (2) look up the semantic classes S1 and S2 of N1 and N2 in the database Noun_Verb; (3) look up in the database Noun_Noun the paraphrase template for the semantic-class combination pattern S1+S2; (4) as the template requires, look up in Noun_Verb the agentive role or telic role (a verb) of the relevant noun, to serve as the predicate expressing the semantic relation between N1 and N2; (5) insert the verb, N1, and N2 into the paraphrase template to generate the paraphrase phrase.
After building the program, we tested its effectiveness using the noun-noun compounds found in Baidu News hot search terms from May to September 2011 as test data. Based on the research and the program testing, the thesis also offers suggestions for improving the Modern Chinese Semantic Dictionary and HowNet. The thesis hopes to bring about a positive interaction between language resources and application systems. Developing the automatic paraphrasing program also made the necessity and importance of building basic language resources very clear to us.
In China, the most representative work on automatic paraphrasing of Chinese noun-noun compounds is Wang Meng, Huang Juren, Yu Shiwen, and Li Bin (2010). Compared with Wang Meng et al. (2010), this thesis has three distinguishing features: (1) its paraphrase templates are richer; (2) its paraphrase phrases are more natural; (3) it integrates several methods.
Compared with Wang Meng et al. (2010), this thesis has the following shortcomings: (1) our results depend heavily on manually constructed paraphrase templates and knowledge bases, the procedure involves many steps, and the system is less intelligent than that of Wang Meng et al. (2010); (2) the paraphrase templates and the agentive and telic roles of nouns that we compiled are not yet complete, and need to be expanded and improved through use.
The thesis also proposes some ideas for further improving the automatic paraphrasing program. We expect that, once further improved, the program will better serve natural language processing tasks such as search engines and machine translation.
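The five steps listed in the abstract can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the thesis's program: the dictionaries standing in for Noun_Verb and Noun_Noun, the semantic class labels, the template syntax, the English example words, and the function name `paraphrase` are all hypothetical. Segmentation and part-of-speech tagging (step 1) are assumed to have been done already, so the input is a list of (word, tag) pairs.

```python
from typing import List, Optional, Tuple

# Hypothetical stand-in for the Noun_Verb knowledge base: each noun maps to a
# semantic class plus an agentive-role verb and a telic-role verb.
# Every entry here is invented for illustration.
NOUN_VERB = {
    "bread": {"sem_class": "food", "agentive": "baked", "telic": "eat"},
    "knife": {"sem_class": "tool", "agentive": "forged", "telic": "cut"},
    "wood": {"sem_class": "material", "agentive": "grown", "telic": "build"},
    "table": {"sem_class": "furniture", "agentive": "made", "telic": "hold"},
}

# Hypothetical stand-in for the Noun_Noun database: a semantic-class pair
# (S1, S2) maps to (template, which noun supplies the verb, which role to use).
NOUN_NOUN = {
    ("food", "tool"): ("{n2} used to {verb} {n1}", "n2", "telic"),
    ("material", "furniture"): ("{n2} {verb} from {n1}", "n2", "agentive"),
}


def paraphrase(tagged: List[Tuple[str, str]]) -> Optional[str]:
    """Return a paraphrase phrase for a tagged two-word input, or None."""
    # Step 1: confirm the (already segmented and tagged) input is N1 + N2.
    if len(tagged) != 2 or any(pos != "n" for _, pos in tagged):
        return None
    n1, n2 = tagged[0][0], tagged[1][0]

    # Step 2: look up the semantic classes S1 and S2 in Noun_Verb.
    entry1, entry2 = NOUN_VERB.get(n1), NOUN_VERB.get(n2)
    if entry1 is None or entry2 is None:
        return None

    # Step 3: find the paraphrase template for the pattern S1 + S2 in Noun_Noun.
    pattern = NOUN_NOUN.get((entry1["sem_class"], entry2["sem_class"]))
    if pattern is None:
        return None
    template, source, role = pattern

    # Step 4: fetch the agentive or telic verb of the noun the template names.
    verb = (entry1 if source == "n1" else entry2)[role]

    # Step 5: insert the verb, N1, and N2 into the template.
    return template.format(n1=n1, n2=n2, verb=verb)
```

With these toy entries, `paraphrase([("bread", "n"), ("knife", "n")])` returns `"knife used to cut bread"`, while a pair whose semantic-class pattern has no template, such as bread + table, returns `None`. Returning `None` on any failed lookup keeps the dependence on manually built resources visible, which is the limitation the abstract itself points out.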
[Degree-granting institution]: Peking University
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: H146
Article ID: 1893638
Article link: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/1893638.html