
Research on Sentence Compression and Its Applications

Published: 2018-02-10 07:53

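  Keywords: sentence compression; structured learning; integer programming; multi-document summarization; natural language processing. Source: Soochow University (苏州大学), 2013 master's thesis. Type: degree thesis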


[Abstract]: In recent years, the growing volume of information of all kinds has increased the demand for information processing and driven rapid progress in natural language processing. At the same time, finding valuable information promptly within massive amounts of data has drawn growing attention, and sentence compression, as a foundational natural language processing task, has attracted increasing interest from researchers. Sentence compression can be applied to many tasks, including automatic summarization, headline generation, search engines, and topic detection.

Current mainstream research on sentence compression relies mainly on corpus-driven supervised models. This thesis adopts a discriminative supervised model and performs sentence compression by learning how to prune constituency parse trees. The main research content is as follows:

1. Sentence compression based on structured learning. First, a Chinese parallel corpus is built by matching and extraction. Then, a corpus expansion method is proposed, offering a new way to address the small size of the corpus. Finally, a structured learning algorithm learns how to prune the constituency parse tree of the source sentence, which yields the compressed sentence. Experimental results show that the compression model based on structured learning performs well and that the proposed corpus expansion method is feasible.

2. Decoding methods for sentence compression. Within the discriminative approach, integer programming is proposed for decoding. Casting sentence compression as an integer programming problem makes it possible to search for the optimal target sentence. This decoding method retains the main information of the source sentence while maintaining a good compression rate.

3. Evaluation metrics for sentence compression. Because sentence compression lacks suitable automatic evaluation metrics, this thesis introduces two metrics, BLEU and N-Gram, to evaluate compression performance in a sentence compression system based on word deletion, and verifies their applicability experimentally.

4. Applications of sentence compression. Taking applications as the starting point, the sentence compression system is applied to multi-document summarization. Experimental results show that, in multi-document summarization, the compression system can remove some unimportant sentence-level information without harming the readability of the summary.
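The abstract describes decoding as a 0/1 optimization over word deletions and evaluating the output with n-gram overlap, but it gives neither the objective nor the constraints used in the thesis. The Python sketch below is therefore only an illustration under stated assumptions: the per-word scores, the length budget `max_keep`, and the "a kept word needs its head" constraint are all hypothetical, and brute-force enumeration stands in for a real integer programming solver. The function names `compress`, `ngram_precision`, and `compression_rate` are invented for this example.

```python
from collections import Counter
from itertools import product


def compress(words, scores, max_keep, requires=()):
    """Choose a 0/1 keep decision per word that maximizes the total score.

    Assumed constraints (illustrative, not taken from the thesis):
      - at most `max_keep` words are kept;
      - for each (child, head) index pair in `requires`,
        keeping the child requires keeping the head.
    Enumerates all 2^n assignments, so it only suits short toy sentences.
    """
    best_value, best_keep = float("-inf"), ()
    for keep in product((0, 1), repeat=len(words)):
        if sum(keep) > max_keep:
            continue  # violates the length budget
        if any(keep[c] and not keep[h] for c, h in requires):
            continue  # a kept word lost the word it depends on
        value = sum(s for s, k in zip(scores, keep) if k)
        if value > best_value:
            best_value, best_keep = value, keep
    return [w for w, k in zip(words, best_keep) if k]


def ngram_precision(candidate, reference, n=2):
    """Share of candidate n-grams also found in the reference (clipped counts)."""
    cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    return sum(min(count, ref[gram]) for gram, count in cand.items()) / total


def compression_rate(source, compressed):
    """Kept words divided by source words."""
    return len(compressed) / len(source)


# Toy input: the scores are made up; a trained model would supply them.
words = "the old cat sat quietly on the mat".split()
scores = [0.5, -0.4, 1.0, 1.2, -0.6, 0.3, 0.4, 0.9]
requires = [(0, 2), (1, 2), (5, 7), (6, 7)]  # (child index, head index)

compressed = compress(words, scores, max_keep=6, requires=requires)
print(" ".join(compressed))
print(compression_rate(words, compressed))
print(ngram_precision(compressed, "cat sat on the mat".split(), n=2))
```

With these toy scores the decoder drops the two negatively scored modifiers. A practical system would replace the enumeration with an integer programming solver, since the search space doubles with every additional word.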
[Degree-granting institution]: Soochow University (苏州大学)
[Degree level]: Master's
[Year conferred]: 2013
[Classification number]: TP391.1



