
Research on Citation-Context-Based Summarization of Academic Literature

Published: 2018-04-19 12:02

  Topics: academic literature summarization + citation context; Source: master's thesis, Northwest A&F University, 2017


[Abstract]: Research output keeps growing, and the resulting explosion of academic literature makes it increasingly hard for researchers to search and read the literature, while their workload rises steadily. Automatic summarization of academic literature uses computers to extract information efficiently and accurately from massive document collections, and it has become an active research topic. This thesis takes the citation context as its object of study. Existing citation-based summarization methods for academic literature have shortcomings. To address them, the thesis designs and implements a citation-context-based summarization method that improves the quality of automatically generated abstracts. The main contents and results are as follows.

(1) A citation context classification algorithm based on a convolutional neural network (CNN). The structural characteristics of academic literature are first analyzed. A classification model based on the discourse structure of academic papers is then proposed, which addresses incomplete coverage of information in the summary. The goal is to locate which argumentative point of the cited paper a citation context addresses. To that end, a simple network structure is designed, modeled on the use of CNNs in image processing, with deep-learning word embeddings as the sentence input. The CNN-static and CNN-non-static variants are compared with a traditional classifier based on a linear SVM. Both CNN variants achieve relatively high accuracy. CNN-non-static reaches up to 79.91%, with an overall average improvement of 3.12%, which effectively resolves how citation contexts are distributed over argumentative points. Summary evaluation also confirms that the method improves the informativeness and readability of the generated abstracts.

(2) A citation context extraction algorithm based on the vector space model. The relationship between citations and citation contexts is analyzed, and features are selected and weighted for citation sentences. Vector space models of the citations and the cited documents are then built. Cosine distance measures the semantic relationship between a citation and candidate citation contexts, so the citation can be used to extract its citation context from the cited document. The extraction results show low consistency in wording between citations and their citation contexts, and extraction accuracy is correspondingly low, with an overall mean of 17.66%. This also shows that citations do not accurately reflect the cited document. Citation-based summaries of academic literature therefore risk being inconsistent with the information in the cited document.

(3) An improved graph-based sentence importance ranking algorithm. Traditional graph-based ranking considers only sentence importance and ignores redundancy between sentences, so the generated summaries contain redundant information. The improved algorithm assesses sentence redundancy from the semantic positional relations of words across sentences, combined with the semantic similarity between sentences. It then weights each sentence's importance and redundancy into a combined score used for ranking, which removes redundancy among summary sentences. Evaluation shows that the improved method raises the ROUGE scores of the summaries and improves summary quality, bringing the summaries closer to the reference summaries.
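Contribution (1) becomes more concrete with a minimal sketch of the forward pass of a single-layer CNN sentence classifier of the general kind the abstract describes. Each sentence is a sequence of word vectors. Filters of several widths slide over word windows, max-over-time pooling keeps one feature per filter, and a softmax layer scores the classes. Everything concrete here is an assumption rather than the thesis's configuration: the names `conv_max_pool` and `cnn_forward`, the toy 4-dimensional embeddings, the filter widths 2 and 3, the three hypothetical output classes, and the random weights. Training is omitted. In the usual meaning of the terms, CNN-static keeps the pre-trained word vectors fixed during training and CNN-non-static also fine-tunes them, so both variants share this forward pass.

```python
import math
import random


def conv_max_pool(sentence, embeddings, filt, bias, width):
    """Apply one convolution filter to every `width`-word window of the
    sentence, pass each activation through ReLU, and return the maximum
    (max-over-time pooling)."""
    dim = len(next(iter(embeddings.values())))
    # Unknown words map to a zero vector (an assumption of this sketch).
    vectors = [embeddings.get(w, [0.0] * dim) for w in sentence]
    best = 0.0  # ReLU outputs are non-negative, so 0.0 is a safe floor
    for start in range(len(vectors) - width + 1):
        # Concatenate the word vectors in the window into one flat list.
        window = [x for vec in vectors[start:start + width] for x in vec]
        activation = max(0.0, sum(f * x for f, x in zip(filt, window)) + bias)
        best = max(best, activation)
    return best


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cnn_forward(sentence, embeddings, filters, out_weights, out_bias):
    """filters: list of (weights, bias, width) triples.
    Returns one probability per output class."""
    features = [conv_max_pool(sentence, embeddings, w, b, h) for w, b, h in filters]
    scores = [sum(wi * fi for wi, fi in zip(row, features)) + bi
              for row, bi in zip(out_weights, out_bias)]
    return softmax(scores)


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 4
    vocab = ["the", "proposed", "method", "improves", "accuracy"]
    embeddings = {w: [rng.uniform(-1, 1) for _ in range(dim)] for w in vocab}
    # Two hypothetical filters of each width (2 and 3).
    filters = [([rng.uniform(-1, 1) for _ in range(h * dim)], 0.0, h)
               for h in (2, 3) for _ in range(2)]
    n_classes = 3  # hypothetical number of discourse-structure classes
    out_weights = [[rng.uniform(-1, 1) for _ in filters] for _ in range(n_classes)]
    out_bias = [0.0] * n_classes
    probs = cnn_forward("the proposed method improves accuracy".split(),
                        embeddings, filters, out_weights, out_bias)
    print([round(p, 3) for p in probs])
```

Max-over-time pooling turns a variable-length sentence into a fixed-length feature vector with one entry per filter. That fixed length is what lets a plain linear layer sit on top.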
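Contribution (2) can be illustrated with a small vector-space-model sketch. The citation and each sentence of the cited document are turned into TF-IDF vectors, and the cited-document sentences are ranked by cosine similarity to the citation. Ranking by similarity is equivalent to ranking by cosine distance, since distance is 1 minus similarity. The abstract does not specify its feature selection or weighting scheme, so several choices here are assumptions of this sketch:

- the regex tokenizer;
- the smoothed IDF formula;
- computing IDF over only the citation plus the cited sentences;
- the names `extract_citation_context` and `k`.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphanumeric runs; a deliberately simple tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """docs: list of token lists. Returns one sparse {term: weight} dict per doc."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF, log((1 + n) / (1 + df)) + 1, keeps every weight positive.
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1)
                        for t, c in tf.items()})
    return vectors


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def extract_citation_context(citation, cited_sentences, k=1):
    """Return the k cited-document sentences most similar to the citation,
    as (sentence, cosine similarity) pairs, best first."""
    vecs = tfidf_vectors([tokenize(citation)] + [tokenize(s) for s in cited_sentences])
    query, candidates = vecs[0], vecs[1:]
    scores = [cosine(query, c) for c in candidates]
    # sorted() is stable, so ties keep the document order of the sentences.
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return [(cited_sentences[i], scores[i]) for i in order[:k]]


if __name__ == "__main__":
    cited = [
        "We propose a convolutional network for sentence classification.",
        "Experiments were run on six benchmark datasets.",
        "The weather was sunny.",
    ]
    for sentence, score in extract_citation_context(
            "A convolutional network classifies sentences", cited, k=2):
        print(f"{score:.3f}  {sentence}")
```

The example also hints at the problem the abstract reports. Only exact word overlap counts, so a citation worded differently from the cited passage ("classifies sentences" versus "sentence classification") shares few terms with it and scores low.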
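The redundancy-aware ranking in contribution (3) can be approximated in two steps. First, a TextRank-style weighted PageRank runs over a sentence-similarity graph. Then a greedy selection step trades each candidate's importance against its similarity to sentences already chosen, a maximal-marginal-relevance-style penalty. This is a swapped-in technique, not the thesis's method. The abstract scores redundancy using the semantic positions of words and the semantic similarity between sentences. This sketch uses plain Jaccard word-overlap similarity for both the graph and the redundancy term. The names `similarity_matrix`, `pagerank`, `select_sentences`, and the trade-off weight `lam` are hypothetical.

```python
import re


def similarity_matrix(sentences):
    """Jaccard word-overlap similarity between sentences; the diagonal stays 0."""
    bags = [set(re.findall(r"\w+", s.lower())) for s in sentences]
    n = len(bags)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            union = bags[i] | bags[j]
            if i != j and union:
                sim[i][j] = len(bags[i] & bags[j]) / len(union)
    return sim


def pagerank(sim, damping=0.85, iterations=50):
    """Weighted PageRank over a similarity matrix, as in TextRank.
    Sentences with no similar neighbour (zero row sum) pass on no score."""
    n = len(sim)
    if n == 0:
        return []
    scores = [1.0 / n] * n
    out_sum = [sum(row) for row in sim]
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(sim[j][i] / out_sum[j] * scores[j]
                            for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    return scores


def select_sentences(sim, k, lam=0.7):
    """Greedily pick k sentence indices. Each candidate scores
    lam * normalised importance - (1 - lam) * max similarity to the picks so far."""
    scores = pagerank(sim)
    if not scores:
        return []
    top = max(scores)
    importance = [s / top for s in scores]  # rescale so the best sentence is 1.0
    selected, candidates = [], list(range(len(sim)))
    while candidates and len(selected) < k:
        best = max(candidates, key=lambda i: lam * importance[i]
                   - (1 - lam) * max((sim[i][j] for j in selected), default=0.0))
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    sentences = [
        "citation context improves academic summaries",
        "citation context improves academic summaries greatly",
        "graph ranking scores sentence importance for summaries",
    ]
    sim = similarity_matrix(sentences)
    print(select_sentences(sim, 2, lam=1.0))  # importance only
    print(select_sentences(sim, 2, lam=0.3))  # with a redundancy penalty
```

With `lam=1.0` the two near-duplicate sentences both rank highest and are both selected. A stronger redundancy penalty (`lam=0.3`) keeps one of them and swaps the other for the distinct third sentence. This is the kind of redundancy the abstract's combined score is meant to suppress.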
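The abstract reports higher ROUGE scores but does not say which variant was used. For reference, here is a minimal sketch of ROUGE-N recall: the fraction of reference n-grams that also appear in the candidate summary, with each n-gram's count clipped to its count in the candidate. It uses only lowercase whitespace tokenization, without the stemming and stop-word options of the official ROUGE toolkit. The function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Clipped n-gram overlap divided by the number of reference n-grams."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference n-gram is matched at most as often as it occurs in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    candidate = "the cat sat"
    print(rouge_n_recall(candidate, reference, n=1))  # 3 of 6 unigrams -> 0.5
    print(rouge_n_recall(candidate, reference, n=2))  # 2 of 5 bigrams -> 0.4
```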
[Degree-granting institution]: Northwest A&F University
[Degree]: Master's
[Year awarded]: 2017
[Classification number]: TP391.1

Article ID: 1773015


Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/1773015.html

