Research on a Multi-Document Automatic Summarization Algorithm Incorporating Sentential Semantic Features
Published: 2019-04-08 13:49
[Abstract]: Multi-document automatic summarization is one of the key problems in natural language processing. To make extracted summaries better reflect the topics of a document set, this paper proposes, on the basis of sub-topic division, a sentence selection optimization method that incorporates sentential semantic features. Based on the sentential semantic structure model, the method extracts features such as the topic and the predicate from the sentential semantic structure, combines them with statistical features to build a feature vector, and uses that vector to compute sentence weights. Summaries are then extracted by combining a comprehensive weighted selection method with maximal marginal relevance (MMR). Experiments and evaluation were carried out on text sets covering different topics. At a summary compression ratio of 15%, the system summaries reach an average precision of 66.7% and an average recall of 65.5%. The results indicate that introducing sentential semantic features can effectively improve multi-document summarization.
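The final selection step named in the abstract, maximal marginal relevance (最大边缘相关, MMR), can be illustrated with a minimal sketch. This is not the paper's implementation: the per-sentence weights are assumed to be given already (the paper derives them from sentential semantic and statistical features), similarity is assumed to be bag-of-words cosine, and the names `mmr_select`, `cosine`, and the trade-off parameter `lam` are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(sentences, weights, k, lam=0.7):
    """Greedily pick up to k sentence indices.

    Each step maximizes lam * weight - (1 - lam) * redundancy, where
    redundancy is the highest similarity to any already selected sentence.
    Whitespace tokenization is an assumption made for this sketch only.
    """
    bags = [Counter(s.lower().split()) for s in sentences]
    selected = []
    candidates = set(range(len(sentences)))

    while candidates and len(selected) < k:
        def score(i):
            redundancy = max(
                (cosine(bags[i], bags[j]) for j in selected), default=0.0
            )
            return lam * weights[i] - (1 - lam) * redundancy

        # Iterating in sorted order makes ties resolve to the lowest index.
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)

    return selected


if __name__ == "__main__":
    docs = [
        "the cat sat on the mat",
        "the cat sat on the mat",
        "dogs bark loudly at night",
    ]
    # Hypothetical weights standing in for the feature-based sentence weights.
    print(mmr_select(docs, [0.9, 0.85, 0.5], k=2, lam=0.5))
```

With `lam=1.0` the redundancy penalty vanishes and selection reduces to ranking by weight alone, so the duplicate sentence would be chosen second. In practice `k` could be derived from a target compression ratio such as the 15% used in the experiments, though how the paper measures that ratio is not stated in the abstract.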
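[Author affiliation]: School of Information and Electronics, Beijing Institute of Technology
[Funding]: National "242" Program (2005C48); Beijing Institute of Technology Science and Technology Innovation Program, Major Project Cultivation Special Fund (2011CX01015)
[Classification number]: TP391.1
Article ID: 2454629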
Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/2454629.html