Research on Chinese-Vietnamese News Keyword Extraction Based on Hypergraphs
[Abstract]: With the development of the Belt and Road Initiative, China has begun to pay more attention to Vietnam. News, as a carrier of information, is an important way for people to obtain information. However, Vietnamese is a less commonly studied language that few people have mastered, and online news rarely provides keywords, which makes it difficult to locate relevant news. Extracting keywords from Chinese and Vietnamese news can save considerable time and improve the utilization of information. Current keyword extraction methods usually consider only the feature information of individual words and ignore the complex relations within news documents, so finding a suitable model to express these relations is a pressing problem. The hypergraph model can express complex relations among multiple entities, which matches the need to represent the multiple relationships found in news documents. This thesis therefore uses the hypergraph model to study keyword extraction in single-document, multi-document, and bilingual settings. The main work is as follows:
1. A news keyword extraction method based on hypergraph ranking for a single document is proposed. Considering that the hypergraph model can express the relationship between words and sentences in a document, the method first analyzes the structural characteristics of a single document, takes each word as a vertex, and uses word frequency, part of speech, word span, and position as the weight of the word. Each sentence is then treated as a hyperedge, and the single-document news hypergraph model is built.
2. A news keyword extraction method based on hypergraph ranking for multiple documents is proposed. Considering that a hyperedge in the hypergraph model can represent a news document, the method analyzes how the features of a news page influence keyword extraction, takes the time factor and the number of comments of the news page as the feature weights of the hyperedge, and builds the multi-document news hypergraph model.
3. A method for extracting Chinese-Vietnamese bilingual news keywords based on hypergraph ranking over multiple documents is proposed. Considering that a hypergraph can express the correspondence between Chinese and Vietnamese words, the method first analyzes the characteristics of bilingual news documents and takes the frequency of bilingual words as the core feature information of words. Two types of hyperedges are then constructed to build a bilingual news hypergraph model. Finally, a hypergraph-based random walk algorithm ranks the vertices of the hypergraph, and the highest-ranked words are output as the keywords of the news document.
The experiments show that the method is effective.
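The ranking step described above can be sketched in a few lines of Python. This is a minimal illustration, not the thesis's implementation: the abstract does not give its transition formula, so the sketch assumes a standard damped random walk on a weighted hypergraph (from a vertex, pick an incident hyperedge in proportion to its weight, then pick a vertex in that hyperedge uniformly). The function name `rank_hypergraph`, the damping value, and the toy sentences are all hypothetical.

```python
from collections import defaultdict


def rank_hypergraph(hyperedges, edge_weights=None, vertex_weights=None,
                    damping=0.85, iterations=100, tol=1e-9):
    """Rank vertices with a damped random walk on a weighted hypergraph.

    hyperedges: list of iterables of vertices (e.g. the words of a sentence).
    edge_weights: optional positive weight per hyperedge (assumed 1.0 if omitted).
    vertex_weights: optional dict of positive word weights that biases the
        restart distribution (assumed uniform if omitted).
    """
    if edge_weights is None:
        edge_weights = [1.0] * len(hyperedges)
    # Drop empty hyperedges; they cannot take part in the walk.
    pairs = [(set(e), w) for e, w in zip(hyperedges, edge_weights) if e]
    vertices = sorted(set().union(*(e for e, _ in pairs)))
    if vertex_weights is None:
        vertex_weights = {v: 1.0 for v in vertices}

    # Restart (teleport) distribution proportional to the vertex weights.
    total = sum(vertex_weights[v] for v in vertices)
    restart = {v: vertex_weights[v] / total for v in vertices}

    # Weighted degree: sum of the weights of the hyperedges containing v.
    degree = defaultdict(float)
    for e, w in pairs:
        for v in e:
            degree[v] += w

    score = dict(restart)
    for _ in range(iterations):
        new = {v: (1.0 - damping) * restart[v] for v in vertices}
        for e, w in pairs:
            # Probability mass entering hyperedge e from its member vertices.
            mass = sum(score[u] * w / degree[u] for u in e)
            # That mass is split uniformly among the vertices of e.
            share = damping * mass / len(e)
            for v in e:
                new[v] += share
        change = sum(abs(new[v] - score[v]) for v in vertices)
        score = new
        if change < tol:
            break
    return sorted(score.items(), key=lambda kv: kv[1], reverse=True)


# Toy single-document example: each sentence is one hyperedge over its words.
sentences = [
    ["hypergraph", "keyword", "news"],
    ["keyword", "extraction"],
    ["news", "keyword", "vietnamese"],
]
ranking = rank_hypergraph(sentences)
keywords = [word for word, _ in ranking[:2]]
```

In this sketch the word weights (frequency, part of speech, span, position) would enter through `vertex_weights`, and document-level features such as the time factor or comment count would enter through `edge_weights`. How the thesis actually combines them is not stated in the abstract.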
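[Degree-granting institution]: Kunming University of Science and Technology
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP391.1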