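Entity Importance Ranking over Documents under an Event or Topic
Published: 2018-05-17 23:42
Topics: event detection + entity ranking; Source: East China Normal University, master's thesis, 2017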
[Abstract]: In the Internet era, the rapid growth of new online media has made it easy to share information at massive scale. These media have accumulated large volumes of text that record the major public-opinion events and hot discussion topics arising as society develops. By monitoring online public opinion, the government, the public, and relevant agencies can understand current social conditions in China and identify social problems promptly. Public-opinion monitoring can also help government departments manage and make decisions on a scientific basis. How to detect events or topics in massive online text data has therefore become an important research problem of practical significance. For the texts under an event or topic, the important entities summarize the subjects those texts describe. Working with large-scale online news data, this thesis detects hot events and hot topics and extracts the key entities of the texts to summarize the main elements of each event. The main contributions are as follows:
· The thesis redefines how news-text similarity is computed by means of metric learning, and proposes ToED, a topic-based event detection method for massive, unordered, and redundant online news text. ToED applies a topic model to learn the topic distribution of each document; for the set of documents under any given topic, a density-based event clustering method, ESACN, is proposed to detect hot events.
· For the problem of selecting the important entities of a document, the thesis proposes LA-FSAM, an important-entity ranking model based on the forward stagewise algorithm. It considers not only features of an entity's importance within the document but also external entity features introduced through Wikipedia and Google Word2Vec. The model builds its loss function from an improved AUC criterion, and its parameters are learned from annotated training data by stochastic gradient descent. Experimental comparison of LA-FSAM with baseline methods demonstrates the effectiveness of the proposed method.
· The thesis designs and implements KSPOS, a system for analyzing and displaying hot social public opinion, which supports retrieval by event or topic. To present users with comprehensive and wide-ranging search results, the system selects important entities and mines the semantic relations among them to build a semantic network of public-opinion events; it also extracts keywords from the document set to summarize what the event or topic is about, and generates an event timeline showing how the event unfolded.
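The abstract gives no formulas for the AUC-based loss, so the following is only a rough sketch of the general idea of learning a ranking function by stochastic gradient descent on an AUC-style objective. It is not LA-FSAM: it assumes a plain linear scorer and the standard pairwise logistic surrogate for 1 - AUC, and the entity names, the three features, and the labels are all hypothetical toy values.

```python
import math
import random


def auc(scores, labels):
    """Fraction of (important, unimportant) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    correct = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return correct / (len(pos) * len(neg))


def score(w, x):
    """Linear scoring function (an assumption; the thesis model is additive, not necessarily linear)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise_sgd(features, labels, epochs=200, lr=0.1, seed=0):
    """Minimize the pairwise logistic loss log(1 + exp(-(s_pos - s_neg))) by SGD."""
    rng = random.Random(seed)
    w = [0.0] * len(features[0])
    pos = [x for x, y in zip(features, labels) if y == 1]
    neg = [x for x, y in zip(features, labels) if y == 0]
    pairs = [(p, n) for p in pos for n in neg]
    for _ in range(epochs):
        rng.shuffle(pairs)
        for p, n in pairs:
            margin = score(w, p) - score(w, n)
            # Derivative of the loss with respect to the margin.
            g = -1.0 / (1.0 + math.exp(margin))
            for i in range(len(w)):
                w[i] -= lr * g * (p[i] - n[i])
    return w


# Hypothetical entities with toy features:
# [in-document frequency, appears in title, external similarity to the topic]
names = ["entity_a", "entity_b", "entity_c", "entity_d", "entity_e", "entity_f"]
features = [
    [0.9, 1.0, 0.8],
    [0.7, 1.0, 0.6],
    [0.6, 0.0, 0.7],
    [0.3, 0.0, 0.2],
    [0.2, 0.0, 0.4],
    [0.1, 0.0, 0.1],
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = annotated as important

w = train_pairwise_sgd(features, labels)
scores = [score(w, x) for x in features]
ranking = sorted(range(len(names)), key=lambda i: scores[i], reverse=True)
print([names[i] for i in ranking], "AUC =", auc(scores, labels))
```

The pairwise loss is used here because AUC itself is a step function of the scores and has no useful gradient; any smooth surrogate would serve the same illustrative purpose.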
[Degree-granting institution]: East China Normal University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.1