Research on a Personalized Campus Calendar System Based on Information Extraction

Published: 2018-08-14 14:03
[Abstract]: With the rapid development of the Internet, information is becoming ever more diverse and complex, which makes it harder for users to find what they need. Extracting the required information from the large volume of data produced every day has therefore become a major focus of natural language processing research. Information extraction, the technology studied in this thesis, emerged in response: it pulls large amounts of unordered, irregular information out of text and stores it in structured form, and so plays an important role in advancing information technology. This thesis studies event- and time-centered information extraction and designs and implements a personalized campus calendar system. The main contributions are as follows.
First, a Chinese entity relation extraction algorithm that combines rules with statistical models is designed and implemented. Regular expressions extract high-precision results, and a machine learning method combining a conditional random field (CRF) model with a maximum entropy (MaxEnt) model supplies supplementary results, improving both precision and recall. The method achieved good results in the Slot Filling task of the TAC-KBP evaluation.
Second, a personalized campus calendar system is proposed, designed, and implemented. While extracting event information, the system also organizes the time information associated with each event, giving users clues for a more complete understanding of the event. The system extracts time expressions from text with a rule-based method and normalizes them. Building on this, a method based on the word activation force (WAF) model is proposed for identifying the expressions that give an event's start and end times; these times provide further information about how an event unfolds. The system has been successfully deployed and is online in COSE, a campus entity search engine.
Third, a WAF-based method for expanding a sentiment polarity lexicon and a machine-learning-based method for classifying the sentiment polarity of text are proposed. The approach achieved good results on Task 1 (opinion word extraction and polarity classification) of the COAE 2011 evaluation. The model adds sentiment polarity classification to the campus calendar system, and this feature could be further applied to monitoring public opinion on campus.
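The first contribution pairs high-precision regular expressions with a statistical model that supplies extra results. The abstract does not say how the two outputs are merged, so the following is only a minimal sketch of one plausible policy: rule matches always win, and model predictions fill only the slots the rules left empty, above a confidence cutoff. The slot names follow TAC-KBP naming style, but the patterns, the functions `rule_fills` and `merge_fills`, and the `min_conf` parameter are hypothetical; the CRF + MaxEnt model itself is not implemented and is represented by a plain list of predictions.

```python
import re

# Hypothetical high-precision rules: slot name -> pattern with one capture group.
# Slot names mimic TAC-KBP style; the patterns are illustrative only.
RULES = {
    "per:date_of_birth": re.compile(r"生于(\d{4}年\d{1,2}月\d{1,2}日)"),
    "per:schools_attended": re.compile(r"毕业于([^,。,]+?大学)"),
}


def rule_fills(text: str) -> dict[str, str]:
    """High-precision pass: keep the first match of each rule."""
    fills = {}
    for slot, pattern in RULES.items():
        m = pattern.search(text)
        if m:
            fills[slot] = m.group(1)
    return fills


def merge_fills(
    rule_out: dict[str, str],
    model_out: list[tuple[str, str, float]],
    min_conf: float = 0.5,
) -> dict[str, str]:
    """Rules win; statistical predictions only fill slots the rules left empty.

    `model_out` stands in for the output of a trained model as
    (slot, value, confidence) tuples; the model itself is not shown.
    """
    merged = dict(rule_out)
    # Consider the most confident predictions first.
    for slot, value, conf in sorted(model_out, key=lambda t: -t[2]):
        if slot not in merged and conf >= min_conf:
            merged[slot] = value
    return merged
```

For example, on the sentence "张三生于1985年3月2日,毕业于北京邮电大学。" the rules fill the birth date and school, so a model prediction for `per:date_of_birth` is ignored while a confident `per:title` prediction is added. Letting the rules take precedence mirrors the abstract's split between rule output for precision and model output for recall.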
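The rule-based extraction and normalization of time expressions in the second contribution can be illustrated with a small Python sketch. The abstract does not list the thesis's actual rules, so the three patterns below (full dates, month-day dates, and four relative day words), the function name `normalize_times`, and the choice to resolve a missing year and relative words from a reference date are all hypothetical assumptions made for illustration.

```python
import re
from datetime import date, timedelta
from typing import Optional

# Hypothetical rule set; the thesis's actual patterns are not given in the abstract.
FULL_DATE = re.compile(r"(\d{4})年(\d{1,2})月(\d{1,2})[日号]")  # e.g. 2013年5月10日
MONTH_DAY = re.compile(r"(?<![年\d])(\d{1,2})月(\d{1,2})[日号]")  # e.g. 5月20日, not inside a full date
RELATIVE = re.compile(r"昨天|今天|明天|后天")  # relative day words
RELATIVE_OFFSET = {"昨天": -1, "今天": 0, "明天": 1, "后天": 2}


def _make_date(year: int, month: int, day: int) -> Optional[date]:
    """Build a date, returning None for impossible dates such as 2月30日."""
    try:
        return date(year, month, day)
    except ValueError:
        return None


def normalize_times(text: str, ref: date) -> list[tuple[str, str]]:
    """Extract time expressions and normalize each one to an ISO-8601 date.

    `ref` is a reference date (for example, the posting date of a campus
    notice) used to fill in a missing year and to resolve relative words.
    """
    found = []  # (start offset, surface text, normalized date)
    for m in FULL_DATE.finditer(text):
        d = _make_date(*map(int, m.groups()))
        if d is not None:
            found.append((m.start(), m.group(0), d))
    for m in MONTH_DAY.finditer(text):
        month, day = map(int, m.groups())
        d = _make_date(ref.year, month, day)  # assumption: same year as the reference date
        if d is not None:
            found.append((m.start(), m.group(0), d))
    for m in RELATIVE.finditer(text):
        offset = RELATIVE_OFFSET[m.group(0)]
        found.append((m.start(), m.group(0), ref + timedelta(days=offset)))
    found.sort(key=lambda item: item[0])  # report in reading order
    return [(surface, d.isoformat()) for _, surface, d in found]
```

For example, `normalize_times("讲座定于5月20日举行,报名截止到明天,详见2013年5月10日通知。", date(2013, 5, 12))` returns `[("5月20日", "2013-05-20"), ("明天", "2013-05-13"), ("2013年5月10日", "2013-05-10")]`. The negative lookbehind keeps the month-day rule from matching again inside a full date, so each expression is normalized once.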
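The abstract names the word activation force (WAF) model for both start/end time identification and sentiment lexicon expansion but gives no formula. The sketch below assumes one formulation from the WAF literature, waf(i→j) = (f_ij / f_i) × (f_ij / f_j) / d_ij², where f_i and f_j are word frequencies, f_ij counts how often j follows i within a window, and d_ij is their mean distance. It also assumes a corpus already segmented into words, counts only the nearest following occurrence per position, and uses a hypothetical seed-propagation rule (`expand_lexicon`) in which a candidate word takes the polarity of the seed words it is most strongly linked to. The thesis's exact variant, window size, and expansion rule may differ.

```python
from collections import Counter


def waf_scores(sentences: list[list[str]], window: int = 5) -> dict[tuple[str, str], float]:
    """Directed word activation force for every word pair seen in the corpus.

    Assumed formulation: waf(i->j) = (f_ij / f_i) * (f_ij / f_j) / d_ij ** 2.
    """
    freq = Counter()      # f_i: corpus frequency of each word
    co = Counter()        # f_ij: times j follows i within `window` tokens
    dist_sum = Counter()  # summed distances, used for the mean distance d_ij
    for tokens in sentences:
        freq.update(tokens)
        for a, wi in enumerate(tokens):
            seen = set()
            for b in range(a + 1, min(a + 1 + window, len(tokens))):
                wj = tokens[b]
                if wj == wi or wj in seen:
                    continue  # skip self-pairs; count only the nearest following wj
                seen.add(wj)
                co[(wi, wj)] += 1
                dist_sum[(wi, wj)] += b - a
    scores = {}
    for (wi, wj), f_ij in co.items():
        d_ij = dist_sum[(wi, wj)] / f_ij
        scores[(wi, wj)] = (f_ij / freq[wi]) * (f_ij / freq[wj]) / d_ij ** 2
    return scores


def expand_lexicon(
    scores: dict[tuple[str, str], float],
    seeds: dict[str, int],
    candidates: list[str],
    threshold: float = 0.0,
) -> dict[str, int]:
    """Hypothetical rule: label candidates +1/-1 by WAF affinity (both directions) to seeds."""
    expanded = {}
    for cand in candidates:
        score = sum(
            polarity * (scores.get((cand, seed), 0.0) + scores.get((seed, cand), 0.0))
            for seed, polarity in seeds.items()
        )
        if score > threshold:
            expanded[cand] = 1
        elif score < -threshold:
            expanded[cand] = -1
    return expanded
```

On the toy corpus `[["讲座", "精彩", "非常", "好"], ["活动", "糟糕", "非常", "差"]]` with seeds `{"好": 1, "差": -1}`, "精彩" is labeled positive and "糟糕" negative, because each appears two words before one seed and never near the other. Because WAF divides by both word frequencies, a very common word such as "非常" gets a weaker link per co-occurrence than a rarer one.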
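[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year of degree]: 2013
[Classification number]: TP391.1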
