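Research on an Information Processing Module Based on the Android Operating System
Published: 2018-05-03 22:05
Topics: Android system + natural language processing; Source: Beijing University of Posts and Telecommunications, 2013 master's thesis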
[Abstract]: With the rapid growth of e-commerce, electronic marketing has attracted wide attention, and email, with its huge user base, has become a favored channel for it. In today's fast-paced life, however, people accept email-based marketing but have little time to spare for the discount and promotion emails they receive. At the same time, discount offers of all kinds are commonly distributed by email. So that people do not miss offers that are useful to them, this work filters the discount information in emails and extracts the messages people care about most, saving them valuable time. In addition, with the rapid development of the mobile Internet, the time people spend online on mobile devices is growing quickly while the time they spend online at a computer keeps falling, and Android, with a 51.7% market share, firmly ranks first among smart mobile operating systems. This thesis therefore chooses Android as the target platform and aims to provide Android-based software, suited to people's usage habits, that ranks the discount information found in email.

By analyzing a large number of discount emails, we found that discount information shows many characteristic features after part-of-speech tagging, and that these features help extract discount information and store information from emails more accurately and quickly. To improve the accuracy of the word segmentation step, we propose a CRF (conditional random field) segmentation algorithm tailored to discount emails. To improve the accuracy of part-of-speech tagging, this study analyzes the hidden Markov model as used in natural language processing and proposes a complete part-of-speech tagging solution based on an enhanced hidden Markov model.

The main contents are as follows:
1) The rapid development of e-commerce and the widespread adoption of email are analyzed; the broad application prospects of the Android platform and Android-based communication protocols are introduced; and the main application software on the Android platform is analyzed.
2) The urgency of handling advertising emails is argued; mail transfer and retrieval protocols such as SMTP, POP3, and IMAP are analyzed; and an interface is identified that hides the differences between protocol types and retrieves mail content through a single entry point.
3) The wide range of applications of natural language processing is analyzed, and the open problems of the field are introduced together with three of its models: the hidden Markov model, the maximum entropy model, and the conditional random field model.
4) Based on the characteristics of the emails to be processed, an email word segmentation algorithm based on conditional random fields is proposed and its correctness is verified. This part resolves the overlapping (intersection-type) and combinational segmentation ambiguities found in emails.
5) A complete part-of-speech tagging scheme based on an improved hidden Markov model is proposed. The enhanced hidden Markov model is validated, and the experiments show that it has a significant advantage in part-of-speech tagging.
6) Android software that ranks advertising emails using natural language processing is implemented, and its class structure, design flow, module structure, and operation are described.
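The abstract does not describe how the enhanced hidden Markov model differs from the standard one, so the following is only a minimal sketch of the baseline technique it builds on: Viterbi decoding for a plain first-order HMM part-of-speech tagger. The tag set, the example words, and every probability below are hypothetical values chosen for illustration and are not taken from the thesis.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most likely tag sequence for `obs` under a first-order HMM."""
    if not obs:
        return []

    def lp(p):
        # Work in log space; `floor` stands in for unseen events (crude smoothing).
        return math.log(max(p, floor))

    # best[t][s]: log-probability of the best path that ends in state s at step t
    best = [{s: lp(start_p.get(s, 0.0)) + lp(emit_p[s].get(obs[0], 0.0))
             for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path

    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + lp(trans_p[p].get(s, 0.0))) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + lp(emit_p[s].get(obs[t], 0.0))
            back[t][s] = prev

    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Hypothetical toy model (illustrative numbers only).
states = ["NOUN", "VERB", "NUM"]
start_p = {"NOUN": 0.6, "VERB": 0.2, "NUM": 0.2}
trans_p = {
    "NOUN": {"NOUN": 0.3, "VERB": 0.6, "NUM": 0.1},
    "VERB": {"NOUN": 0.3, "VERB": 0.1, "NUM": 0.6},
    "NUM": {"NOUN": 0.8, "VERB": 0.1, "NUM": 0.1},
}
emit_p = {
    "NOUN": {"store": 0.5, "discount": 0.4, "offers": 0.1},
    "VERB": {"offers": 0.7, "discount": 0.2, "store": 0.1},
    "NUM": {"50%": 1.0},
}

words = ["store", "offers", "50%", "discount"]
tags = viterbi(words, states, start_p, trans_p, emit_p)
print(list(zip(words, tags)))
```

With this toy model the decoder tags the sentence as NOUN, VERB, NUM, NOUN. A pattern such as a number followed by a noun is the kind of post-tagging feature the abstract says can help locate discount information, although the actual features used in the thesis are not listed here.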
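[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP316; TP311.52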