Research and Application of Automated Analysis Technology for Massive Email
Published: 2018-03-02 13:33
Topic: search  Focus: massive data  Source: University of Electronic Science and Technology of China, 2014 master's thesis  Type: degree thesis
[Abstract]: The large amount of information carried in email makes it an important target for data mining and big data analysis, and many users need to exploit and analyze that information. Converting raw mail files into mail metadata quickly and efficiently, and building an automated mail analysis platform that makes massive mail data easy to analyze and use, lays a solid foundation for that work. This thesis studies the key technologies involved in the automated analysis of massive email, and designs and implements a massive email automated analysis system.
First, to address the two core requirements of massive content and automation, a fast mail import module is built that loses as little information from the mail files as possible. It parses and categorizes the meta-information in each message so as to maximize import efficiency, reduce data volume, improve the user experience, and keep the information complete. This addresses processing speed and the efficient use of hardware and software resources at massive scale, and provides good data conditions for further data mining and analysis.
Second, by studying users' actual work in depth, the thesis identifies the workflow and management characteristics of manual analysis and integrates the manual analysis workflow into the system. This cuts unnecessary manual work, lowers the program's runtime overhead, and gives the manual analysis stage better software support.
Next, with mail metadata, mail body text, and analysis results stored in the database, the thesis implements indexing and retrieval over this information, improving the ability to find information of interest quickly in a massive mail collection. On this basis, the system implements automatic classification and tagging of mail, raising its overall level of automation. The thesis then designs functions for compiling statistics on, and exporting, information of interest, completing the path from decomposition, categorization, indexing, and statistics to re-integration. Finally, to match the concrete information management processes and needs of the actual working environment, a role-based information management system is built, which improves software support for the workflow as a whole.
The thesis reports statistical tests and comparisons of the system's operation after deployment, and analyzes and explains the issues those results reveal. The statistics and comparison data indicate that the system largely meets user requirements and can support real work. The thesis closes by summarizing where the system needs improvement and by offering some ideas for its future extension and research.
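The abstract does not say how the import module turns a raw mail file into metadata. As a rough illustration only, the sketch below uses Python's standard `email` package to reduce one raw message to a flat record. The function name `extract_metadata` and the choice of fields are assumptions made for this example, not the thesis's actual design.

```python
from email import message_from_bytes, policy
from email.utils import getaddresses, parseaddr, parsedate_to_datetime


def extract_metadata(raw: bytes) -> dict:
    """Parse one raw mail message into a flat metadata record.

    Hypothetical helper: the field set here is illustrative only.
    """
    msg = message_from_bytes(raw, policy=policy.default)

    # Gather every address from the To and Cc headers.
    headers = msg.get_all("To", []) + msg.get_all("Cc", [])
    recipients = [addr for _, addr in getaddresses([str(h) for h in headers])]

    # Normalize the Date header to ISO 8601 when it is present.
    date_header = msg.get("Date")
    date = parsedate_to_datetime(str(date_header)).isoformat() if date_header else None

    # Prefer the plain-text body; it may be absent in some messages.
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content() if body_part is not None else ""

    return {
        "message_id": str(msg.get("Message-ID", "")),
        "sender": parseaddr(str(msg.get("From", "")))[1],
        "recipients": recipients,
        "subject": str(msg.get("Subject", "")),
        "date": date,
        "attachments": [part.get_filename() for part in msg.iter_attachments()],
        "body": body,
    }
```

A record of this shape could be written to a database and indexed for retrieval. Batching, deduplication, and character-set edge cases, which matter at massive scale, are left out of the sketch.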
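[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Master's
[Year conferred]: 2014
[Classification number]: TP393.098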