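Design and Implementation of a Financial Data Classification System Based on Logistic Regression
Published: 2018-04-05 01:23
Topic: text classification; Focus: logistic regression; Source: Shandong University, master's thesis, 2017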
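[Abstract (translated from the Chinese original)]: Over the past half century, as society has entered the era of big data, computer and multimedia technology has advanced rapidly. Along with it, the valuable information stored in image and text databases keeps growing, and financial news, announcements, and market information are an important reference for people who follow market changes and invest. Financial data, however, come from many sources and in many forms, which makes it hard for readers to find directly the information they care about most. Managing and retrieving massive financial data effectively, accurately, and quickly has therefore become a major challenge, and the classification of financial data is the core problem in meeting it. A financial data classification platform collects financial data from the Internet in real time and quickly assigns each item to the correct category; it can process massive amounts of text efficiently and has high practical value. The classification system is a core component of such a platform: it processes and trains on sample data to obtain a high-performing classifier and applies that classifier to real-time data, which delivers the platform's classification function. Against this background, the problem this thesis addresses is fast, efficient automatic classification of massive financial text data. A survey of market demand for financial data categories fixed 18 commonly used categories as the final set of classes. The main work of the thesis is the design and implementation of the financial data classification system. Because financial data contain many proper nouns and have distinctive features, the classifier is built with the logistic regression algorithm: the sample data are preprocessed into training and test samples; the training samples go through feature extraction, feature weighting, and feature vectorization and are passed to a logistic regression model to train the classifier; the test samples are used to measure classification performance, and the classifier is further optimized according to the evaluation results; finally, the classifier is applied to real financial data. The system can be used on web portals and data classification platforms to classify financial news, announcements, and other text automatically, and by companies or individuals to manage large collections of financial text, helping users quickly locate the content they most want. The classification it produces can serve as a basis for financial information retrieval and data mining, further improving information utilization. In testing, the proposed system achieved good classification results and shows good application prospects.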
[Abstract]: Over the past half century, as society has entered the era of big data, computer and multimedia technology has developed rapidly. At the same time, the volume of valuable information stored in image and text databases keeps growing. Financial news, announcements, and market information are an important reference for people who follow market changes and invest. However, financial data come from many sources and in many forms, which makes it hard for readers to find the information they care about most. Managing and retrieving large amounts of financial data efficiently and accurately has therefore become a major challenge, and classifying financial data is the core problem in meeting it. A financial data classification platform collects financial data from the Internet in real time and assigns each item quickly to the correct category, so it can process massive amounts of text efficiently and has high practical value. The classification system is a core component of such a platform: it processes sample data and trains on them to obtain a high-performing classifier, then applies that classifier to real-time data, which provides the platform's classification function. Against this background, the problem addressed in this thesis is fast, efficient automatic classification of large volumes of financial text. A survey of market demand for financial data categories identified 18 commonly used categories as the final set of classes. The main work of the thesis is the design and implementation of a financial data classification system. Because financial data contain many proper nouns and have distinctive features, the classifier is built with the logistic regression algorithm. The sample data are preprocessed and divided into training and test samples; the training samples go through feature extraction, feature weighting, and vectorization and are then fed to a logistic regression model to train the classifier. The classifier is evaluated on the test samples and further optimized according to the evaluation results, and is finally applied to real financial data. The system can be deployed on web portals and data classification platforms to classify financial news, announcements, and other text automatically. Companies and individuals can also use it to manage large collections of financial text, helping users quickly locate the content they most want. The resulting classification can serve as a basis for financial information retrieval and data mining, further improving information utilization. In testing, the proposed system achieved good classification results and shows good application prospects.
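The pipeline outlined in the abstract (split into training and test samples, feature extraction, feature weighting, vectorization, logistic regression training, evaluation on the test samples) can be illustrated with a minimal pure-Python sketch. The abstract does not give the thesis's actual weighting scheme, tokenizer, optimizer, or category names, so everything specific here is an assumption made for illustration: TF-IDF weighting, whitespace tokenization in place of a Chinese word segmenter, a softmax (multinomial) formulation trained by batch gradient descent, three hypothetical categories instead of the 18 used in the thesis, and a handful of invented toy sentences.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: whitespace splitting stands in for a real Chinese word segmenter.
    return text.lower().split()

def fit_tfidf(docs):
    """Learn a vocabulary and smoothed IDF weights from tokenized training docs."""
    df = Counter(term for doc in docs for term in set(doc))
    vocab = {term: i for i, term in enumerate(sorted(df))}
    n = len(docs)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    return vocab, idf

def vectorize(doc, vocab, idf):
    """Turn one tokenized doc into an L2-normalized TF-IDF vector."""
    vec = [0.0] * len(vocab)
    for term, count in Counter(doc).items():
        if term in vocab:  # terms unseen in training are ignored
            vec[vocab[term]] = count * idf[term]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def predict_proba(x, W, b):
    """Softmax over one linear score per class."""
    scores = [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(x, W, b):
    probs = predict_proba(x, W, b)
    return max(range(len(probs)), key=probs.__getitem__)

def train_logreg(X, y, n_classes, lr=0.5, epochs=200, l2=1e-3):
    """Multinomial logistic regression fitted by batch gradient descent."""
    n, d = len(X), len(X[0])
    W = [[0.0] * d for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        gW = [[0.0] * d for _ in range(n_classes)]
        gb = [0.0] * n_classes
        for x, label in zip(X, y):
            probs = predict_proba(x, W, b)
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == label else 0.0)
                gb[k] += err
                for j, xj in enumerate(x):
                    gW[k][j] += err * xj
        for k in range(n_classes):
            b[k] -= lr * gb[k] / n
            for j in range(d):
                W[k][j] -= lr * (gW[k][j] / n + l2 * W[k][j])
    return W, b

# Hypothetical categories and toy samples; the thesis uses 18 real categories.
LABELS = ["earnings", "merger", "policy"]
train = [
    ("company reports quarterly profit growth", 0),
    ("quarterly revenue and profit beat forecast", 0),
    ("firm announces merger with rival", 1),
    ("acquisition deal merger approved by board", 1),
    ("central bank raises interest rate", 2),
    ("regulator issues new policy on interest rate", 2),
]
test = [
    ("profit growth beat forecast", 0),
    ("board approved the merger deal", 1),
    ("central bank policy on rate", 2),
]

train_docs = [tokenize(text) for text, _ in train]
vocab, idf = fit_tfidf(train_docs)
X_train = [vectorize(doc, vocab, idf) for doc in train_docs]
y_train = [label for _, label in train]
W, b = train_logreg(X_train, y_train, n_classes=len(LABELS))

X_test = [vectorize(tokenize(text), vocab, idf) for text, _ in test]
predictions = [predict(x, W, b) for x in X_test]
accuracy = sum(p == label for p, (_, label) in zip(predictions, test)) / len(test)
print("predicted:", [LABELS[p] for p in predictions])
print("test accuracy:", accuracy)
```

In a real system the evaluation step would more likely report per-category precision and recall than a single accuracy figure, since 18 categories are rarely balanced, and the vocabulary and IDF weights would be fitted on the training samples only, as done here, so that the test samples remain unseen.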
[Degree-granting institution]: Shandong University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP311.52
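[Similar literature]
Related journal articles (top 1)
1 Guo Tong (郭通); Huang Yan (黄焱); Zhang Baiyu (张白愚); Research on a DVB-S2 data classification system based on ARM and FPGA [J]; Application of Electronic Technique (电子技术应用); 2009, No. 08
Related master's theses (top 2)
1 Wang Lei (王蕾); Design and implementation of a data classification system [D]; Huazhong University of Science and Technology; 2014
2 Liu Zhan (刘展); Design and implementation of a financial data classification system based on logistic regression [D]; Shandong University; 2017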
Article ID: 1712596
Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/1712596.html