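Data Analysis Based on the Game Virtual Currency Market
Published: 2018-06-11 13:50
Topics: game virtual currency + domain-specific text retrieval; Source: master's thesis, University of Electronic Science and Technology of China (电子科技大学), 2013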
[Abstract]: According to a report by the U.S. market research firm ABI Research, the global online game market will exceed US$29 billion in 2015 [1]. Game virtual currency is the core commodity of this industry chain, and every entity in the chain urgently needs tools for understanding the market, both for supply-and-demand statistics and for real-time information. The large online game market generates massive amounts of web data, yet research on natural language processing techniques for this specific domain (text representation, synonym handling, feature word selection, text retrieval, text classification, Web information extraction, and so on) remains scarce.
To address these problems, this thesis constructs a virtual specialized search engine to obtain a result set related to the online game domain as the initial object of study. Drawing on the characteristics of online trading in game virtual currency, it classifies the initial result set with a suitable classification method to obtain the set of web pages that carry trading information, and then collects and analyzes game virtual currency trading orders from that page set (including redundancy checking and status updating). The main contributions are:
1. A vector space model is built to process web page text, and a feature word selection method and a synonym handling method that incorporate domain characteristics are proposed to compute and reduce the dimensionality of the vector space.
2. Based on several general-purpose search engines, a virtual specialized search engine is constructed to obtain the set of web pages related to the online game domain, which serves as the initial object of study.
3. Building on K-nearest-neighbor (KNN) text classification, a modified KNN classification method is proposed to classify the page set. Based on an analysis of the training corpus, the method uses the cosine measure to compute the similarity between a new text and each known category. It is simple to implement and accurate, retraining on the training texts is cheap, and its time and space complexity grow linearly with the size of the training set.
4. DOM-based Web information extraction is used to extract order information; it is simple and efficient, and the collection is stable and reliable. The basic idea of genetic algorithms is applied to detect status changes in order information collected over multiple passes. This approach offers global search optimization and efficient parallel computation, and it is self-organizing, self-adaptive, and self-learning, which helps keep order collection efficient and accurate.
5. A game virtual currency data application platform is built to provide supply-and-demand statistics services and real-time information services.
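As an illustration of item 3, the following minimal sketch scores a new text against each known category by cosine similarity in a term-frequency vector space. It is not the thesis's algorithm, whose details the abstract does not give. It assumes each category is represented by the summed term-frequency vector of its training texts (a centroid-style simplification), and the function names, category labels, and toy tokens are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as term -> weight mappings."""
    if not a or not b:
        return 0.0
    # Iterate over the smaller vector when computing the dot product.
    if len(a) > len(b):
        a, b = b, a
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def train_categories(labeled_docs):
    """Sum term frequencies per category in a single pass over the corpus.

    Assumption: one summed vector per category stands in for the category.
    Cosine ignores vector length, so the sum ranks the same as the mean.
    """
    categories = defaultdict(Counter)
    for tokens, label in labeled_docs:
        categories[label].update(tokens)
    return dict(categories)


def classify(tokens, categories):
    """Return the category whose vector is most similar to the new text."""
    vec = Counter(tokens)
    return max(categories, key=lambda label: cosine(vec, categories[label]))


# Hypothetical toy corpus: pre-tokenized page texts with made-up labels.
TRAINING = [
    (["gold", "sell", "price", "server", "order"], "trade"),
    (["buy", "gold", "stock", "delivery", "price"], "trade"),
    (["guide", "quest", "boss", "strategy"], "other"),
    (["patch", "news", "boss", "update"], "other"),
]

categories = train_categories(TRAINING)
print(classify(["gold", "price", "order", "cheap"], categories))  # prints: trade
```

Under this simplification, adding a training text only updates one category vector, and training takes one pass over the corpus, which is consistent with the low retraining cost and linear complexity the abstract claims.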
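[Degree-granting institution]: University of Electronic Science and Technology of China (电子科技大学)
[Degree level]: Master's
[Year degree awarded]: 2013
[Classification number]: TP391.1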
Article ID: 2005494
Article link: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/2005494.html