Research on Query and Visual Analysis Techniques for Tourism Data
Published: 2018-06-17 03:04
Topics: social media + tourism data; Source: Southwest University of Science and Technology, Master's thesis, 2016
[Abstract]: In recent years, as online social media have developed and spread, more and more tourists post travel information on social media anytime and anywhere. This produces massive, multi-dimensional, unstructured tourism data. Research on such complex data has attracted wide attention from both universities and industry. This thesis presents research on social-media tourism data in three areas:
- first, the collection and preprocessing of tourism data;
- second, analysis of tourism data, including a Top-k dominating query algorithm, text sentiment mining, and keyword extraction;
- third, visualization of tourism data.

1. Collection and preprocessing of tourism data from social media. The thesis first describes how tourism data are obtained from travel social networking sites. It then compares two ways of obtaining Weibo data, packet capture and browser simulation, and describes how to obtain Weibo data through Weibo's search function. Finally, it preprocesses the data from the perspectives of data cleaning and data integration.

2. Analysis of tourism data. To support subspace Top-k dominating queries, the thesis studies a Top-k dominating query algorithm:
- First, B+-Trees are used to build sorted lists.
- Next, a round-robin scheduling algorithm retrieves k groups of terminating tuples according to the query conditions.
- Then, a probability distribution model uses the generated candidate tuples and terminating tuples to compute the dominating scores of the terminating tuples.
- This process is repeated to refine the query results until the stopping condition is met.

For sentiment analysis, the thesis uses an SVM to classify short texts by sentiment, with features including punctuation marks, tags, and sentiment words. The experimental results suggest that the method has some practical value.

3. Visual analysis of online public opinion on tourism. The thesis proposes an object-oriented Web framework for visual analysis that effectively speeds up collaborative team development. It also designs and develops a visual analysis system for online public opinion on tourism. The system supports visual display and interactive analysis of tourist location information, comment sentiment, and social network information. This helps users understand tourist opinion from multiple perspectives and discover the features, relationships, and trends hidden in the comments. Extensive experimental results show that the system effectively analyzes tourists' regional preferences and changes in sentiment. It also helps tourism authorities keep track of online public opinion in a timely manner.
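The subspace Top-k dominating query described in the abstract can be pinned down with a brute-force baseline. The Python sketch below is not the thesis's B+-Tree, round-robin, and probability-model algorithm. It is the naive O(n²·d) definition that such an algorithm is designed to speed up, which makes it handy as a correctness check.

The sketch rests on these assumptions:
- Larger values are better on every queried dimension (for example, review ratings).
- The subspace is passed as a list of dimension indices.
- `dominates`, `topk_dominating`, and the sample scores are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def dominates(p: Point, q: Point, dims: Sequence[int]) -> bool:
    """Return True if p dominates q on the subspace `dims`.

    Assumption: larger values are better on every dimension. p dominates q
    when it is at least as good on all chosen dimensions and strictly better
    on at least one of them.
    """
    return all(p[d] >= q[d] for d in dims) and any(p[d] > q[d] for d in dims)


def topk_dominating(points: List[Point], dims: Sequence[int], k: int) -> List[Tuple[int, int]]:
    """Brute-force subspace Top-k dominating query.

    The dominating score of a point is the number of other points it
    dominates on `dims`. Returns (index, score) pairs for the k highest
    scores; ties are broken by the smaller index.
    """
    scored = [
        (i, sum(1 for j, q in enumerate(points) if j != i and dominates(p, q, dims)))
        for i, p in enumerate(points)
    ]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical scenic spots scored on (rating, value, location).
    spots = [(4.5, 3.0, 4.0), (4.0, 4.0, 3.5), (3.5, 2.5, 3.0), (4.8, 3.5, 4.2)]
    print(topk_dominating(spots, [0, 2], 2))     # [(3, 3), (0, 2)]
    print(topk_dominating(spots, [0, 1, 2], 2))  # [(3, 2), (0, 1)]
```

Restricting the query to rating and location changes the ranking. In that subspace spot 0 dominates spot 1, whereas in the full space the two tie because spot 1 scores higher on value.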
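The sentiment classifier described in the abstract combines hand-picked features (punctuation, tags, sentiment words) with an SVM. The plain-Python sketch below shows one way to turn Weibo-style short texts into such features. It then trains a linear SVM on them with the Pegasos stochastic sub-gradient method, so it runs without third-party libraries.

The following are illustrative assumptions, not the thesis's feature definitions or training setup:
- the mini-lexicon;
- the `#topic#` tag pattern;
- the exact feature list;
- the toy samples;
- all function names.

A real system would use a full sentiment dictionary, word segmentation, and a library SVM.

```python
import random
import re
from typing import List, Sequence, Tuple

# Hypothetical mini-lexicon; a real system would load a full sentiment dictionary.
POSITIVE = ("美丽", "喜欢", "推荐", "开心", "值得")
NEGATIVE = ("失望", "太贵", "拥挤", "脏乱", "后悔")
TAG_PATTERN = re.compile(r"#[^#]+#")  # Weibo-style #topic# tags (assumed format)


def features(text: str) -> List[float]:
    """Map a short text to [positive words, negative words, '!', '?', tags, bias]."""
    pos = sum(text.count(w) for w in POSITIVE)
    neg = sum(text.count(w) for w in NEGATIVE)
    exclaim = text.count("!") + text.count("！")  # ASCII and full-width marks
    question = text.count("?") + text.count("？")
    tags = len(TAG_PATTERN.findall(text))
    return [float(pos), float(neg), float(exclaim), float(question), float(tags), 1.0]


def train_pegasos(X: Sequence[List[float]], y: Sequence[int],
                  lam: float = 0.1, epochs: int = 500, seed: int = 0) -> List[float]:
    """Linear SVM trained with Pegasos (SGD on the L2-regularized hinge loss).

    Labels must be +1 (positive) or -1 (negative). The constant last feature
    acts as a bias term and is regularized like the other weights.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: regularizer step
            if margin < 1.0:  # hinge loss is active: move toward y_i * x_i
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w: Sequence[float], text: str) -> int:
    """Return +1 (positive) or -1 (negative) from the sign of w . features(text)."""
    score = sum(wj * xj for wj, xj in zip(w, features(text)))
    return 1 if score >= 0 else -1


# Toy labelled posts (hypothetical): +1 positive, -1 negative.
SAMPLES: List[Tuple[str, int]] = [
    ("风景太美丽了，强烈推荐！#九寨沟#", 1),
    ("很喜欢这里，值得再来", 1),
    ("开心的一天！", 1),
    ("门票太贵，很失望", -1),
    ("人太拥挤了，后悔来了？", -1),
    ("环境脏乱！#景区#", -1),
]

if __name__ == "__main__":
    w = train_pegasos([features(s) for s, _ in SAMPLES], [label for _, label in SAMPLES])
    print(predict(w, "景色美丽，值得推荐"))
    print(predict(w, "太贵了，很失望，后悔"))
```

With the step size 1/(λt), the final Pegasos weight vector equals the sum of the margin-violating y·x terms divided by λT. Fixing `seed` therefore makes the result reproducible. The regularization strength `lam` would normally be tuned by cross-validation rather than fixed as here.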
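[Degree-granting institution]: Southwest University of Science and Technology
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification number]: TP274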
Article ID: 2029360
Article link: https://www.wllwen.com/kejilunwen/zidonghuakongzhilunwen/2029360.html