Temporal Keyword Queries on Social Media Data
[Abstract]: Social media services have become among the most frequently used Internet services in people's daily lives. They record the content that users post, repost (forward), and comment on. As this data keeps accumulating, the resulting long-span data is of great value for studying users' collective behavior and for understanding people or events as a whole.

To follow an event, users repeatedly submit the same query to obtain its latest news. To understand an object thoroughly, analysts need to collect data from different time periods. However, existing social media search services and research focus mainly on real-time search, and the posting time recorded with each message serves only to measure the freshness of the data.

This thesis models original posts, reposts, and comments with a social media data-stream model and defines a reference time series for each social object. On top of this model, a temporal keyword query works as follows:

- It takes keywords as its query terms.
- It feeds the time series data that falls within the query time range into a scoring function.
- It returns the K social objects with the highest scores.

Time is thus promoted to a constraint of the query.

The thesis explores two application scenarios, query analysis and real-time tracking analysis, which correspond to an offline index and an online index respectively. It proposes index techniques and query algorithms for this query from two perspectives: the characteristics of social media data that the offline index can exploit, and the update efficiency required of the online index.

Finally, the thesis uses time series data to analyze the changes in information propagation behind the rise and decline of Sina Weibo. It also builds an online microblog analysis platform on a real-time social media data stream. Together these form application examples of temporal keyword queries.
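To make the query semantics concrete, the following is a minimal brute-force sketch in Python, not the thesis's indexed algorithm. It rests on three assumptions:

- Each social object carries a keyword set and a reference time series, stored as a mapping from time step to reference count.
- An object matches when it contains all query keywords.
- The scoring function is simply the total number of references inside the query window.

The names `SocialObject`, `window_score`, and `temporal_keyword_query` are hypothetical.

```python
import heapq
from dataclasses import dataclass, field

@dataclass
class SocialObject:
    """Hypothetical record: a social object (e.g. an original post), its
    keywords, and its reference time series stored sparsely as
    {time step: number of reposts/comments referencing it at that step}."""
    oid: str
    keywords: set[str]
    references: dict[int, int] = field(default_factory=dict)

def window_score(obj, t_start, t_end):
    """Assumed scoring function: total references inside [t_start, t_end]."""
    return sum(c for t, c in obj.references.items() if t_start <= t <= t_end)

def temporal_keyword_query(objects, keywords, t_start, t_end, k):
    """Brute-force baseline: keep objects containing all query keywords,
    score only the part of each time series inside the query window,
    and return the k highest (score, oid) pairs."""
    candidates = (o for o in objects if keywords <= o.keywords)
    scored = ((window_score(o, t_start, t_end), o.oid) for o in candidates)
    return heapq.nlargest(k, scored)
```

An object that becomes popular only after the window does not outrank one that was hot inside it. The price is that every query scans every matching object, which is the cost that index structures for this query aim to avoid.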
The thesis centers on temporal keyword queries, and its main contributions fall into three areas.

First, a dual index structure and a piecewise maximum approximate summary, both designed around the characteristics of social media data. Two characteristics matter here:

- The reference trees of social objects follow a long-tail distribution in both size and lifetime length.
- A social object is often hot during a few time periods and receives little attention for the rest of a long time span.

Based on these two characteristics, the thesis designs a dual inverted-list structure that manages hot objects and ordinary objects with different index structures. Both structures support filtering along the time dimension. Both return data in descending order of each social object's final reference-tree size.

The thesis derives an upper bound on the amount of data that the index-based query algorithm must access. Statistical analysis on real datasets shows that in most cases this bound grows sublinearly with K. The thesis further proposes a piecewise maximum approximate summary, which predicts the upper bound of each object's reference-tree size within the query window more accurately. This lets the query avoid the disk accesses otherwise needed to fetch the actual value of a hot object during its non-hot periods.

Second, a log-structured octree index for real-time temporal keyword queries. Another characteristic of social media data is that users generate data at high speed, a phenomenon that is especially pronounced during hot events. In the online indexing scenario it is therefore important to index data quickly and reflect it in query results. Doing so improves the experience of ordinary users and provides timely data support for rapid decisions.
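A small Python sketch can illustrate how ordering by final reference-tree size interacts with a piecewise maximum summary. It assumes:

- nonnegative reference counts on a dense integer time axis;
- fixed-width summary segments (the thesis's segmentation may differ);
- a window-sum scoring function;
- a single posting list rather than separate hot and ordinary structures.

The names `build_max_summary`, `upper_bound`, `make_posting`, and `top_k` are hypothetical, as is the `reads` counter, which stands in for disk accesses.

Because counts are nonnegative, an object's total is an upper bound on any of its window scores. A scan in descending order of totals can therefore stop once the k-th best exact score reaches the next total. The summary gives a tighter per-window bound that lets the scan skip an object without reading its exact series.

```python
import heapq

def build_max_summary(series, width):
    """Piecewise maximum summary: cut the series into fixed-width
    segments (an assumption) and keep only each segment's maximum."""
    return [max(series[i:i + width]) for i in range(0, len(series), width)]

def upper_bound(summary, width, t_start, t_end):
    """Upper bound on sum(series[t_start:t_end + 1]) computed from the
    summary alone: every step in a segment is at most the segment max."""
    bound = 0
    for seg, seg_max in enumerate(summary):
        lo, hi = seg * width, (seg + 1) * width - 1
        overlap = min(hi, t_end) - max(lo, t_start) + 1
        if overlap > 0:
            bound += seg_max * overlap
    return bound

def make_posting(oid, series, width):
    """Posting entry: (final total references, id, exact series, summary)."""
    return (sum(series), oid, series, build_max_summary(series, width))

def top_k(postings, k, t_start, t_end, width):
    """Scan postings sorted by total references, descending.
    Returns (top-k as (score, oid) pairs, number of exact series reads)."""
    if k <= 0:
        return [], 0
    heap = []  # min-heap of (score, oid) holding the current top-k
    reads = 0
    for total, oid, series, summary in postings:
        kth = heap[0][0] if len(heap) == k else -1
        if total <= kth:
            break  # later totals are no larger, so no later object can win
        if upper_bound(summary, width, t_start, t_end) <= kth:
            continue  # the summary alone rules this object out
        reads += 1  # stands in for fetching the exact series from disk
        score = sum(series[t_start:t_end + 1])
        if len(heap) < k:
            heapq.heappush(heap, (score, oid))
        elif score > kth:
            heapq.heapreplace(heap, (score, oid))
    return sorted(heap, reverse=True), reads
```

Sort the postings with `key=lambda p: p[0], reverse=True` rather than by whole tuples, so ties never compare the series lists. Suppose an object's summary shows only zeros inside the window. The scan then skips its exact series even though its total is still above the current k-th score.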
To meet this need, the thesis applies a piecewise approximation to the reference time series of each social object and maps each approximated segment to a point in three-dimensional space. An octree then preserves the locality of social objects along the importance and time dimensions within the index. The encoding of octree nodes lets the index both filter data along the time dimension and return data in the order required by the temporal threshold algorithm. The octree is combined with a log-structured merge tree, which exploits fast in-memory access and efficient sequential disk reads and writes. Together they allow social media data to be indexed rapidly.

Third, the thesis analyzes how information propagation changed behind the rise and decline of Sina Weibo, using full-volume microblog behavior data. Temporal keyword queries are used in this analysis to make the data-extraction rules more accurate and to cover the data more comprehensively. The repost time series of a single microblog post is modeled with a logarithmic Gaussian (log-normal) model. By fitting the model parameters for a group of posts, the thesis identifies a statistic related to the speed of information propagation.

The thesis further defines two sets of characteristics:

- behavioral characteristics of users on the Sina Weibo platform;
- external characteristics reflecting the attitude of Internet users at large toward social platforms.

It analyzes how both change over time and explores their relationship with the statistics that reflect information propagation.

Finally, the thesis integrates the techniques of the whole work into an online analysis platform for the real-time microblog data stream of Sina Weibo. The platform clusters temporal keyword search results into topics and displays preliminary statistics for each topic along several dimensions.
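The abstract states that approximated segments become points in 3D space and that octree nodes are encoded so the index can filter on time. It does not specify the encoding. This sketch assumes the Morton (Z-order) code, a common choice with the needed properties. Interleaving the coordinate bits makes every octree node a contiguous range of codes, which also suits the sorted runs of an LSM-tree. The coordinate layout (segment start, segment end, approximated value), the 8-bit quantization, and all function names are hypothetical.

```python
BITS = 8  # hypothetical: each coordinate quantized to 0..255

def morton3(x, y, z, bits=BITS):
    """Interleave the bits of x, y, z into one Z-order (Morton) code.
    Bit i of x, y, z lands at positions 3i+2, 3i+1, 3i respectively."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i + 2)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i)
    return code

def demorton3(code, bits=BITS):
    """Inverse of morton3: recover (x, y, z) from an interleaved code."""
    x = y = z = 0
    for i in range(bits):
        x |= ((code >> (3 * i + 2)) & 1) << i
        y |= ((code >> (3 * i + 1)) & 1) << i
        z |= ((code >> (3 * i)) & 1) << i
    return x, y, z

def node_range(code, level, bits=BITS):
    """All points inside a depth-`level` node share the same top 3*level
    bits, so the node is one contiguous range of Morton codes."""
    shift = 3 * (bits - level)
    first = (code >> shift) << shift
    return first, first + (1 << shift) - 1

def node_box(code, level, bits=BITS):
    """Bounding box (min corner, max corner) of the depth-`level` node that
    contains the point with this code: zeroing the low 3*(bits-level)
    bits zeroes the low (bits-level) bits of every coordinate."""
    first, _ = node_range(code, level, bits)
    lo = demorton3(first, bits)
    side = 1 << (bits - level)
    return lo, tuple(c + side - 1 for c in lo)

def may_overlap(code, level, q_start, q_end, bits=BITS):
    """Points are (segment start, segment end, value). A node can hold a
    segment overlapping [q_start, q_end] only if its smallest start is
    <= q_end and its largest end is >= q_start; otherwise prune it."""
    (s_lo, _, _), (_, e_hi, _) = node_box(code, level, bits)
    return s_lo <= q_end and e_hi >= q_start
```

Because a node is a key range, a time-window query can walk the octree top-down. It drops any node for which `may_overlap` is False, then scans the surviving code ranges sequentially within each sorted run.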
In summary, this thesis extends keyword search on social media data by proposing temporal keyword queries. It explores index organization and query algorithms from two perspectives: the characteristics of social media data and index update efficiency. As the two analysis applications built on this query illustrate, the query adapts flexibly to a variety of application scenarios, helps users mine important information from social media data, and provides a data foundation for more complex analysis tasks. The openly accessible system presented at the end of the thesis implements the indexing and analysis techniques described, so that researchers and analysts in various fields can benefit from massive real-time social media data.
[Degree-granting institution]: East China Normal University
[Degree level]: Doctorate
[Year awarded]: 2016
[Classification number]: TP391.3; TP393.09