
Chinese Keyword Top-N Query Processing over Multi-Table Databases

Published: 2018-05-28 20:24

  Topics: relational database + Chinese keywords; Source: Hebei University, master's thesis, 2013


[Abstract]: The theory and techniques of keyword query have been studied and applied extensively in information retrieval and Web search engines. Traditional database management systems support only pattern matching and do not support free-form keyword queries. For this reason, keyword query processing over relational databases has become a closely watched research frontier in recent years. Traditional relational database systems are operated through the Structured Query Language (SQL), which requires users to know both SQL and the database schema; this is difficult for ordinary users. In addition, traditional database systems can apply only simple sorting to the returned results, so users have difficulty finding the information that interests them most. Existing research on keyword query mainly targets English keywords. This thesis therefore presents a top-N query processing method for Chinese keywords over databases with multiple tables. The method creates an index table that stores the Chinese tuple characters extracted from the database together with their related information, builds an index on it to match query characters quickly, and constructs a ranking strategy suited to Chinese keyword queries by drawing on similarity formulas from IR. For a Chinese keyword query, the index is used to match query characters against tuple characters and retrieve the corresponding information; from this information, a candidate-tuple generation linked list and SQL query statements are created, which yield the candidate tuples and their similarity to the query; finally, the top-N results are returned in order of similarity. The method supports character-level search and the processing of queries that contain Chinese abbreviations. Experiments on real data sets evaluate query response time and accuracy, and the experimental data show that the method is effective.
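The pipeline described in the abstract (a character-level index over tuples from several tables, matching of query characters, and similarity-ranked top-N output) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names `build_index` and `top_n`, the in-memory dictionaries that stand in for the index table, and the TF-IDF style weight `(1 + log tf) * idf`, which is a common IR weighting and not necessarily the similarity formula used in the thesis. The candidate-tuple linked list and the SQL generation step are not modeled.

```python
import heapq
import math
from collections import Counter, defaultdict


def is_cjk(ch):
    """True for characters in the basic CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def build_index(tables):
    """Map each Chinese character to {(table, row_id): term frequency}.

    `tables` is a hypothetical stand-in for the database:
    {table_name: {row_id: text_of_the_tuple}}.
    """
    index = defaultdict(dict)
    for table, rows in tables.items():
        for row_id, text in rows.items():
            counts = Counter(ch for ch in text if is_cjk(ch))
            for ch, tf in counts.items():
                index[ch][(table, row_id)] = tf
    return index


def top_n(index, total_tuples, query, n):
    """Return up to n ((table, row_id), score) pairs, highest score first.

    The score sums an assumed TF-IDF style weight over the distinct
    Chinese characters shared by the query and the tuple.
    """
    scores = defaultdict(float)
    for ch in {c for c in query if is_cjk(c)}:
        postings = index.get(ch)
        if not postings:
            continue  # character does not occur in any tuple
        idf = math.log(1 + total_tuples / len(postings))
        for key, tf in postings.items():
            scores[key] += (1 + math.log(tf)) * idf
    return heapq.nlargest(n, scores.items(), key=lambda kv: kv[1])


# Hypothetical two-table data set.
tables = {
    "paper": {1: "关系数据库关键词查询", 2: "中文信息检索"},
    "school": {1: "河北大学", 2: "燕山大学"},
}
index = build_index(tables)
total = sum(len(rows) for rows in tables.values())

# "河大" abbreviates "河北大学"; character-level matching still finds it.
for key, score in top_n(index, total, "河大", 2):
    print(key, round(score, 3))
```

Because matching is done per character instead of per segmented word, an abbreviated query such as "河大" shares both of its characters with the tuple "河北大学" and therefore ranks it above "燕山大学", which shares only one.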
[Degree-granting institution]: Hebei University
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP311.13; TP391.1





