Research on Index Models in Mathematical Search
Published: 2018-11-26 17:58
[Abstract]: Search engines are efficient tools for retrieving useful information from the massive volume of data on the Internet. However, as the Internet expands rapidly, the user base grows, digitization deepens, and new technologies emerge, people's information needs are becoming increasingly diverse, and search engines face more and more challenges. In recent years, the retrieval of mathematical formulas has become both a hot topic and a hard problem in information science. It matters greatly for study and research, yet general-purpose text search engines are severely limited when retrieving mathematical content and fail to give users satisfactory results.
Mathematical formulas have a complex two-dimensional structure and carry rich semantics. Formulas with different structures may have the same mathematical meaning, and a single formula may be written in several ways. Sub-formula queries are another important topic in mathematical search: the formula a user enters may be a sub-formula of some larger expression, in which case the results should also include the original formulas that contain it. Several institutions in China and abroad work on mathematical search, but most of them retrieve only formulas that match the query exactly, do not take formula semantics into account, and have not studied sub-formula retrieval in depth.
Therefore, after analyzing and comparing the methods and techniques that existing mathematical search engines use to build their index models, this thesis combines a computer algebra system (CAS) with mathematical search and proposes a semantics-based method for constructing the index model. The system adopts an abstract-tree inverted index model. Before indexing, each formula is preprocessed: it is normalized with the CAS and, borrowing the N-gram method of text search engines, divided into sub-formulas, which are also inserted into the index entries. Equivalent and related formulas can thus be stored and managed effectively, which greatly improves the semantic retrieval capability of mathematical search.
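The indexing idea in the abstract (normalize each formula, split it into sub-formulas, and store all of them in an inverted index) can be illustrated with a minimal sketch. It is not the thesis's implementation, and every name in it (`canonical`, `subformulas`, `FormulaIndex`) is hypothetical. Three simplifying assumptions are made: Python's standard `ast` parser stands in for a real formula parser, sorting the operands of `+` and `*` stands in for CAS normalization, and enumerating every sub-expression stands in for the N-gram-style sub-formula division.

```python
import ast
from collections import defaultdict

# Assumption: only these binary operators are supported in this sketch.
OP_SYMBOL = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/", ast.Pow: "^"}
COMMUTATIVE = (ast.Add, ast.Mult)


def canonical(node):
    """Return a canonical prefix string for an expression tree.

    Stand-in for CAS normalization: it only reorders the two operands
    of a commutative binary operator, so a + b and b + a coincide.
    """
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant):
        return repr(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in OP_SYMBOL:
        left, right = canonical(node.left), canonical(node.right)
        if isinstance(node.op, COMMUTATIVE):
            left, right = sorted((left, right))
        return f"({OP_SYMBOL[type(node.op)]} {left} {right})"
    raise ValueError(f"unsupported syntax: {ast.dump(node)}")


def subformulas(tree):
    """Yield the canonical form of every operator-rooted subtree,
    including the whole formula itself."""
    for child in ast.walk(tree):
        if isinstance(child, ast.BinOp):
            yield canonical(child)


class FormulaIndex:
    """Inverted index: canonical (sub-)formula -> ids of containing formulas."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, formula_id, text):
        tree = ast.parse(text, mode="eval").body
        for key in subformulas(tree):
            self.postings[key].add(formula_id)

    def search(self, text):
        tree = ast.parse(text, mode="eval").body
        return sorted(self.postings.get(canonical(tree), set()))


index = FormulaIndex()
index.add("f1", "(a + b) * c")
index.add("f2", "c * (b + a)")
index.add("f3", "a - b")
print(index.search("b + a"))  # ['f1', 'f2']
```

Because the query is normalized the same way as the indexed formulas, `b + a` finds both formulas that contain `a + b` as a sub-formula. A real CAS would catch far more equivalences (for example, associativity or expanded versus factored forms) than this operand-sorting stand-in does.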
[Degree-granting institution]: Lanzhou University
[Degree level]: Master's
[Year degree awarded]: 2013
[Classification number]: TP391.3
Article ID: 2359236
Article link: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/2359236.html