Research on the Construction of a Domain Ontology for Ethnic Festivals and a Semantic Retrieval Model

Published: 2018-09-01 09:24

[Abstract]: Minority cultures are the spiritual wealth that ethnic groups have created and handed down in the course of their development. They have unique value, rich content, and a distinctive ethnic character, and the traditional festivals of ethnic minorities are an indispensable part of them. However, owing to the development of modern society, the strong impact of mainstream culture, and the insufficient development and use of minority information resources, traditional modes of information dissemination have left resources with ethnic character, such as customs, culture, and religion, at risk of no longer being passed on. Taking the Water Splashing Festival as an example, this thesis implements the semi-automatic construction of a domain ontology for ethnic minority festivals and studies a semantic retrieval model for the ethnic festival domain, using the strengths of ontology technology to provide a technical foundation for the subsequent inheritance, protection, and promotion of minority cultures. The main work is as follows:
1. The thesis surveys the state of semi-automatic ontology construction in China and abroad, discusses patterns and methods for the semi-automatic construction of domain ontologies, proposes a semi-automatic construction method for an ethnic festival domain ontology, and builds an initial domain ontology. With expert guidance and consultation of a large body of literature, an ethnic festival domain dictionary is compiled for use in text word segmentation.
2. Based on the requirements of the domain, texts are collected with a web crawler, and SVM text classification is used to obtain the texts related to the Water Splashing Festival within the ethnic festival domain. In the word segmentation stage of text preprocessing, the domain dictionary compiled in this thesis is used, and the chi-square test method for feature selection and the TF-IDF method for weight calculation are both improved, which raises classification accuracy.
3. Concepts and relations are extracted from the domain-relevant texts obtained by classification. In the concept extraction stage, a statistics-based method is used to obtain the domain concept set. In the relation extraction stage, lexical features are taken as the basis and combined with dependency parsing, and an SVM based on the idea of tree structures is used to extract the relations between concepts. Jena API and Protégé are used together to extend and correct the ontology and to refine the concept relations, yielding the final domain ontology.
4. To verify the practicality of the domain ontology, a detailed design is proposed for retrieval applications of the ethnic festival domain ontology, and an initial experimental semantic retrieval model is built, providing a theoretical basis and a feasibility study for later applications of the ontology.
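As background for item 2 of the abstract, the sketch below shows the textbook chi-square statistic for feature selection and a smoothed TF-IDF weight. The abstract does not say how the thesis modifies either method, so this illustrates only the standard forms and not the author's improved versions. The toy documents, the function names `chi_square` and `tf_idf`, and the choice of IDF smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter

# Toy, already-segmented documents with a binary label
# (1 = about the Water Splashing Festival, 0 = unrelated).
# Hypothetical data, not taken from the thesis.
DOCS = [
    (["water", "splashing", "festival", "dai"], 1),
    (["festival", "water", "blessing"], 1),
    (["stock", "market", "festival", "sale"], 0),
    (["market", "price", "report"], 0),
]


def chi_square(term, docs):
    """Standard chi-square statistic between a term and the positive class."""
    a = sum(1 for toks, y in docs if term in toks and y == 1)      # term present, in class
    b = sum(1 for toks, y in docs if term in toks and y == 0)      # term present, other class
    c = sum(1 for toks, y in docs if term not in toks and y == 1)  # term absent, in class
    d = sum(1 for toks, y in docs if term not in toks and y == 0)  # term absent, other class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def tf_idf(tokens, docs):
    """Smoothed TF-IDF weights for one document against a corpus.

    Assumption: IDF is smoothed as log((1 + N) / (1 + df)) + 1 so that
    terms unseen in the corpus do not cause a division by zero.
    """
    counts = Counter(tokens)
    n_docs = len(docs)
    weights = {}
    for term, count in counts.items():
        df = sum(1 for toks, _ in docs if term in toks)
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        weights[term] = (count / len(tokens)) * idf
    return weights


if __name__ == "__main__":
    for term in ("water", "festival"):
        print(term, chi_square(term, DOCS))
    print(tf_idf(DOCS[0][0], DOCS))
```

On this toy corpus, "water" appears only in the positive documents and so scores higher under chi-square than "festival", which appears in both classes. In a full pipeline the top-scoring terms would be kept as features and their TF-IDF weights passed to the SVM classifier.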
[Degree-Granting Institution]: Yunnan Normal University
[Degree Level]: Master's
[Year Degree Awarded]: 2017
[Classification Number]: TP391.3
