Research on Educational Resource Classification Techniques Based on Rules and SVM
[Abstract]: With the rapid development of network technology and the rapid growth of online information, a large number of educational resources have appeared on the web. Online educational resources are becoming an increasingly important source of information for students, education researchers, and parents. However, existing search engines often return a large amount of irrelevant or useless content. This thesis therefore focuses on how to obtain useful information quickly and effectively from a large pool of information resources, and how to classify educational resources. Automatic text classification is one of the key technologies for classifying online educational resources automatically. The main contents of this thesis are as follows:
1. It analyzes the current state of online educational resources and the behavior and needs of network users, and constructs a classification system for basic education resources.
2. Because many feature selection algorithms exist, a dependable criterion is needed for deciding which algorithm to use in a specific situation. The thesis reviews several basic feature selection algorithms from the literature, compares them, and then proposes such a criterion.
3. Educational resources are related to one another by subordination and by parallel relationships. Based on these relationships, the thesis organizes them into a hierarchical structure and discusses how the main structural features of HTML web pages (i.e., title, anchor text, and meta) influence web page classification. A rule-based classification method is proposed. The experimental results show that the title and anchor text have a positive effect on web page classification.
4. To construct a classifier for educational resources, the thesis first introduces the basic theory of SVM. Building on the traditional SVM algorithm, and targeting the sensitivity of classification results to outliers in nonlinearly separable text problems, it proposes an improved multi-class SVM algorithm (Weighted Multi-Class SVM). The experimental results show that this algorithm is more effective than the multi-class SVM algorithm. Because the rule-based classification algorithm has high precision but low recall, while the improved SVM algorithm has low precision but high recall, the thesis proposes a method that combines the two. The experimental results show that the combination can improve the classification performance and efficiency of the system.
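The combination of a rule-based classifier with an SVM can be illustrated with a minimal Python sketch. The abstract does not say how the two methods are combined, so the cascade below (trust the high-precision rules when they fire, otherwise fall back to the high-recall classifier) is an assumption, not the author's procedure. The names `StructureExtractor`, `RULES`, `classify_by_rules`, and `classify`, as well as the keyword lists and category labels, are hypothetical, and `fallback` stands in for a trained SVM.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional


class StructureExtractor(HTMLParser):
    """Collects title text, anchor text, and meta keywords/description."""

    def __init__(self) -> None:
        super().__init__()
        self.fields: Dict[str, List[str]] = {"title": [], "anchor": [], "meta": []}
        self._current: Optional[str] = None  # field currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._current = "title"
        elif tag == "a":
            self._current = "anchor"
        elif tag == "meta":
            attr = dict(attrs)
            if (attr.get("name") or "").lower() in ("keywords", "description"):
                self.fields["meta"].append(attr.get("content") or "")

    def handle_endtag(self, tag):
        if tag in ("title", "a"):
            self._current = None

    def handle_data(self, data):
        if self._current and data.strip():
            self.fields[self._current].append(data.strip())


# Hypothetical keyword rules; a real system would derive these from the
# classification hierarchy of educational resources.
RULES = {
    "mathematics": ("algebra", "geometry", "math"),
    "physics": ("mechanics", "optics", "physics"),
}


def classify_by_rules(html: str) -> Optional[str]:
    """Return a label if the title/anchor keywords clearly favor one class."""
    parser = StructureExtractor()
    parser.feed(html)
    # Only title and anchor text are matched, since the abstract reports a
    # positive effect for these two features.
    text = " ".join(parser.fields["title"] + parser.fields["anchor"]).lower()
    hits = {label: sum(text.count(k) for k in kws) for label, kws in RULES.items()}
    ranked = sorted(hits.items(), key=lambda item: item[1], reverse=True)
    if ranked[0][1] == 0 or (len(ranked) > 1 and ranked[0][1] == ranked[1][1]):
        return None  # no rule fired, or the rules are tied: abstain
    return ranked[0][0]


def classify(html: str, fallback: Callable[[str], str]) -> str:
    """Assumed cascade: rules first, then the fallback (e.g. an SVM)."""
    label = classify_by_rules(html)
    return label if label is not None else fallback(html)
```

In this sketch the rules abstain whenever no keyword matches or two classes tie, so pages the rules cannot decide are passed to the fallback classifier. That is one way the complementary precision and recall of the two methods could be exploited.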
[Degree-granting institution]: Xinjiang University
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP391.1