
Research on an Automatic Scoring Method for Chinese-English Oral Translation Questions Based on Key Information in Responses

Published: 2018-12-12 10:55
[Abstract]: Automatic scoring of spoken English tests has long been a research focus in computer-assisted language learning. Most existing automatic scoring methods target item types such as read-aloud and repeat-after-me questions, while methods for Chinese-English oral translation questions remain relatively rare. When such questions are scored manually, raters focus mainly on whether the response contains the key information required by the reference answer, and that key information is generally expressed through keywords. On this basis, this thesis imitates manual scoring and studies automatic scoring of Chinese-English oral translation questions with respect to the key information in responses. The work covers preprocessing of the response speech signal, recognition of key information and extraction of key-information completeness features, computation of pronunciation fluency and extraction of fluency features, and the design of an automatic scoring method that combines the completeness and fluency features. The main research work is as follows:

(1) Because an accurate continuous speech recognition system is hard to build without a large amount of annotated speech data, this thesis uses an unsupervised spoken keyword detection method based on the dynamic time warping (DTW) algorithm to identify keywords in response speech. The keyword detection method based on SLN-DTW (Segmental local-normalized-DTW) is first validated on the TIMIT corpus, where the experimental results demonstrate its performance advantage. A response keyword recognition library is then built with the help of WordNet, and SLN-DTW keyword detection is applied to the response speech. The experimental results show that the number of keywords detected by SLN-DTW is an effective feature for how fully a response covers the key information.

(2) To obtain a confidence measure for the detected keywords, this thesis applies a speech recognition method based on a convolutional neural network (CNN) to recognize them a second time. A CNN-based speech recognition model is built, with speech feature parameters processed by a mean normalization algorithm, and evaluated on the Spoken Arabic Digit dataset from the UCI Machine Learning Repository. The recognition rate obtained is better than the results of other models on the same dataset. On the collected dataset of response keywords, the results are also better than those of two other commonly used recognition models, which demonstrates the feasibility of the CNN-based recognition method.

(3) The keyword detection results and the second-pass recognition results are used to compute the key-information completeness features, which together with the fluency features extracted from the original response speech form the automatic scoring features of this thesis. Regression analysis over all features yields the automatic scoring model. The model is tested on real examination data: the overall correlation between the machine scores and the original scores is 0.729, which shows that the extracted features are effective for machine scoring and validates the proposed automatic scoring method for Chinese-English oral translation questions based on key information in responses.
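The DTW-based keyword detection described in item (1) can be illustrated with a minimal sketch. This is not the SLN-DTW method evaluated in the thesis: it is plain DTW with a simple length normalization and a fixed-size sliding window, and the names `frame_distance`, `dtw_cost`, `spot_keyword`, and `count_detected_keywords`, as well as the `threshold` parameter, are hypothetical choices made for illustration. Feature frames are assumed to be short lists of floats (for example, MFCC vectors).

```python
import math
from typing import Sequence, Tuple

Frame = Sequence[float]


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_cost(query: Sequence[Frame], segment: Sequence[Frame]) -> float:
    """Classic DTW alignment cost, divided by len(query) + len(segment)."""
    n, m = len(query), len(segment)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    # acc[i][j]: minimal accumulated cost aligning query[:i] with segment[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(query[i - 1], segment[j - 1])
            acc[i][j] = d + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m] / (n + m)


def spot_keyword(
    query: Sequence[Frame], utterance: Sequence[Frame], threshold: float
) -> Tuple[bool, int, float]:
    """Slide a window of len(query) frames over the utterance.

    Returns (hit, best_start, best_cost); hit is True when the best
    normalized DTW cost is at or below the (assumed) threshold.
    """
    win = len(query)
    best_start, best_cost = -1, float("inf")
    for start in range(max(1, len(utterance) - win + 1)):
        cost = dtw_cost(query, utterance[start:start + win])
        if cost < best_cost:
            best_start, best_cost = start, cost
    return best_cost <= threshold, best_start, best_cost


def count_detected_keywords(templates, utterance, threshold: float) -> int:
    """Number of keyword templates detected in the utterance (coverage feature)."""
    return sum(1 for q in templates if spot_keyword(q, utterance, threshold)[0])


# Toy one-dimensional "features": the query occurs at frame 2 of the utterance.
query = [[0.0], [1.0], [2.0]]
utterance = [[5.0], [5.0], [0.0], [1.0], [2.0], [5.0]]
hit, start, cost = spot_keyword(query, utterance, threshold=0.1)
```

In this toy example the count returned by `count_detected_keywords` plays the role of the keyword-count feature mentioned in the abstract; a real system would use acoustic features extracted from speech and a threshold tuned on held-out data.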
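[Degree-granting institution]: Guangdong University of Foreign Studies
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: H315.9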




Article ID: 2374441


Link to this article: https://www.wllwen.com/waiyulunwen/yingyulunwen/2374441.html

