Design and Implementation of a Client SDK for a Voice Cloud Open Platform on iOS

Published: 2018-06-12 04:45

Topics: voice cloud + speech recognition; Source: Beijing University of Posts and Telecommunications, 2014 master's thesis

[Abstract]: Smartphones, tablets, and other mobile terminals are now ubiquitous, and the mobile Internet is developing rapidly. Mobile applications place increasingly high demands on text input, and navigation and chat applications in particular want to use speech recognition to free users' hands when entering text. As the Siri platform on iOS devices has matured, the major Internet companies have launched speech recognition systems of their own. At present, however, iOS does not offer developers a public Siri API for invoking speech recognition, and the major Internet companies place strict restrictions on their client-side speech recognition SDKs, so iOS lacks a general-purpose, open speech recognition SDK that developers can use.
This thesis surveys the open speech recognition SDKs currently available on iOS, compares their product features, analyzes what developers need from a speech recognition SDK, and proposes a complete new solution for implementing a client-side speech recognition SDK: the Voice Cloud Open Platform Client SDK, abbreviated as the Voice Cloud SDK. The Voice Cloud SDK lets developers easily build full-featured, highly interactive speech recognition applications on iOS devices. Throughout development and use, developers get speech recognition services without having to maintain a speech engine.
Guided by software engineering principles, the thesis implements the Voice Cloud SDK step by step, following the software development process. First, based on an understanding of the basic workflow of the speech recognition server and on how users habitually use speech recognition, it states the requirements for the Voice Cloud Open Platform Client SDK. The requirements analysis lists the functions that the Voice Cloud SDK provides to users and the functions needed for the SDK to interact with the server. After the requirements analysis, the Voice Cloud SDK is designed in detail. The design divides the SDK by function into several main modules: a recording module, a valid-sound (voice activity) detection module, an audio compression and encoding module, a network send/receive module, and a recognition-result callback module, among others. The parameters and methods of each module are listed in detail, and diagrams explain the workflow and the interactions among the modules. The code is then implemented according to the design, module by module, in the order in which audio data flows through the modules. Finally, the whole Voice Cloud SDK is tested systematically, and the testing is used to further improve its usability and security.
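The module flow named in the abstract (recording, valid-sound detection, compression and encoding, network send, result callback) can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract gives no algorithms or interfaces, so the energy-threshold detector, the -35 dBFS threshold, the 160-sample frame size, the use of `zlib` as a stand-in for a speech codec, and every function name below are assumptions made for illustration. Python is used only for brevity; it is not the SDK's implementation language.

```python
import math
import struct
import zlib
from typing import Callable, Iterable, List

FRAME_SAMPLES = 160  # assumed frame size: 10 ms of audio at 16 kHz


def frame_energy_db(frame: List[int]) -> float:
    """Return the RMS level of 16-bit PCM samples in dB relative to full scale."""
    if not frame:
        return -math.inf
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return 20 * math.log10(rms / 32768) if rms > 0 else -math.inf


def is_voiced(frame: List[int], threshold_db: float = -35.0) -> bool:
    """Hypothetical valid-sound detector: a plain energy threshold."""
    return frame_energy_db(frame) > threshold_db


def encode(frame: List[int]) -> bytes:
    """Stand-in for the compression module: pack as little-endian int16, then deflate."""
    return zlib.compress(struct.pack(f"<{len(frame)}h", *frame))


def run_pipeline(
    frames: Iterable[List[int]],
    send: Callable[[bytes], None],
    on_result: Callable[[str], None],
) -> int:
    """Push recorded frames through detection, encoding, and sending, in that order."""
    sent = 0
    for frame in frames:
        if is_voiced(frame):        # drop frames classified as silence
            send(encode(frame))     # the network module would upload this packet
            sent += 1
    # A real server would return recognized text; this only reports a count.
    on_result(f"{sent} voiced frame(s) sent")
    return sent
```

Passing `send` and `on_result` as callables keeps the detection and encoding steps independent of any particular transport or UI, which mirrors the separation into modules that the abstract describes.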
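[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree conferred]: 2014
[Classification number]: TN912.34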

[References]