Image Text Recognition and Speech Playback System Based on the Android Platform

Published: 2018-03-17 00:01

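Topic: Android platform　Focus: text recognition　Source: Nanjing University of Posts and Telecommunications, 2017 master's thesis　Document type: degree thesis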

[Abstract]: According to statistics, roughly 1.5% or more of the world's population cannot study and live as sighted people do because of visual impairment, and image text recognition combined with speech playback can, to some extent, help them read. Similar products for Android devices are already on the market, such as Yunmai Document Recognition (云脉文档识别) and OCR (Optical Character Recognition) Text Recognition. However, these applications place high demands on how the image is captured: the text must be sharp, the image must not be skewed, and the image must contain nothing but text. Otherwise recognition fails or its accuracy drops. Such requirements are unrealistic for people with visual impairment. This thesis therefore develops Android-based text image recognition software and adds a speech playback function so that users can obtain the text by listening. The main work is as follows. First, algorithms for skew correction and text-region cropping of text images are proposed; a preprocessing pipeline of grayscale conversion, binarization, skew correction, and text-region cropping reduces the redundant information in the image to be recognized. Second, the text recognition function is built on the Tesseract recognition engine optimized by Google, and recognition accuracy is improved by training and extending the character library. Finally, the speech playback function is built on the Shoushuo TTS (手说TTS, Text To Speech) engine; it can read out the recognized text and supports different voice genders, volumes, and speaking rates. Testing verified the effectiveness of the system. It was also compared with Yunmai Document Recognition, one of the most widely used recognition applications on the market, and the comparison showed that the system performs better on text images that are skewed or that contain non-text content.
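The preprocessing steps named in the abstract (grayscale conversion, binarization, text-region cropping) can be illustrated with a minimal sketch. The abstract does not say which algorithms the thesis uses, so everything here is an assumption: Otsu's method stands in for the binarization step, a simple bounding box stands in for the text-region cropping algorithm, dark text on a light background is assumed, and skew correction is omitted. The function names `to_gray`, `otsu_threshold`, `binarize`, and `crop_text_region` are hypothetical.

```python
from typing import List, Tuple

RGBImage = List[List[Tuple[int, int, int]]]  # rows of (r, g, b), each 0..255
GrayImage = List[List[int]]                  # rows of 0..255 intensities


def to_gray(rgb: RGBImage) -> GrayImage:
    """Luma-weighted grayscale using integer ITU-R BT.601 weights."""
    return [[(299 * r + 587 * g + 114 * b) // 1000 for r, g, b in row]
            for row in rgb]


def otsu_threshold(gray: GrayImage) -> int:
    """Return the threshold t that maximizes between-class variance
    when pixels are split into {v <= t} and {v > t} (Otsu's method)."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # intensity sum of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(gray: GrayImage, t: int) -> List[List[int]]:
    """Mark ink pixels as 1, assuming dark text on a light background."""
    return [[1 if v <= t else 0 for v in row] for row in gray]


def crop_text_region(binary: List[List[int]]) -> List[List[int]]:
    """Crop to the bounding box of all ink pixels; empty list if none."""
    rows = [y for y, row in enumerate(binary) if any(row)]
    if not rows:
        return []
    cols = [x for x in range(len(binary[0]))
            if any(row[x] for row in binary)]
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in binary[top:bottom + 1]]
```

In a pipeline like the one the abstract describes, the cropped binary image would then be passed to the recognition engine. A bounding box only removes blank margins; separating text from non-text content, as the thesis claims to do, would need a more selective criterion than this sketch provides.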
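[Degree-granting institution]: Nanjing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.41;TN912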

[References]