Design and Implementation of an Image Search System Based on xPlore
Published: 2018-05-23 08:03
Topics: XML + xPlore; Source: Nanjing University, master's thesis, 2013
[Abstract]: As enterprise document management systems have evolved, their information management and search capabilities have steadily improved, but their support for the various kinds of information carried by images remains immature. Ordinary document search focuses on basic file metadata and on a full-text index built over document content. Image files, however, usually contain no text of their own, and neither their visual content nor their EXIF information is part of the basic file metadata. xPlore, an enterprise document search system, is similarly limited: it can search images only by their basic file information. It is therefore worth studying how to make xPlore support image search better.
Image search needs comprehensive image information, which falls into three categories: text information, image information, and EXIF information. Text can be obtained through optical character recognition (OCR); image information can be captured as an image fingerprint computed with a perceptual hash algorithm; EXIF information includes camera and location data, among other things. In xPlore, text extraction is performed in CPS, and that process has to be modified to handle image files, whereas extraction of the other image information can be added to the document processing pipeline through unstructured data annotators.
This thesis designs and implements an image search and management system based on xPlore. The first part consists of modifications to and configuration of xPlore, including improvements to document content extraction and the implementation and configuration of image-related unstructured data annotators. The second part is an image search system built on the modified xPlore; its interface is built with GWT, and it offers search over text, image, and EXIF information. In addition, a spell-checking step is applied after text recognition to make the results more accurate.
The project improves xPlore's support for searching the various kinds of image information and delivers an image system built on that search engine. Its development also further verified the support for unstructured annotators in the new version of xPlore. The resulting system provides image and album management, search, and publishing to social networks, which makes image management more convenient for colleagues.
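The abstract mentions image fingerprints computed with a perceptual hash but does not say which variant the thesis uses or how fingerprints are compared. The following is a minimal sketch, assuming the simplest variant (average hash) and comparison by Hamming distance; both choices, and the names `shrink`, `average_hash`, and `hamming_distance`, are illustrative assumptions rather than details taken from the thesis. To stay self-contained, it takes an image as a grid of grayscale values instead of reading a file.

```python
from typing import List

Gray = List[List[int]]  # rows of 0-255 grayscale pixel values


def shrink(pixels: Gray, size: int = 8) -> List[List[float]]:
    """Downsample to size x size cells by averaging rectangular blocks."""
    height, width = len(pixels), len(pixels[0])
    if height < size or width < size:
        raise ValueError("image must be at least size x size pixels")
    small = []
    for by in range(size):
        y0, y1 = by * height // size, (by + 1) * height // size
        row = []
        for bx in range(size):
            x0, x1 = bx * width // size, (bx + 1) * width // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def average_hash(pixels: Gray, size: int = 8) -> int:
    """Fingerprint of size*size bits: a bit is 1 where a cell is brighter than the mean."""
    cells = [value for row in shrink(pixels, size) for value in row]
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


# Synthetic test data: a 16x16 horizontal gradient, a brightened copy, a mirror image.
gradient = [[x * 16 for x in range(16)] for _ in range(16)]
brighter = [[min(255, value + 10) for value in row] for row in gradient]
mirrored = [row[::-1] for row in gradient]

print(hamming_distance(average_hash(gradient), average_hash(brighter)))  # 0
print(hamming_distance(average_hash(gradient), average_hash(mirrored)))  # 64
```

A uniform brightness change leaves the fingerprint unchanged, while the mirrored image differs in every bit. In a search setting, a small Hamming distance between two fingerprints would be treated as a likely visual match; the threshold is a tuning choice that this sketch does not fix.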
[Degree-granting institution]: Nanjing University
[Degree level]: Master's
[Year degree conferred]: 2013
[Classification number]: TP311.52
Article ID: 1923928
Article link: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/1923928.html