A Clothing Retrieval Platform Based on Semantic Information

Published: 2018-07-03 17:23

Topics of this article: clothing retrieval + web crawler; reference: Jilin University (吉林大学), 2017 master's thesis

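[Abstract (translated from Chinese)]: Because of its convenience, online shopping has been popular ever since it appeared, and clothing is among the best-selling product categories online. To support the growth of online clothing sales, finding the desired garment quickly and accurately has become key. This thesis builds a clothing retrieval platform based on semantic information: it links low-level physical features to high-level semantic features and uses the semantic features as the basis for image retrieval. The main research contents are as follows. 1. Building the initial dataset with web crawler technology. A web crawler gathers information on different kinds of clothing from major clothing websites, such as thumbnails, prices, and purchase counts, to form the initial training dataset. Combining image retrieval with web crawling to process large-scale image data improves online search performance. 2. Classifying images with a semantic classification method. Compared with low-level features, attributes are more robust to variations in clothing, so the visual attributes extracted from a large and varied body of training data serve as the high-level semantic representation of an image. Physical-level features of the clothing images are extracted and summarized, then connected to high-level semantic labels, and a least-squares probabilistic classification algorithm is applied to produce the semantic model. This model serves as the classification standard for images and supports the image similarity measure. 3. Introducing inverted indexing, which narrows similar images down to a set of candidates, reduces the number of similarity computations, and helps shorten retrieval time. The workflow of the semantic clothing retrieval system is as follows. First, a crawler fetches data from clothing websites to build the initial dataset that retrieval draws on. Local descriptors are extracted from the images in the dataset; the feature vector obtained after normalizing these features with the k-means algorithm is combined with color features to represent each image. A classification algorithm is then trained on the feature vectors of all images in the initial dataset to obtain the semantic model. In the retrieval stage, the user uploads a local image. Clothing images captured by users are typically irregular and affected by geometric deformation, occlusion, cluttered backgrounds, and photometric changes, which makes clothing retrieval very challenging. The query image therefore goes through preprocessing such as image segmentation and feature extraction so that the features of its target region can be extracted. These features are passed through the semantic classification model to obtain the image's class probability vector, which serves as the basis for measuring image similarity: the smaller the Euclidean distance, the more similar the images. An inverted index is used at retrieval time to fetch identical or similar clothing information from the dataset quickly and accurately, such as the clothing image, price, source, and visit count, which improves retrieval speed. Experimental results show that the system is robust and provides high-quality retrieval results.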
[Abstract]: Online shopping has been popular since it first appeared because of its convenience, and clothing is one of the best-selling product categories online. To promote online clothing sales, searching for the desired garment quickly and accurately has become key. This thesis constructs a clothing retrieval platform based on semantic information, which connects low-level physical features with high-level semantic features and uses the semantic features as the basis for image retrieval. The main research contents of this thesis are: 1. Using web crawler technology to establish the initial dataset. A web crawler collects information on different kinds of clothing from major clothing websites, such as thumbnails, prices, and purchase counts, to establish the initial training dataset; combining image retrieval with web crawling to process large-scale image data improves online search performance. 2. Classifying images by semantic classification. Compared with low-level features, attributes are more robust to variations in clothing. Visual attributes extracted from a large and varied body of training data are used as the high-level semantic representation of an image. Physical-level features of the clothing images are extracted and summarized, then connected to high-level semantic labels, and the semantic model is generated with a least-squares probabilistic classification algorithm. This model serves as the classification standard for images and supports the image similarity measure. 3. Introducing inverted indexing, which narrows similar images down to a set of candidates, reducing the number of similarity computations and shortening retrieval time.
The workflow of the semantic clothing retrieval system is as follows. First, crawler technology is used to fetch data from clothing websites and construct the initial dataset, which provides the data for retrieval. Local descriptors are extracted from the images in the dataset; the feature vector obtained after normalizing these features with the k-means algorithm is combined with color features to represent each image. A classification algorithm is trained on the feature vectors of all images in the initial dataset to obtain the semantic model. In the retrieval stage, the user uploads a local image. Clothing images captured by users are typically irregular and affected by geometric deformation, occlusion, cluttered backgrounds, and photometric changes, which makes clothing retrieval very challenging. The query image therefore undergoes preprocessing such as image segmentation and feature extraction so that the features of its target region can be extracted. The extracted features are classified by the semantic classification model to obtain the image's class probability vector. This vector is used as the basis for measuring image similarity: the smaller the Euclidean distance, the more similar the images. An inverted index is used during retrieval to obtain identical or similar clothing information from the dataset quickly and accurately, such as the clothing image, price, source, and visit count, which improves retrieval speed. The experimental results show that the system is robust and provides high-quality retrieval results.
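The retrieval step described in the abstract (class probability vectors compared by Euclidean distance, with an inverted index limiting the candidates) can be illustrated with a minimal Python sketch. This is not the thesis's implementation. It assumes that the inverted index is keyed by each image's most probable semantic class, which the abstract does not specify, and every name and value in it (`build_inverted_index`, `search`, `catalog`, the item ids, the probabilities) is hypothetical.

```python
import math
from collections import defaultdict


def euclidean(p, q):
    """Euclidean distance between two equal-length probability vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def top_classes(probs, top_k):
    """Indices of the top_k most probable classes, most probable first."""
    order = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)
    return order[:top_k]


def build_inverted_index(catalog, top_k=1):
    """Map each class index to the ids of images whose top_k classes include it.

    Assumption: keying the index by predicted class is an illustrative choice,
    not something stated in the abstract.
    """
    index = defaultdict(set)
    for item_id, probs in catalog.items():
        for c in top_classes(probs, top_k):
            index[c].add(item_id)
    return index


def search(query_probs, catalog, index, top_k=1, limit=5):
    """Rank only the candidates that share a top class with the query."""
    candidates = set()
    for c in top_classes(query_probs, top_k):
        candidates |= index.get(c, set())
    scored = sorted((euclidean(query_probs, catalog[i]), i) for i in candidates)
    return [(item_id, dist) for dist, item_id in scored[:limit]]


# Hypothetical class probability vectors, as a semantic classifier might output.
catalog = {
    "dress_01": [0.80, 0.15, 0.05],
    "dress_02": [0.70, 0.20, 0.10],
    "shirt_01": [0.10, 0.85, 0.05],
    "coat_01": [0.05, 0.10, 0.85],
}
index = build_inverted_index(catalog)
results = search([0.78, 0.17, 0.05], catalog, index)
for item_id, dist in results:
    print(item_id, round(dist, 4))
```

In this sketch only the two images filed under the query's top class are scored, so the number of distance computations depends on the size of the candidate set rather than on the size of the whole catalog. Raising `top_k` trades some of that saving for a lower risk of missing similar images whose top class differs from the query's.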
[Degree-granting institution]: Jilin University (吉林大学)
[Degree level]: Master's
[Year of degree]: 2017
[Classification number]: TP391.41

[References]