Research on Content-Based Product Image Classification Algorithms

Published: 2018-06-16 14:52

Topics: product image classification + sparse coding; Source: Beijing University of Posts and Telecommunications, 2016 master's thesis

[Abstract]: The rapid spread of the Internet and the steady improvement of its infrastructure have driven the rapid growth of e-commerce. As the variety and number of products sold online increase sharply, presenting this wealth of product information to customers effectively has become an important problem in making e-commerce more intelligent. Product images are the main carrier of product information on the Internet. Classifying them automatically by content helps buyers and sellers retrieve product information quickly, supports sensible product placement strategies, and enables personalized recommendation, thereby improving the overall efficiency of the e-commerce market. Building on existing research on product image classification and drawing on general-purpose image classification algorithms, this thesis exploits the characteristics of product images (objects are centered, backgrounds are simple, and detail is rich) to propose improved content-based product image classification algorithms. The main contributions are as follows:

(1) Based on the characteristics of product images, each image is adaptively divided into a foreground region that carries most of the product information and a blank background region that lacks useful information. Within the foreground, areas with pronounced local features are designated feature regions, from which sparse SIFT features are extracted; areas where the gray level changes gradually are designated auxiliary feature regions, from which dense SIFT features are extracted. No feature points are extracted from the blank background. SCSPM is used to obtain a descriptor vector for each region, and the vectors are concatenated according to a fusion function to form the final image descriptor. These features are fed into a support vector machine (SVM) for classification. Experimental results show that the adaptive fused SCSPM features yield better results than traditional methods on product image classification.

(2) Traditional SPM records the spatial location information of an image but does not reflect the discriminative power of a visual word at a particular location. To address this, the thesis proposes an entropy-based spatial weighting method for SPM. Different visual words may occur with different probabilities in different categories, and according to information theory, entropy can be used to describe this discriminative power. Incorporating the class information of a visual word in a given region when computing its weight therefore further improves the word's discriminative power.

(3) For some product image sets, the classification accuracy of a single SVM classifier is difficult to improve further. The thesis proposes AdaBoostSVM, a multi-classifier joint decision method that uses weak SVMs as the component classifiers of the AdaBoost algorithm. A set of component classifiers is trained in sequence. In each round, every training sample is assigned a weight that indicates how much attention the component classifier pays to it, and the weights are adjusted so that the classifier focuses on the samples that are more easily misclassified, yielding better classification performance.

The product image classification process is simulated in MATLAB. Experimental results show that the method classifies product images effectively, reaching an average classification accuracy of 87% on the experimental image set.
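The entropy-based spatial weighting idea in the abstract can be illustrated with a small sketch. The abstract does not state the weighting formula, so everything below is an assumption made for illustration: the weight of a visual word in a spatial cell is taken as one minus the normalized entropy of its class distribution in that cell, and the function names `entropy_weights` and `weighted_spm_vector` as well as the array layout are hypothetical. The thesis itself reports MATLAB experiments; Python with NumPy is used here only for brevity.

```python
import numpy as np

def entropy_weights(counts, eps=1e-12):
    """Assumed weighting: 1 - H(class | word, cell) / log(n_classes).

    counts: array of shape (n_cells, n_words, n_classes) holding how often
    each visual word occurs in each spatial cell for each class
    (hypothetical layout; requires n_classes >= 2).
    Returns weights of shape (n_cells, n_words) in [0, 1].
    """
    counts = np.asarray(counts, dtype=float)
    n_classes = counts.shape[-1]
    totals = counts.sum(axis=-1, keepdims=True)
    # Class distribution of each word in each cell; a word never seen in a
    # cell gets a uniform distribution and therefore a weight of 0.
    probs = np.where(totals > 0, counts / np.maximum(totals, eps), 1.0 / n_classes)
    entropy = -(probs * np.log(probs + eps)).sum(axis=-1)
    # Low entropy means the word is concentrated in few classes, so it is
    # treated as more discriminative and receives a larger weight.
    return np.clip(1.0 - entropy / np.log(n_classes), 0.0, 1.0)

def weighted_spm_vector(cell_histograms, weights):
    """Scale each cell's word histogram by its weights and flatten."""
    return (np.asarray(cell_histograms, dtype=float) * weights).ravel()

# Toy data: one cell, two visual words, two classes.
toy_counts = np.array([[[10, 0], [5, 5]]])
toy_weights = entropy_weights(toy_counts)
toy_vector = weighted_spm_vector(np.array([[2.0, 4.0]]), toy_weights)
```

In the toy data, the first word occurs only in one class and so keeps nearly its full histogram value, while the second word is split evenly between the classes and is suppressed. The weighting actually used in the thesis may differ.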
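[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree conferred]: 2016
[Classification number]: TP391.41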

[References]