Research Article: Self-supervised sparse coding scheme for image classification based on low rank representation

Date Published: June 20, 2018

Publisher: Public Library of Science

Author(s): Ao Li, Deyun Chen, Zhiqiang Wu, Guanglu Sun, Kezheng Lin, Quanquan Gu.


Recently, sparse representation, which relies on the underlying assumption that samples can be sparsely represented by their labeled neighbors, has been applied with great success to image classification problems. Through sparse representation-based classification (SRC), the label can be assigned with minimum residual between the sample and its synthetic version with class-specific coding, which means that the coding scheme is the most significant factor for classification accuracy. However, conventional SRC-based coding schemes ignore dependency among the samples, which leads to an undesired result that similar samples may be coded into different categories due to quantization sensitivity. To address this problem, in this paper, a novel approach based on self-supervised sparse representation is proposed for image classification. In the proposed approach, the manifold structure of samples is firstly exploited with low rank representation. Next, the low-rank representation matrix is used to characterize the similarity of samples in order to establish a self-supervised sparse coding model, which aims to preserve the local structure of codings for similar samples. Finally, a numerical algorithm utilizing the alternating direction method of multipliers (ADMM) is developed to obtain the approximate solution. Experiments on several publicly available datasets validate the effectiveness and efficiency of our proposed approach compared with existing state-of-the-art methods.

Partial Text

Sparse representation has attracted great interest recently due to its powerful ability to model images, where it is assumed that an image can be represented by a linear combination of a few atoms of a basis set called a dictionary. It has achieved impressive performance on many computer vision tasks, such as image restoration, compressive sensing, tracking and classification [1–5]. In this paper, we focus on the sparse image classification problem. Many studies have been performed on sparse-based image classification in recent years. J. Wright et al [6] originally proposed the general sparse representation-based classification framework for face recognition. In their method, the sparse representation of a test sample is computed by taking the training samples as a dictionary, and recognition is viewed as classifying among multiple linear regression models by their representation errors. The promising results show the robustness of their method to occlusion and disguise. L. Zhang et al pointed out that the good performance of SRC stems not only from sparsity but also from the collaboration among samples, and proposed the collaborative representation classification (CRC) method to obtain more accurate recognition results [7]. Moreover, they also suggested that having sufficient samples is another essential factor affecting the recognition rate in an SRC-based framework. To reveal the intrinsic classification mechanism, a probabilistic collaborative representation method (ProCR) was proposed by Cai et al [8], in which the probability that a test sample belongs to the collaborative subspace of all classes was defined. Consequently, a ProCR-based classifier was designed to achieve excellent performance. Based on the idea of bag-of-features, [9] proposed an effective locality-constrained linear coding (LLC) scheme.
Unlike SRC, LLC projected each descriptor into its local coordinate system, and max pooling was performed on the projected codes to generate the final representation. To mitigate the performance degradation on datasets consisting of images with various camera orientations, a discriminative sparse coding approach was proposed to extract an affine-invariant feature, and a classifier using AdaBoost was developed by taking affine sparse codes as the input. Fang et al [10] proposed a model to learn a non-negative sparse graph (NNSG), with which classification is realized by an iterative supervised learning model that propagates the label information. To further improve representation ability, several extensions based on dictionary learning have been studied for pattern classification. In [11], the discriminative KSVD (DKSVD) algorithm was first presented to unify dictionary and classifier learning in the same framework. Next, a label consistent KSVD (LCKSVD) method was proposed by Z. L. Jiang to realize more discriminative sparse coding [12]. In LCKSVD, the label information of atoms was considered to enforce discriminability during dictionary learning. Similar to DKSVD, classifier learning was also combined with the reconstruction error of dictionary learning to form a unified objective function. In [13], Fisher discriminative dictionary learning was proposed for image classification, in which, under the Fisher criterion, not only is the representation residual used to distinguish among classes, but the within-class and between-class scatters are also optimized. However, computational complexity and insufficient samples are two main drawbacks of the aforementioned dictionary learning algorithms. From another view, classification can be seen as the task of separating samples that lie in different linear subspaces.
Therefore, a model that can capture the subspace structure among the samples is believed to be very helpful in pattern classification. Recently, low rank representation (LRR) has attracted more and more interest with applications to image classification tasks [14]. LRR exhibits a remarkable ability to explore the global manifold structure of data, which makes it a useful technique for analyzing data drawn from multiple subspaces. C. Chen [15] proposed a low rank-based decomposition model with structural incoherence. It enforced incoherence among the low rank matrices of different classes with an extra regularization, which helps to remove the noise from the contaminated data and provides additional discrimination for classification. Zhang et al [16] established a model with joint LRR and sparse representation, in which both sparsity and low rank spatial consistency are exploited simultaneously. The preserved local structural information in the coding vector is more helpful for classification. Next, a structured LRR (SLRR) was further learned in a supervised way by [17]. In [17], an ideal LRR matrix was constructed to guide dictionary learning. Then, a simple linear classifier on the low rank representation matrix under the learned dictionary can also obtain promising results. Based on the viewpoint of supervised learning, a low rank and sparse representation (LRSR) model was studied by Zhuang et al [18], where the global mixture of subspaces and the local linear structure were both captured to construct a non-negative graph embedded in the semi-supervised classification.
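
The minimum-residual rule used by SRC, as described above, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation of [6]: the sparse code is approximated with a plain ISTA solver for the l1-regularized least-squares problem, and the function names (`ista_lasso`, `src_predict`), the regularization weight `lam`, and the toy dictionary are all hypothetical.

```python
import numpy as np

def ista_lasso(D, y, lam=0.01, n_iter=500):
    """Approximately solve min_x 0.5*||y - D x||^2 + lam*||x||_1 with ISTA."""
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = x - step * (D.T @ (D @ x - y))                        # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return x

def src_predict(D, labels, y, lam=0.01):
    """Assign y to the class whose atoms reconstruct it with the smallest residual."""
    x = ista_lasso(D, y, lam)
    classes = np.unique(labels)
    residuals = [
        np.linalg.norm(y - D[:, labels == c] @ x[labels == c])  # class-specific coding
        for c in classes
    ]
    return classes[int(np.argmin(residuals))]

# Toy dictionary (hypothetical data): columns are unit-norm training samples.
D = np.array([[1.0, 0.9, 0.0, 0.0],
              [0.0, 0.1, 1.0, 0.9],
              [0.0, 0.0, 0.0, 0.1]])
D = D / np.linalg.norm(D, axis=0)
labels = np.array([0, 0, 1, 1])

# A test sample lying close to the span of the class-0 atoms.
y = np.array([1.0, 0.05, 0.0])
y = y / np.linalg.norm(y)
predicted = src_predict(D, labels, y)
```

Because each test sample is coded independently in this scheme, two nearly identical samples can still receive different codes, which is the sensitivity that motivates adding a similarity-based constraint on the codings.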

To handle the proposed model in Eq (19), we reformulate it as
where tr(·) denotes the trace (the sum of the diagonal elements) of a matrix, S = L − W, and L is a diagonal matrix with L_ii = ∑_j W_ij. Then, we introduce a slack variable to make the objective function separable as
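
For reference, S = L − W as defined above is the standard graph Laplacian of the similarity matrix W. The following minimal sketch assumes that W is symmetric and that the trace term takes the common graph-regularization form tr(A S Aᵀ), with one code per column of A. That form is an assumption made for illustration, and the names `A`, `W` and the random data are hypothetical. Under those assumptions, the sketch checks numerically that the trace equals a weighted sum of squared distances between codes.

```python
import numpy as np

def graph_laplacian(W):
    """Return S = L - W, where L is diagonal with L_ii = sum_j W_ij."""
    L = np.diag(W.sum(axis=1))
    return L - W

def pairwise_smoothness(A, W):
    """0.5 * sum_ij W_ij * ||a_i - a_j||^2 for codes a_i stored as columns of A."""
    diff = A[:, :, None] - A[:, None, :]   # shape (k, n, n): all column differences
    sq_dist = (diff ** 2).sum(axis=0)      # squared distances between columns
    return 0.5 * (W * sq_dist).sum()

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 6))            # hypothetical codes: 4 atoms, 6 samples
W = rng.random((6, 6))
W = 0.5 * (W + W.T)                        # the identity requires a symmetric W

S = graph_laplacian(W)
trace_term = np.trace(A @ S @ A.T)
pairwise_term = pairwise_smoothness(A, W)  # agrees with trace_term up to rounding
```

Under these assumptions, a large W_ij penalizes a large distance between the codes of samples i and j, which is the sense in which similar samples are pushed toward close codings.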

In our experiments, we evaluate the proposed algorithm on five publicly available datasets: three face datasets (Extended YaleB, AR and ORL), one object dataset (COIL-20), and one handwritten digit dataset (USPS). The comparison classification methods include SRC in [6], LLC in [9], LLE+SVM in [23], ProCR in [8], and NNSG in [10]. Considering the computational cost, two dimensionality reduction methods are used in our experiments. “Eigenface”, proposed in [24], is used for the face datasets, and PCA is implemented for the others. In our experiments, 30, 40 and 50 percent of each dataset are taken randomly as training samples, respectively, and the remaining samples are used as test samples. Each experiment is repeated five times for each method, and the average accuracy is reported as the final classification result, shown in Figs 1–5. The datasets and settings are described in detail as follows.
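
Before those details, the protocol itself (random split, dimensionality reduction, five repetitions, averaged accuracy) can be sketched as follows. This is a hedged illustration only: the split is drawn over the whole dataset rather than per class, PCA is fitted on the training portion only, and a 1-nearest-neighbor rule stands in for the classifier under test. All of these are assumptions, as are the function names and the synthetic data.

```python
import numpy as np

def pca_fit(X, n_components):
    """Fit PCA on the rows of X; return the mean and the top principal directions."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def nearest_neighbor_predict(X_train, y_train, X_test):
    """Placeholder classifier (1-NN); stands in for the method being evaluated."""
    d = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    return y_train[d.argmin(axis=1)]

def evaluate(X, y, train_fraction, n_components, n_runs=5, seed=0):
    """Average accuracy over n_runs random train/test splits."""
    rng = np.random.default_rng(seed)
    n_train = int(round(train_fraction * len(y)))
    accuracies = []
    for _ in range(n_runs):
        perm = rng.permutation(len(y))
        train, test = perm[:n_train], perm[n_train:]
        mean, components = pca_fit(X[train], n_components)  # fitted on training data only
        Z_train = (X[train] - mean) @ components.T
        Z_test = (X[test] - mean) @ components.T
        pred = nearest_neighbor_predict(Z_train, y[train], Z_test)
        accuracies.append(np.mean(pred == y[test]))
    return float(np.mean(accuracies))

# Synthetic two-class data (hypothetical), well separated so the sketch is easy to check.
rng = np.random.default_rng(0)
X = np.vstack([rng.standard_normal((30, 10)),
               rng.standard_normal((30, 10)) + 5.0])
y = np.repeat([0, 1], 30)

# One averaged accuracy per training fraction, mirroring the 30/40/50 percent settings.
results = {f: evaluate(X, y, f, n_components=3) for f in (0.3, 0.4, 0.5)}
```

Fitting the projection on the training portion alone keeps test samples out of the learned subspace; whether the original experiments did this is not stated in the text.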

To address the image classification problem, in this paper we propose a discriminative sparse coding approach, in which a representation matrix is first computed by a joint low-rank and sparse representation model to preserve the latent manifold structure and locality of the test samples. Next, by incorporating the representation matrix, a self-supervised sparse coding model is established to improve the classification performance. Through this coding scheme, the mutual dependency between similar samples can be better explored and propagated, which yields a self-supervised mechanism that enforces close codings for similar samples. Meanwhile, a more suitable reconstruction error is designed as the classification criterion. Moreover, we also provide an iterative ADMM-based numerical algorithm to solve the novel objective function of the proposed model. Experiments on five public datasets show that our proposed method outperforms existing state-of-the-art classification methods.
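
As a generic illustration of how ADMM handles a sparse coding objective, the following minimal sketch applies it to a plain lasso problem. It is not the model or solver proposed here, whose update rules are not given in this text. The splitting variable `z` plays the role of a slack variable that separates the quadratic term from the l1 term. The function names, the penalty parameter `rho`, and the toy input are assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft thresholding, the proximal operator of t*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_lasso(D, y, lam=0.1, rho=1.0, n_iter=200):
    """Solve min_x 0.5*||y - D x||^2 + lam*||z||_1 subject to x = z with ADMM."""
    n = D.shape[1]
    z = np.zeros(n)                       # slack (splitting) variable
    u = np.zeros(n)                       # scaled dual variable
    lhs = D.T @ D + rho * np.eye(n)       # constant system matrix for the x-update
    Dty = D.T @ y
    for _ in range(n_iter):
        x = np.linalg.solve(lhs, Dty + rho * (z - u))  # quadratic subproblem
        z = soft_threshold(x + u, lam / rho)           # l1 subproblem
        u = u + x - z                                  # dual ascent on x = z
    return z

# With an identity dictionary the lasso solution is soft_threshold(y, lam),
# which gives a simple way to sanity-check the iteration.
y_demo = np.array([2.0, -0.3, 0.7])
code = admm_lasso(np.eye(3), y_demo, lam=0.5)
```

Each subproblem has a closed-form solution once the objective is split this way, which is the usual reason for introducing the extra variable.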



