Research Article: On the improvement of reinforcement active learning with the involvement of cross entropy to address one-shot learning problem

Date Published: June 19, 2019

Publisher: Public Library of Science

Author(s): Honglan Huang, Jincai Huang, Yanghe Feng, Jiarui Zhang, Zhong Liu, Qi Wang, Li Chen, Maciej Huk.


As a promising research direction in recent decades, active learning allows an oracle to assign labels to typical examples in order to improve the performance of learning systems. Existing work mainly focuses on handcrafting criteria for selecting high-value examples to be labeled. Instead of manually developing strategies for querying the user for labels of the desired examples, we used a reinforcement learning algorithm parameterized by a neural network to automatically explore query strategies in active learning for stream-based one-shot classification problems. By involving cross-entropy in the loss function of Q-learning, the developed framework learns an efficient policy for deciding when and where to predict or query an instance. In experiments on two image classification tasks, our method showed advantages over an influential previous work: better performance, quick convergence, relatively good stability and fewer requests for labels.

Partial Text

In recent decades, machine learning has attracted increasing attention from both industry and academia and has shown its power in a wide range of applications, such as pattern analysis [1], knowledge discovery and discipline prediction. Data resources are widely acknowledged to be crucial in learning tasks. A direct way to process data and incorporate human experience is to assign labels to examples. For small-scale datasets, precise annotation based on expert knowledge is feasible. However, when large-scale datasets are used for complicated tasks, complete and perfect annotation is no longer viable, because the labeling process is labor-intensive, costly in time and money, and dependent on domain experience. As dataset volume increases, the learning system tends to generalize better, but the cost of annotation increases dramatically [2]. Moreover, former studies have shown that obtaining ground-truth labels for a dataset not only requires the participation of a large number of domain experts, but also that labeling an instance takes more than 10 times as long as collecting it [3]. In contrast, accessing a massive number of unlabeled instances is relatively easy.

The availability of massive numbers of unlabeled examples, together with the potentially task-beneficial information buried in them, has motivated several effective learning paradigms, including semi-supervised learning and active learning. These paradigms aim to exploit unlabeled data to improve performance and to reduce the workload of human experts. Semi-supervised learning has developed quickly in recent years; it exploits statistical or geometrical information in unlabeled examples to improve generalization. Notably, however, involving unlabeled examples in a semi-supervised framework may be inappropriate in certain scenarios and can degrade the original accuracy. Active learning, another powerful paradigm, differs significantly from semi-supervised learning in theory and practice. The difference is that an active learning algorithm simulates the human learning process to some extent: it selects a subset of instances to be labeled and added to the training set, and thereby iteratively improves the generalization performance of the classifier. Consequently, active learning has been widely used in information retrieval [4], image and speech recognition [5–11], and text analysis [12–14] in recent years.

Active learning is mainly studied in three scenarios: (i) membership query synthesis, (ii) pool-based sampling, and (iii) stream-based selective sampling [29]. In the membership query synthesis scenario, the learner can select an instance from the input space to be labeled, or it can generate a new instance. In the pool-based scenario, the learner can request a label for any instance in a large pool of historical data. Finally, in the stream-based scenario, instances arrive continually from a data stream in an exogenously determined order, and the learner must decide immediately whether to request a label for each new instance [30]. Various practical applications have benefited from active learning, including movie recommendation [31–33], medical image classification [34], and natural language processing.
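
The stream-based scenario can be illustrated with a minimal sketch. It uses a fixed, handcrafted uncertainty rule (query when the predictive entropy exceeds a threshold), which is the kind of heuristic that a learned query strategy is meant to replace; it is not the method proposed in this article. The names `stream_selective_sampling`, `predict_proba`, `oracle` and `threshold` are hypothetical and introduced only for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def stream_selective_sampling(stream, predict_proba, oracle, threshold):
    """Process instances in arrival order and query the oracle only when unsure.

    stream        -- iterable of instances; the order is fixed by the environment
    predict_proba -- hypothetical callable: instance -> list of class probabilities
    oracle        -- hypothetical callable: instance -> true label (assumed costly)
    threshold     -- entropy above which a label is requested
    Returns a list of (instance, label, was_queried) tuples.
    """
    decisions = []
    for x in stream:
        probs = predict_proba(x)
        if entropy(probs) > threshold:
            # Too uncertain: pay the labeling cost and ask the oracle.
            decisions.append((x, oracle(x), True))
        else:
            # Confident enough: output the most probable class.
            label = max(range(len(probs)), key=probs.__getitem__)
            decisions.append((x, label, False))
    return decisions


# Toy usage: each instance is a number in [0, 1] read as P(class 1).
toy_model = lambda x: [1.0 - x, x]
toy_oracle = lambda x: int(x >= 0.5)
result = stream_selective_sampling([0.1, 0.45, 0.95], toy_model, toy_oracle, 0.5)
```

In this toy run only the ambiguous instance (0.45) triggers a label request. The decision is made once per instance, at arrival time, which is what distinguishes the stream-based scenario from pool-based sampling.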

In this section, we present a novel model based on the reinforcement one-shot active learning (ROAL) framework, which monitors a stream of instances and selects an appropriate action (classify the instance or query its label) for each arriving instance. Our model meta-learns a query strategy that captures when to query and which instances are worth querying. In the present study, a long short-term memory (LSTM) network connected to a linear output layer is used to approximate the action-value function.
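
The decision made for each arriving instance can be sketched as follows, assuming an action-value vector with one entry per class plus a final "request label" entry. This excerpt does not give the reward values, the exploration scheme or the exact loss, so everything here is an assumption made for illustration: the reward constants, the epsilon-greedy selection, and in particular `combined_loss`, which is only one hypothetical way to add a cross-entropy term to a Q-learning loss and should not be read as the authors' formulation. In the article the action values are produced by an LSTM with a linear output layer; here they are simply given as a list.

```python
import math
import random


def select_action(q_values, epsilon, rng):
    """Epsilon-greedy choice over C class actions plus a final request action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))      # explore
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit


def reward(action, true_label, num_classes,
           r_correct=1.0, r_wrong=-1.0, r_request=-0.05):
    """Assumed reward scheme: small penalty for a query, +/-1 for a prediction."""
    if action == num_classes:                    # index C is the request action
        return r_request
    return r_correct if action == true_label else r_wrong


def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def combined_loss(q_values, action, target, true_label, ce_weight=1.0):
    """Squared TD error plus a weighted cross-entropy on the class outputs.

    Hypothetical combination for illustration only.
    """
    td_error = (target - q_values[action]) ** 2
    class_probs = softmax(q_values[:-1])         # drop the request entry
    cross_entropy = -math.log(class_probs[true_label])
    return td_error + ce_weight * cross_entropy


rng = random.Random(0)
q = [0.2, 0.9, 0.1, 0.4]        # 3 classes; index 3 is "request label"
a = select_action(q, epsilon=0.0, rng=rng)       # purely greedy choice
r = reward(a, true_label=1, num_classes=3)
next_q = [0.0, 0.0, 0.0, 0.0]
target = r + 0.5 * max(next_q)  # one-step Q-learning target, discount 0.5
loss = combined_loss(q, a, target, true_label=1)
```

Treating the label request as one more action lets a single value function trade off the cost of asking against the risk of a wrong prediction.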

We examined the proposed ROAL model under an active one-shot learning (AOL) set-up on two image classification tasks and compared the experimental results of the present study with those of the previous study. Our goal was to investigate two questions through experiments: 1) whether the proposed model can learn a practical strategy that knows how to label instances and when to request a label instead, and 2) whether the model effectively uses its uncertainty about instances to make decisions.

We introduced a model that learns active learning via reinforcement learning and evaluated it on one-shot learning tasks. The results show that the model can replace hand-engineered heuristics for sample selection with strategies learned from data. Compared with previous work [23], we substantially accelerated convergence, avoided the vanishing gradient problem, improved stability, reduced the number of label requests, and improved the accuracy of the model. Because it can learn and generalize new concepts in a short time, the proposed model may be a good solution to practical problems such as movie recommendation [50] and network traffic analysis [20].