Research Article: Emotion classification using a CNN_LSTM-based model for smooth emotional synchronization of the humanoid robot REN-XIN

Date Published: May 2, 2019

Publisher: Public Library of Science

Author(s): Ning Liu, Fuji Ren, Catalin Buiu.


In this paper, we propose an Emotional Trigger System to impart an automatic emotion expression ability to the humanoid robot REN-XIN, in which the Emotional Trigger is an emotion classification model trained with our proposed Word Mover’s Distance (WMD)-based algorithm. Because the WMD-based Emotional Trigger System suffers from a long time delay, we propose an enhanced Emotional Trigger System that enables smooth interaction with the robot by replacing the Emotional Trigger with a deep neural network that combines a conventional convolutional neural network and a long short-term memory network (CNN_LSTM). In our experiments, the CNN_LSTM-based model needs only 10 milliseconds or less to finish the classification without a decrease in accuracy, while the WMD-based model needed approximately 6-8 seconds to give a result. The experiments are conducted on the same sub-data sets of the Chinese emotional corpus (Ren_CECps) used in the former WMD experiments: one uses 50% of the data for training and 50% for testing (1v1 experiment), and the other uses 80% for training and 20% for testing (4v1 experiment). The experiments are conducted using WMD, CNN_LSTM, CNN and LSTM. The results show that CNN_LSTM obtains the best F1 score (0.35) in the 1v1 experiment and, in the 4v1 experiment, an F1 score almost identical to that achieved by WMD (0.366 vs 0.367). Finally, we present demonstration videos of the same scenario to show the performance of robot control driven by the CNN_LSTM-based Emotional Trigger System and by the WMD-based Emotional Trigger System. For a fuller comparison, performance under fully manual control is also recorded.

Partial Text

Approximately 2100 years ago, King Mu of Chou made a tour of inspection in the west; on his return journey, a man named Yen Shih presented a handiwork that could sing and act, and the astonished King took it for a real man [1, 2]. Since that time, from East to West, building an automaton that can mimic human activities has attracted great interest from talents such as the scholars of Alexandria, Leonardo da Vinci, Nikola Tesla and many modern scientists.

Since robots were created, making them more engaging has become one of the predominant research fields. Marian et al. [5] deployed a real-time facial expression system in the Aibo robot and the RoboVie robot to enhance user enjoyment. Diego et al. developed a framework to recognize human emotions through facial expressions for NAO [6]. To enable real-time facial expressions on the humanoid robot REN-XIN, a forward kinematics model was proposed [7]. Since building a whole-length robot is expensive, some groups verify their facial expression systems on head-only robots. K Berms and J Hirth [8] implemented 6 basic facial expressions for the humanoid robot head ROMAN using a behavior-based control system. Hashimoto et al. developed a face robot for rich facial expression; this face robot has 18 control points and can easily imitate six typical facial expressions [9]. A quick application of facial expressions with head-neck coordination is employed on the robot SHFR-III [10]. Another way to create emotional expressions for a robot partner (in this example, iPhonoid-B) is to combine facial and gestural expressions, as implemented with a smart phone and servos [11]. However, all of these robots are controlled manually and only in limited scenarios.

In this part, we introduce the Emotional Trigger System, which is designed for emotion-enhanced interaction with the Actroid REN-XIN. First, the robot platform REN-XIN and the corpus used in this paper, Ren_CECps, are presented in the following subsection.

The experiments proceed in three parts. The first verifies the response times of Emotional Triggers based on WMD, CNN_LSTM, CNN and LSTM. The second runs classification performance tests with standard micro-F1 and macro-F1 scores on Ren_CECps. The third is a real-time demonstration, in which we choose WMD and the best model among the three networks to build two Emotional Trigger Systems. The emotion expressions of the humanoid robot REN-XIN based on the two constructed Emotional Trigger Systems are characterized, and one manually encoded emotion expression in the same scenario is presented as a comparison experiment.
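To make the difference between the two scores concrete, the following is a minimal sketch of how micro-F1 and macro-F1 can be computed. It is not the evaluation code used in the paper: the function name `f1_scores` is hypothetical, the emotion labels in the example are illustrative, and the sketch assumes each sample carries exactly one emotion label, which may not match how Ren_CECps samples are annotated.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label predictions.

    Assumption: one gold label and one predicted label per sample.
    """
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0, 0.0

    # Per-class true positives, false positives and false negatives.
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # predicted class gained a wrong hit
            fn[gold] += 1  # gold class was missed

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro-F1 pools the counts over all classes before computing F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro-F1 averages the per-class F1 scores with equal class weight.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


if __name__ == "__main__":
    gold = ["joy", "joy", "anger", "sorrow"]
    pred = ["joy", "anger", "anger", "joy"]
    micro_f1, macro_f1 = f1_scores(gold, pred)
    print(f"micro-F1={micro_f1:.3f}  macro-F1={macro_f1:.3f}")
```

Micro-F1 is dominated by frequent classes because it pools counts, whereas macro-F1 weights every emotion class equally, so a class that is never predicted correctly lowers the macro score noticeably even when it is rare.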

The proposed CNN_LSTM network successfully solves the long time delay without a decrease in accuracy; however, there are still some key points to discuss for the benefit of others and of related research work.

According to the comparison experiments and discussion, we can draw the following conclusions:



