Research Article: TECLA: A temperament and psychological type prediction framework from Twitter data

Date Published: March 12, 2019

Publisher: Public Library of Science

Author(s): Ana Carolina E. S. Lima, Leandro Nunes de Castro, King-wa Fu.


Temperament and psychological types can be defined as innate psychological characteristics associated with how we relate to the world, and they often influence our study and career choices. Furthermore, understanding these features helps us manage conflicts, develop leadership, and improve teaching and many other skills. Temperament and psychological types are usually assigned by filling in specific questionnaires. However, it is possible to identify temperamental characteristics from a linguistic and behavioral analysis of a user’s social media data. Thus, machine-learning algorithms can be used to learn from a user’s social media data and infer his/her behavioral type. This paper first provides a brief historical review of theories on temperament and then surveys research aimed at predicting temperament and psychological types from social media data. It then proposes a framework to predict temperament and psychological types from a linguistic and behavioral analysis of Twitter data. The proposed framework infers temperament types following David Keirsey’s model, and psychological types based on the MBTI model. Various data modelling techniques and classifiers are used. The results showed that Random Forests with the LIWC technique can predict the Artisan temperament with 96.46% accuracy, the Guardian temperament with 92.19%, the Idealist temperament with 78.68%, and the Rational temperament with 83.82%. The MBTI results also showed that Random Forests achieved better performance, with an accuracy of 82.05% for the E/I pair, 88.38% for the S/N pair, 80.57% for the T/F pair, and 78.26% for the J/P pair.

Partial Text

The study of psychological types or temperament leads us to understand how a person relates to the world, either through the choices he makes or the way he absorbs information. For a long time, this theme has been researched and associated with well-being, lifestyle, employment, leadership, study, etc. One way of determining a person’s psychological type is to have him answer questionnaires about his habits and choices. Examples are the MBTI (Myers-Briggs Type Indicator), which returns the psychological type of a person and is based on the studies of Jung and Myers-Briggs, and the Keirsey Temperament Sorter (KTS), which returns a profile associated with the temperament taxonomy created by David Keirsey.
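The relation between the two label sets can be sketched in a few lines. The sketch below assumes the commonly cited correspondence between Keirsey temperaments and MBTI letters (SP for Artisan, SJ for Guardian, NF for Idealist, NT for Rational); this excerpt does not state which mapping the authors used, and the function names `keirsey_temperament` and `binary_labels` are hypothetical. It also splits a four-letter type into the four letter pairs for which the paper reports separate accuracies.

```python
# Assumption: the widely cited Keirsey/MBTI correspondence
# (SP -> Artisan, SJ -> Guardian, NF -> Idealist, NT -> Rational).
# The excerpt does not confirm that TECLA uses exactly this mapping.

def _normalize(mbti: str) -> str:
    """Upper-case a four-letter MBTI code and check each position."""
    code = mbti.strip().upper()
    valid = ("EI", "SN", "TF", "JP")
    if len(code) != 4 or any(ch not in pair for ch, pair in zip(code, valid)):
        raise ValueError(f"not a valid MBTI type: {mbti!r}")
    return code


def keirsey_temperament(mbti: str) -> str:
    """Map a four-letter MBTI type to one of the four Keirsey temperaments."""
    code = _normalize(mbti)
    if code[1] == "S":
        # Sensing types are split by the J/P letter.
        return "Guardian" if code[3] == "J" else "Artisan"
    # Intuitive types are split by the T/F letter.
    return "Idealist" if code[2] == "F" else "Rational"


def binary_labels(mbti: str) -> dict:
    """Split an MBTI type into four binary targets, one per letter pair."""
    code = _normalize(mbti)
    return dict(zip(("E/I", "S/N", "T/F", "J/P"), code))
```

For example, `keirsey_temperament("ENFP")` yields the Idealist label, while `binary_labels("ENFP")` gives one target per pair, which suits training one binary classifier per pair.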

Temperament characterizes a set of mental tendencies related to the way someone perceives, analyzes and makes daily decisions [23]. It represents the uniqueness and intensity of psychic affects and the dominant structure of mood and motivation in each individual. It is a form of reaction and sensitivity of a person to the world, which is revealed by his/her attitudes and behaviors, thus composing his/her organic basis [24]. This set of tendencies is innate, that is, it appears from birth, and is closely linked to biological or physiological determinants, which therefore change relatively little with development [25]. It can change and weaken throughout life, but it is never eliminated [24]. In the present research, temperament is defined as a set of innate and hereditary tendencies, responsible for how one perceives and interacts with the world.

Understanding social media users involves the analysis of their behaviors and interactions in social media, such as their followers, mentions, messages, friends, photos, videos and comments. Understanding the users means being able to quantify and qualify how they present themselves [34]. The automatic recognition of temperament by means of computational techniques can help many business sectors and social researchers understand social media users. To date, there are only a few works in the literature on the automatic classification of temperament and psychological types, that is, Keirsey and MBTI labels. The main reason for the scarcity of works in this area is the difficulty of finding data for training classifiers. This section provides a review of the specific works found in the literature related to these two topics. Although there are many other works addressing the prediction of user characteristics from social media data, these are outside the scope of the present paper.

The Temperament Classification Framework (TECLA) was developed by applying text mining and natural language processing techniques to classify the temperament or psychological type of social media users. The goal is to provide a modular structure that allows us to use and evaluate different techniques quickly and intuitively. Furthermore, it follows the main steps of KDD (Knowledge Discovery in Databases) [41]. Hence, TECLA has the following modules: data acquisition module; message preprocessing module; temperament classification module; and evaluation module. Each of them is described in detail later in the paper.
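The four-module structure can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the function and class names, the toy messages and the word-count classifier are all hypothetical placeholders. In particular, the classifier stands in for the Random Forests and LIWC features reported in the paper, and the data acquisition step returns hand-made examples instead of querying Twitter.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

Sample = Tuple[str, str]  # (raw message text, label)


def acquire() -> Tuple[List[Sample], List[Sample]]:
    """Data acquisition stand-in: hand-made (text, label) pairs, not real tweets."""
    train = [("Love the party tonight!", "E"), ("Quiet night reading alone", "I")]
    test = [("Party with friends tonight", "E"), ("Reading alone again", "I")]
    return train, test


def preprocess(text: str) -> List[str]:
    """Message preprocessing: lower-case, drop URLs and @mentions, tokenize."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return re.findall(r"[a-z']+", text)


class WordCountClassifier:
    """Toy classifier (placeholder): scores a label by summed training word counts."""

    def fit(self, docs: List[List[str]], labels: List[str]) -> "WordCountClassifier":
        self.counts: Dict[str, Counter] = {}
        for tokens, label in zip(docs, labels):
            self.counts.setdefault(label, Counter()).update(tokens)
        return self

    def predict(self, tokens: List[str]) -> str:
        scores = {lab: sum(c[t] for t in tokens) for lab, c in self.counts.items()}
        # Iterating labels in sorted order makes tie-breaking deterministic.
        return max(sorted(scores), key=scores.get)


def evaluate(model: WordCountClassifier, docs: List[List[str]], labels: List[str]) -> float:
    """Evaluation: fraction of messages whose predicted label is correct."""
    hits = sum(model.predict(d) == y for d, y in zip(docs, labels))
    return hits / len(labels)


def run_pipeline() -> float:
    """Chain the four modules: acquire, preprocess, classify, evaluate."""
    train, test = acquire()
    model = WordCountClassifier().fit(
        [preprocess(t) for t, _ in train], [y for _, y in train]
    )
    return evaluate(model, [preprocess(t) for t, _ in test], [y for _, y in test])
```

Keeping each module behind a plain function or a small class with `fit` and `predict` is one way to get the modularity the framework aims for: a different preprocessing step or classifier can be swapped in without touching the rest of the chain.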

The goal of this study is to design a temperament predictor that can infer the temperament of a given individual (a social media user) from what he writes on social media, instead of administering a specific temperament test to him. This approach is promising because it allows one to assess someone’s temperament in spontaneous situations. To assess the performance of TECLA, we used a recent public dataset with over one million tweets.

The purpose of this paper was threefold: to provide a brief historical review of temperament theories; to make a brief survey of machine-learning research on temperament and psychological type prediction; and to investigate the prediction of temperament and psychological types based on data produced by social media users. For this last contribution, the hypothesis this work tries to validate is whether it is possible to predict the temperament of a virtual persona without using a questionnaire, that is, to use artificial intelligence techniques to understand and classify the profile of users based on what they share and how they behave in social media. The importance of this tool lies in trying to lessen the possible bias introduced by questionnaires, when a user knows he is being explicitly evaluated.



