Date Published: January 20, 2017
Publisher: Public Library of Science
Author(s): Sho Tsugawa, Kosuke Kito, Alain Barrat.
Link prediction is the problem of detecting missing links or predicting future link formation in a network. Application of link prediction to social media, such as Twitter and Facebook, is useful both for developing novel services and for sociological analyses. While most existing research on link prediction uses only the social network topology for the prediction, in social media, records of user activities such as posting, replying, and reposting are available. These records are expected to reflect user interest, and so incorporating them should improve link prediction. However, research into link prediction using the records of user activities is still in its infancy, and the effectiveness of such records for link prediction has not been fully explored. In this study, we focus in particular on records of reposting as a promising source that could be useful for link prediction, and investigate their effectiveness for link prediction on the popular social media platform Twitter. Our results show that (1) for actively retweeting users, the prediction accuracy of techniques using reposting records is higher than that of popular topology-based techniques such as common neighbors and resource allocation, and (2) the accuracy of link prediction techniques that use network topology alone can be improved by incorporating reposting records.
Link prediction is a fundamental problem in social network research, and has been actively studied [1–10]. Typically, link prediction is the problem of detecting missing links or predicting future link formation in a network by utilizing a given network topology. In the literature, several link prediction techniques have been proposed, and these techniques have been applied to several types of social networks [2, 5, 6, 9–12]. Link prediction techniques have a broad range of application domains, and are expected to be utilized for recommendation, anomaly detection, network modeling, missing link detection, evaluation of network evolution mechanisms, reconstruction of networks, and classification of partially labeled networks [17, 18].
Many researchers have used an unsupervised approach for link prediction [1, 2, 7, 10–12, 25]. Unsupervised link prediction techniques estimate the likelihood of link formation (i.e., the link prediction score) between two nodes by using knowledge about the characteristics of real networks. For instance, one of the most popular link prediction techniques, the common neighbors method (CN), estimates the likelihood of link formation based on the idea that the existence of many common adjacent nodes between two nodes implies a high probability of link formation between them. Existing techniques aim to predict link formation or to detect missing links from only the topological structure of social networks [1, 2, 7, 10–12, 25]. In contrast, we focus on social networks in social media systems, and examine how effective records of user activity are for link prediction.
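The two topology-based scores named in this text, common neighbors (CN) and resource allocation (RA), can be sketched as follows. This is a minimal illustration rather than the implementation used in the study: it assumes an undirected network stored as a dictionary of neighbor sets, and the function names and the toy graph are hypothetical.

```python
from itertools import combinations


def common_neighbors(adj, u, v):
    """CN score: the number of neighbors shared by u and v."""
    return len(adj[u] & adj[v])


def resource_allocation(adj, u, v):
    """RA score: each shared neighbor z contributes 1 / degree(z)."""
    return sum(1.0 / len(adj[z]) for z in adj[u] & adj[v])


def rank_candidate_links(adj, score):
    """Rank all currently unlinked node pairs by descending score."""
    nodes = sorted(adj)
    candidates = [(u, v) for u, v in combinations(nodes, 2) if v not in adj[u]]
    return sorted(candidates, key=lambda pair: score(adj, *pair), reverse=True)


# Toy undirected network (assumption: symmetric adjacency sets).
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

ranking = rank_candidate_links(adj, common_neighbors)
print(ranking[0], common_neighbors(adj, "a", "d"), resource_allocation(adj, "a", "d"))
```

In this toy graph the unlinked pair with the most shared neighbors, ("a", "d"), is ranked first under both scores. RA differs from CN in that it down-weights shared neighbors with high degree.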
The results of this study indicate that in social media, link prediction based on retweet history is more effective than conventional prediction based on network topology alone for actively retweeting users. Furthermore, RTP (prediction based on retweeted posts) was more effective than RTV (prediction based on retweet views). This suggests that the active behavior of posting a retweet indicates stronger user interest than does the passive behavior of viewing a retweet. Previous research on link prediction based on retweet history has been based on retweet views. The main contribution of this paper is that it shows the effectiveness of link prediction based on retweet posts.
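As a rough illustration of how retweet posts could feed a link prediction score, the sketch below counts the posts that two users have both retweeted, by analogy with common neighbors. This is an assumption made for illustration only: the exact definition of the retweet-post score is not given in this text, and the function name, user names, and post IDs are hypothetical.

```python
def shared_retweet_score(retweeted, u, v):
    """Count the posts retweeted by both u and v.

    `retweeted` maps a user to the set of post IDs that user has retweeted.
    Users with no retweet record get a score of 0 (assumed behavior).
    """
    return len(retweeted.get(u, set()) & retweeted.get(v, set()))


# Hypothetical retweet records: user -> IDs of retweeted posts.
retweeted = {
    "alice": {101, 102, 103},
    "bob": {102, 103, 104},
    "carol": {200},
}

print(shared_retweet_score(retweeted, "alice", "bob"))
print(shared_retweet_score(retweeted, "alice", "carol"))
```

A score of this kind carries no signal for users who rarely retweet, which is consistent with the results being reported for actively retweeting users.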
In this study, we investigated the effectiveness of user records of retweets for link prediction in the popular social media platform Twitter. Through extensive experiments, we found that using the records of retweets is an effective approach for link prediction on Twitter. Our experimental results showed that a link prediction technique based on retweet posts achieves better prediction accuracy than do popular topology-based techniques (specifically, CN and RA) or techniques based on retweet views for actively retweeting users. Our results also showed that the accuracy of link prediction can be increased by combining retweet records and network topology.