Date Published: April 23, 2019
Publisher: Public Library of Science
Author(s): Thorben Funke, Till Becker, Tiago P. Peixoto.
Finding communities in complex networks is a challenging task and one promising approach is the Stochastic Block Model (SBM). But the influences from various fields led to a diversity of variants and inference methods. Therefore, a comparison of the existing techniques and an independent analysis of their capabilities and weaknesses is needed. As a first step, we review the development of different SBM variants such as the degree-corrected SBM of Karrer and Newman or Peixoto’s hierarchical SBM. Beside stating all these variants in a uniform notation, we show the reasons for their development. Knowing the variants, we discuss a variety of approaches to infer the optimal partition like the Metropolis-Hastings algorithm. We perform our analysis based on our extension of the Girvan-Newman test and the Lancichinetti-Fortunato-Radicchi benchmark as well as a selection of some real world networks. Using these results, we give some guidance to the challenging task of selecting an inference method and SBM variant. In addition, we give a simple heuristic to determine the number of steps for the Metropolis-Hastings algorithms that lack a usual stop criterion. With our comparison, we hope to guide researches in the field of SBM and highlight the problem of existing techniques to focus future research. Finally, by making our code freely available, we want to promote a faster development, integration and exchange of new ideas.
The approach of modeling systems as complex networks has spread into various disciplines from sociology over physics to engineering. The most simple form of a complex network consists of nodes and edges, where nodes represent the viewed elements and edges the relationships between them. Based on this model, various analyses such as the determination of node centralities or the robustness of the complete system can be performed easily.
At first we want to take a look at the different variants of SBM in order to cover the major approaches used for application. We divide the variants into the well-established classic models and more recent extensions. We explicitly omit approaches with a focus in the information theoretic field.
Knowing the SBM variants and the corresponding likelihood functions, we need methods to approximate an optimal community structure of the selected model and in most cases the authors present a combination of a new SBM variant with a corresponding inference algorithm. However, many inference algorithms do not depend on the specific variant. Therefore, we clearly separate these two parts of the community detection problem.
For a systematic test of the presented variants and inference methods we need a suitable set of networks with a known community structure of different strength. The Girvan-Newman test and the LFR test define such sets consisting of random networks generated based on a known node partition and predefined mixing parameters [10, 11]. These tests allow us on the one hand, to compare the performance of SBM variants and inference methods and on the other hand, the comparison to other clustering techniques. To demonstrate applicability and suitability in real cases, we conclude this section with a selection of real networks.
We presented the different approaches to develop a Stochastic Block Model, which are influenced by various disciplines. Moreover, we showed how this diversity led to plurality of formulation and variants of the basic idea of SBM. With our comparison of the complete process from the rationale behind the SBM variants, over different approaches to deal with the model selection, to inference algorithms for the maximization of the resulting objective function, we highlighted the advantages and disadvantages of the existing solutions.