Research Article: A Large Scale Analysis of Information-Theoretic Network Complexity Measures Using Chemical Structures

Date Published: December 15, 2009

Publisher: Public Library of Science

Author(s): Matthias Dehmer, Nicola Barbarini, Kurt Varmuza, Armin Graber, Christopher L. Douglas.

Abstract: This paper aims to investigate information-theoretic network complexity measures which have already been intensely used in mathematical- and medicinal chemistry including drug design. Numerous such measures have been developed so far but many of them lack a meaningful interpretation, e.g., we want to examine which kind of structural information they detect. Therefore, our main contribution is to shed light on the relatedness between some selected information measures for graphs by performing a large scale analysis using chemical networks. Starting from several sets containing real and synthetic chemical structures represented by graphs, we study the relatedness between a classical (partition-based) complexity measure called the topological information content of a graph and some others inferred by a different paradigm leading to partition-independent measures. Moreover, we evaluate the uniqueness of network complexity measures numerically. Generally, a high uniqueness is an important and desirable property when designing novel topological descriptors having the potential to be applied to large chemical databases.

Partial Text: The problem to quantify the complexity of a network appears in various scientific disciplines [1]–[7] and has been a challenging research topic of ongoing interest for several decades [8]. This problem first appeared when studying the complexity of biological and chemical systems, e.g., battery cells or living systems [9]–[12] using information-theoretic measures [13] (in this paper, we use the words “measure”, “index”, “descriptor” synonymously when referring to topological graph complexity measures). Directly afterwards, the idea of applying entropy measures to network-based systems finally emerged as a new branch in mathematical complexity science. An important problem within this area deals with determining the so-called structural information content [8], [12], [14]–[19] of a network. Finally, it turned out that the developed information indices for measuring the information content of a graph have been of substantial impact when solving QSPR (Quantitative structure-property relationship)/QSAR (Quantitative structure-activity relationship) problems in mathematical chemistry and drug design [1], [2], [20]–[25]. Correspondingly, such measures have been widely used to predict biological activities as well as toxicological and physico-chemical properties of molecules using chemical datasets, see, e.g., [1], [20], [23]–[26]. More precisely, most powerful and generally applicable for theses approaches are empirical multivariate models , with being a chemical or a physical property (P) or a biological activity (A), and vector consisting of a series of numerical molecular descriptors describing the molecular structure. For modeling biological activities also (measured or computed) physical properties are used. Some of the already mentioned information-theoretic complexity measures which are well-established in mathematical chemistry will be defined in the next section.

This section aims to present the information-theoretic topological descriptors we want to investigate in this paper. In the following, we briefly shed light on the two main procedures (resulting in partition-based and partition-independent measures) to infer information-theoretic complexity measures for characterizing chemical network structures. Afterwards, we express their concrete definitions for performing our numerical analysis.

In this section, we will apply the complexity measures presented in the previous section. As stated before, we mainly put the emphasis on exploring the relatedness between the topological information content and our graph entropy measures and . Moreover, we numerically calculate further information-theoretic network measures presented in the last section and interpret the results. In particular, an interesting question will be to investigate the so-called uniqueness of the measures when applying them to both databases containing real and synthetic chemical graphs.



0 0 vote
Article Rating
Notify of
Inline Feedbacks
View all comments