Research Article: Sequence Complexity of Chromosome 3 in Caenorhabditis elegans

Date Published: July 20, 2012

Publisher: Hindawi Publishing Corporation

Author(s): Gaetano Pierro.


The complexity of the nucleotide sequences in chromosome 3 of Caenorhabditis elegans (C. elegans) is studied and compared with that of some random sequences. Moreover, by using parameters related to complexity, such as fractal dimension, frequency, and the indicator matrix, a first classification of the sequences of C. elegans is given. In particular, the sequences with the highest and lowest fractal dimension are singled out. It is shown that the sequences with low fractal dimension share many features with the random sequences.

Partial Text

Caenorhabditis elegans (C. elegans) is a transparent nematode 1 mm in length. Thanks to its simple organic structure, it has been adopted as a model organism for genetic research. Early studies on C. elegans began in 1962 with work on cell lineage and apoptosis [1, 2]. C. elegans has 2 distinct sexes, the hermaphrodite and the male. The male is very rare in nature (approximately 0.05% of the population). The hermaphrodite has 959 cells and the male has 1031. At the chromosomal level, the hermaphrodite is XX and the male is X0. C. elegans reproduces by 2 distinct pathways: mating or, in the case of the hermaphrodite, self-fertilization. The life cycle of C. elegans consists of 4 larval stages (L1 to L4); however, under harsh environmental conditions, such as lack of food, C. elegans remains in the L3 larval stage until conditions improve.

On chromosome 3 of C. elegans, 2780 genes have been identified [19]. Some of them are very short, less than about 50 nucleotides, and are therefore useless for any statistical analysis; others are still under investigation, so that some nucleotides are not yet properly identified. For this reason, only sequences of significant length were selected, the shortest being about 100 nucleotides. In particular, we investigated 100 genes (whole sequences), 85 repeat sequences, 71 noncoding sequences (introns), and 100 coding sequences (exons lacking UTRs). In order to make a comparison with random sequences, 100 random sequences of 100 nucleotides were generated. All sequences used in this work were downloaded from the National Center for Biotechnology Information [19]. A simple formula to estimate the fractal dimension, based on the correlation matrix, has been given in [20, 21]: the fractal dimension is defined as the average of the number p(n) of 1s in randomly taken n × n minors of the N × N correlation matrix u_hk (see also [20–24]).
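The verbal definition above can be illustrated with a minimal Python sketch. It is not the formula of [20, 21], which is not reproduced in this text, and it rests on several assumptions: the correlation matrix u_hk is taken to be the 0/1 indicator matrix with u_hk = 1 when positions h and k carry the same nucleotide; each "minor" is taken to be a contiguous n × n block on the main diagonal at a random offset; and the estimate is the plain mean of log p(n) / log n over n = 2, ..., N, so it may differ from the published formula by a normalization factor. The function names are hypothetical.

```python
import math
import random


def indicator_matrix(seq):
    """Return the N x N 0/1 matrix u with u[h][k] = 1 when seq[h] == seq[k]."""
    return [[1 if a == b else 0 for b in seq] for a in seq]


def fractal_dimension(seq, rng=None):
    """Mean of log p(n) / log n over n = 2..N (normalization is an assumption).

    p(n) is the number of 1s in one n x n block of the indicator matrix.
    The block is contiguous and lies on the main diagonal at a random offset,
    so it always contains the n diagonal 1s and log p(n) is well defined.
    """
    rng = rng or random.Random(0)
    size = len(seq)
    if size < 2:
        raise ValueError("need at least 2 nucleotides")
    u = indicator_matrix(seq)
    total = 0.0
    for n in range(2, size + 1):
        start = rng.randint(0, size - n)  # randint is inclusive on both ends
        ones = sum(u[h][k]
                   for h in range(start, start + n)
                   for k in range(start, start + n))
        total += math.log(ones) / math.log(n)
    return total / (size - 1)


def random_sequence(length=100, rng=None):
    """Draw a sequence with independent, uniformly distributed nucleotides."""
    rng = rng or random.Random(0)
    return "".join(rng.choice("ACGT") for _ in range(length))
```

Under these assumptions the estimate lies between 1 and 2. A block containing only diagonal matches gives p(n) = n, and a block of identical nucleotides gives p(n) = n². Calling `fractal_dimension(random_sequence(100))` gives a value for one synthetic sequence of the same length as the random sequences used in the comparison.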

By using formula (7), the fractal dimension of each nucleotide sequence has been computed; the results are shown in Tables 1 and 2. In particular, the sequences with the maximum and minimum fractal dimension among the whole sequences, coding and noncoding sequences, repeat sequences, and random sequences have been singled out.

In this work, the different types of sequences (repeats, coding, noncoding, whole gene, random) of chromosome 3 of C. elegans (the chromosome with the highest fractality) have been analyzed by means of statistical parameters such as the indicator matrix, complexity, frequency, and fractal dimension. Our aim was to give a statistical classification of these sequences and to understand their complexity as a function of the distribution of nucleotides. The values of the fractal dimension for all sequences were obtained by using (7). In detail, the repeat sequences (which do not code for proteins) show the greatest variability, since they attain both the minimum and the maximum over all the C. elegans sequences considered. This leads us to analyze the role and the functional meaning of the repeats within the sequences of genes. We then verified that fractal dimension and complexity are equivalent measures, since the sequences with the highest fractal dimension also appear to have a greater degree of complexity. From the frequency distribution of the nucleotides, it was noticed that adenine is more abundant in sequences with a lower fractal dimension and, in particular, in the sequence with the lowest fractal dimension overall (AT RICH). This result seems to depend on the fact that this sequence is made up of only 2 nucleotides, adenine and thymine. Cytosine, instead, appears to be the most frequent nucleotide in the sequences with the highest fractal dimension, in particular in the sequence CER 16-2-i-CE. These results lead us to conjecture that there is a correlation between fractal dimension and the frequency of nucleotides such as adenine and cytosine.
The information content of a sequence of nucleotides depends on how the nucleotides are distributed, so that two sequences containing the same nucleotides arranged in two different permutations might have two different complexities (fractal dimensions). In future work, this aspect of the internal organization of the sequence will be analyzed further. Moreover, these results must be confirmed in other organisms that are evolutionarily distant from each other, in order to investigate the present findings more thoroughly. So far, the results have been compared with some random sequences, in which the nucleotides are randomly distributed, and in that case we found a significant correspondence with the complexity of the nucleotide sequences.
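The point that nucleotide frequency alone does not fix the complexity can be illustrated with a small Python sketch. The two toy sequences and the function names are hypothetical and are not taken from the data set; the count of 1s assumes an indicator matrix whose entry is 1 when two positions carry the same nucleotide, restricted to a contiguous diagonal block.

```python
from collections import Counter


def nucleotide_frequencies(seq):
    """Relative frequency of each of the four nucleotides in seq."""
    counts = Counter(seq)  # a missing base counts as 0
    return {base: counts[base] / len(seq) for base in "ACGT"}


def ones_in_block(seq, start, n):
    """Count the 1s of the indicator matrix inside an n x n diagonal block."""
    window = seq[start:start + n]
    return sum(1 for a in window for b in window if a == b)


# Two permutations of the same nucleotides (toy examples, not real data).
blocked = "AAAATTTT"
alternating = "ATATATAT"

# Same composition ...
print(nucleotide_frequencies(blocked) == nucleotide_frequencies(alternating))  # True
# ... but a different number of 1s in the first 4 x 4 block.
print(ones_in_block(blocked, 0, 4), ones_in_block(alternating, 0, 4))  # 16 8
```

Both sequences are 50% adenine and 50% thymine, yet the local counts p(n) differ, so any measure built from those counts can separate the two arrangements.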