Date Published: June 13, 2018
Publisher: Public Library of Science
Author(s): C. K. Sruthi, Meher Prakash, Alexandre G. de Brevern.
Amino acid mutations in proteins arise randomly, and those that are beneficial or neutral survive over the course of evolution. Conservation and co-evolution analyses are performed on multiple sequence alignments of homologous proteins to assess how important individual amino acids, or groups of them, are. However, these traditional analyses do not explore the directed influence of amino acid mutations, such as compensatory effects. In this work we develop a method to capture the directed evolutionary impact of one amino acid on all other amino acids, and provide a visual network representation of it. The method developed for these directed networks of inter- and intra-protein evolutionary interactions can also be used to identify differences in amino acid evolution between control and experimental groups. The analysis is illustrated with a few examples, in which the method identifies several directed interactions of functionally critical amino acids. The impact of an amino acid is quantified as the number of amino acids that are influenced as a consequence of its mutation. It is intended to summarize the compensatory mutations in large evolutionary sequence data sets and to rationally identify targets for mutagenesis when their functional significance cannot be assessed using structure or conservation.
Anfinsen’s dogma of molecular biology postulates that the native structure and function of a protein are uniquely determined by its amino acid sequence. There is therefore considerable fundamental interest in analysing protein sequences. For example, sequence data for a protein from multiple species helps in understanding evolutionary patterns, and sequence data from a cohort helps in understanding drug resistance patterns. A Multiple Sequence Alignment (MSA) of protein sequences obtained from across species or from a cohort is usually the starting point for many such analyses. The simplest analysis one can perform using an MSA is to evaluate the level of conservation of the individual amino acids. A highly conserved amino acid is likely to have an important role in either structure or function; this is especially true of perfectly conserved amino acids, which are mostly found in the functional sites of proteins. Based on the similarity and homology of sequences curated from different species, protein sequences are classified into families whose members are likely to share structural and functional similarities. Interest in the functional information contained in sequences is only heightened by next-generation sequencing technology, which is making sequence data far more easily accessible than structural data.
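As a rough illustration of the per-position conservation analysis mentioned above, the following sketch scores each alignment column by the frequency of its most common residue. The scoring choice, the function name `column_conservation`, and the gap symbols `-` and `.` are assumptions made for this example; the text does not specify a particular conservation measure, and entropy-based scores are a common alternative.

```python
from collections import Counter


def column_conservation(sequences, gap_chars="-."):
    """Return one conservation score per alignment column.

    The score is the fraction of non-gap residues in the column that
    match the most common residue (an illustrative choice, not a
    measure taken from the text).
    """
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all aligned sequences must have the same length")

    scores = []
    for i in range(length):
        # Collect the residues in column i, ignoring gaps.
        residues = [seq[i] for seq in sequences if seq[i] not in gap_chars]
        if not residues:
            # A column made only of gaps carries no conservation signal.
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(residues))
    return scores


# Toy alignment, invented for illustration only.
toy_msa = ["ACD", "ACE", "A-E", "GCE"]
toy_scores = column_conservation(toy_msa)
```

In this toy alignment the second column is perfectly conserved once the gap is ignored, so it receives a score of 1.0, while the other two columns each contain one dissenting residue.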
Sequence selection and alignment: All sequence data other than those for HIV-1 were obtained from the Pfam database. We used the full alignment provided by Pfam. For HIV-1 proteins, the data were obtained from the Los Alamos server (https://www.hiv.lanl.gov/). Both databases provide aligned sequences, so no separate sequence alignment was performed. However, the alignment was truncated to the reference protein sequence, and all sequences having more than 20% gaps were eliminated from the alignment.
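The two preprocessing steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: "truncated to the reference protein sequence" is interpreted as keeping only the columns in which the reference sequence has a residue, the gap fraction is computed after truncation, and the function name `filter_alignment`, the dictionary input format, and the gap symbols `-` and `.` are hypothetical.

```python
def filter_alignment(alignment, reference_id, max_gap_fraction=0.2, gap_chars="-."):
    """Truncate an alignment to the reference columns and drop gappy sequences.

    alignment: dict mapping sequence id -> aligned sequence string
               (all strings are assumed to have equal length).
    reference_id: id of the reference protein within the alignment.
    """
    reference = alignment[reference_id]

    # Assumption: keep only columns where the reference has a residue.
    keep = [i for i, ch in enumerate(reference) if ch not in gap_chars]
    if not keep:
        raise ValueError("reference sequence contains no residues")

    filtered = {}
    for seq_id, seq in alignment.items():
        truncated = "".join(seq[i] for i in keep)
        gap_count = sum(ch in gap_chars for ch in truncated)
        # Sequences with MORE than the threshold fraction of gaps are removed.
        if gap_count / len(truncated) <= max_gap_fraction:
            filtered[seq_id] = truncated
    return filtered


# Toy alignment, invented for illustration only.
toy_alignment = {
    "ref":  "A-CDEFGHIKL",
    "seq1": "AACDEFGHIKL",
    "seq2": "A-C--FGHIKL",  # 2 gaps in 10 kept columns: retained
    "seq3": "--C--FGHIKL",  # 3 gaps in 10 kept columns: removed
}
toy_filtered = filter_alignment(toy_alignment, "ref")
```

Computing the gap fraction after truncation means that gaps in columns absent from the reference do not count against a sequence; computing it before truncation would be an equally plausible reading of the description.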
We introduced a way to measure and visualise the directed influence of amino acids on one another. The directed influence network summarizes the compensatory mutations that occur under functional constraints in response to changes of key amino acids in homologous sequences. We demonstrated the utility of the method using evolutionary sequences from a few proteins. The principal results appear to be unaffected by changes in parameters, and they identify effects arising from compensation for distal mutations as well as effects involving the binding pocket and catalytic residues. The simple and intuitive definition of the directional impact of amino acid interactions can bring a new perspective to a field that has so far relied on symmetric co-evolution.