Date Published: February 22, 2018
Publisher: Public Library of Science
Author(s): Tom Brown, Nick Brown, Elliott J. Stollar, Manuela Helmer-Citterich.
A need exists to develop bioinformatics methods for predicting differences in protein function, especially for members of a domain family that share a common fold yet are found in a diverse array of proteins. Many domain families have been conserved over large evolutionary spans, and representative genomic data spanning these periods are now available. This allows a simple method for grouping domain sequences to reveal common and unique/specific binding residues. As such, we hypothesize that sequence alignment analysis of the yeast SH3 domain family across ancestral species in the fungal kingdom can determine whether each member encodes specific information to bind unique peptide targets. With this approach, we identify important specific residues for a given domain as those that show little conservation within an alignment of yeast domain family members (paralogs) but are conserved in an alignment of its direct relatives (orthologs). We find that most of the yeast SH3 domain family members have maintained unique amino acid conservation patterns, which suggests that they bind peptide targets with high intrinsic specificity through varying degrees of non-canonical recognition. For a minority of domains, we predict a less diverse binding surface, likely requiring additional factors to bind targets specifically. We observe that our predictions are consistent with high-throughput binding data, which suggests that our approach can probe intrinsic binding specificity in any other interaction domain family that is maintained during evolution.
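The paralog/ortholog criterion described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the conservation score (fraction of sequences sharing the most common residue in a column), the `low` and `high` thresholds, the function names, and the toy sequences are all assumptions made for illustration. The sketch also assumes that the two alignments share the same column numbering.

```python
from collections import Counter


def column_conservation(alignment, col):
    """Fraction of non-gap residues in a column that match the most common residue.

    This simple majority score is an assumption for illustration only.
    """
    residues = [seq[col] for seq in alignment if seq[col] != "-"]
    if not residues:
        return 0.0  # column contains only gaps
    return Counter(residues).most_common(1)[0][1] / len(residues)


def classify_positions(paralogs, orthologs, low=0.5, high=0.8):
    """Split alignment columns into 'specific' and 'common' positions.

    specific: poorly conserved among paralogs, well conserved among orthologs.
    common:   well conserved in both alignments.
    The thresholds `low` and `high` are hypothetical values.
    """
    length = len(paralogs[0])
    if any(len(seq) != length for seq in paralogs + orthologs):
        raise ValueError("all sequences must have the same aligned length")

    specific, common = [], []
    for col in range(length):
        para = column_conservation(paralogs, col)
        orth = column_conservation(orthologs, col)
        if orth >= high and para <= low:
            specific.append(col)
        elif orth >= high and para >= high:
            common.append(col)
    return specific, common


# Toy alignments (invented sequences, not real SH3 domains).
toy_paralogs = ["ALYDW", "ALYEF", "ALYKH", "ALYRT"]   # family members within one species
toy_orthologs = ["ALYDW", "ALYDW", "ALYDW", "ALYDF"]  # one member across related species

specific_cols, common_cols = classify_positions(toy_paralogs, toy_orthologs)
print("specific positions:", specific_cols)
print("common positions:", common_cols)
```

In this toy case, the fourth column varies among the paralogs but is invariant among the orthologs, so it is flagged as specific. The fifth column varies among the paralogs and falls just short of the ortholog threshold, so it is assigned to neither class.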
Signals are transmitted through cellular pathways via relays of protein-protein interactions resulting in specific outputs, such as cell growth, differentiation, or apoptosis. To achieve the correct responses from signaling pathways, the protein-protein interactions involved must be specific, and not potentiate inappropriate activation of off-target pathways. This requisite precision can be readily achieved by proteins that possess high “intrinsic specificity”, directly binding their intended targets much more tightly than any other protein. For protein-DNA interactions, this can involve differences of three orders of magnitude or more in Kd value between target and non-target binding. For example, the cro repressor binds its cognate OR3 operator with a Kd of 2 pM while binding non-specific DNA ∼10⁴ times weaker with a Kd of 1.5 μM. However, other proteins appear to have low intrinsic specificity, binding their intended target and many other non-specific targets with similar affinities [3–5]. For example, Michaud et al. analyzed the binding of 11 antibodies to ∼5000 different yeast proteins and although they found five were highly specific towards their antigen, five others were cross reactive towards a number of other antigens, and one was promiscuous, binding >1000 partners. The interactions of these proteins may still achieve high specificity through alternative mechanisms that Bhattacharyya et al. define as “contextual specificity”. Contextual specificity is the contribution of the environment to interaction specificity. For example, the intended target can be separated from other proteins through coordinated temporal and spatial localization within the cell. This is seen in the case of signaling pathways that are initiated at the membrane, where recruitment serves to enhance specificity by increasing the local concentration of the specific interaction partners over other proteins.
Contextual specificity also operates through the requirement that some target proteins bind in a cooperative multi-protein complex. As such, these proteins usually provide additional binding sites in the interaction that are less likely to be present in other proteins. Fig 1A illustrates these specificity concepts and provides examples from known SH3 domain interactions. The relative importance of intrinsic and contextual specificity in families of related proteins has not yet been well defined [7, 8], and defining it is the purpose of the present study.
Our simple approach of grouping representative domain sequences into paralogs and orthologs allowed us to comprehensively assess the degree of uniqueness for all members of the yeast SH3 domain family. We focused on binding surfaces I and II, where most proline-rich target peptides bind and for which large amounts of binding data are available. We also made an initial exploration of other clusters within the domain and of sequences outside the canonical boundaries. Our findings, which are supported by high-throughput binding data analysis, accurately predict that most of this family can bind peptide targets specifically via a unique SII [41, 43]. In fact, our study supports two approaches for achieving high intrinsic specificity at the peptide binding site. The first and more common type involves extending the canonical binding region (SI) with a unique surface (SII). A second, less common mode changes SI to be unique in addition to a unique SII, making the complete binding surface even more distinct from the other family members (only the Bud14, Fus1 and Nbp2 domains significantly deviate from the canonical SI). It should be noted that evolutionary changes occur on a sliding scale, as some domains deviate by just one residue in the canonical SI. Thus, these two approaches appear to represent two extremes, and the family appears to have varying degrees of non-canonical recognition.