Research Article: The Protein Structure Context of PolyQ Regions

Date Published: January 26, 2017

Publisher: Public Library of Science

Author(s): Franziska Totzeck, Miguel A. Andrade-Navarro, Pablo Mier, Patrick van der Wel.


Proteins containing glutamine repeats (polyQ) are known to be structurally unstable. In some proteins, abnormal expansion of polyQ beyond a certain threshold leads to neurodegenerative disease, one symptom of which is the presence of protein aggregates. This has led to extensive research into the structure of polyQ stretches. However, the accumulation of contradictory results suggests that protein context might be important. Here we aimed to evaluate the structural context of polyQ regions in proteins by analysing the secondary structure of polyQ proteins and their homologs. The results revealed that the secondary structure in the vicinity of polyQ is predominantly random coil or helix. Importantly, the regions surrounding the polyQ are often not solved in 3D structures. In the few cases where the point of insertion of the polyQ could be mapped to a full protein structure, we observed that it is always located on the surface of the protein. The findings support the hypothesis that polyQ might serve to extend coiled coils at their C-terminus in highly disordered regions involved in protein-protein interactions.

Partial Text

Homopeptide repeats are consecutive stretches of the same amino acid in protein sequences. They are surprisingly common in proteins, and it has been suggested that they form unstructured stretches within a protein and may serve a function in protein-protein interaction (PPI) [1, 2]. Polyglutamine (polyQ) in particular is one of the most common homopeptide repeats in eukaryotic proteomes [1, 3]. It can be found in a variety of protein families which do not appear to be related [4].

A set of 178 clusters, consisting of polyQ proteins and their homologs (at least 53% identity between proteins in a cluster, see Methods for details), was analysed for secondary structure context in the vicinity of polyQ regions. Homologs were taken into account in order to increase the amount of secondary structure information, which was categorised into helix, sheet or random coil. The clusters in the dataset varied in size from one to 243 proteins. A total of 282 of the 926 proteins in the dataset contain a polyQ with at least eight glutamines per ten amino acids (an 8/10 polyQ). Most of them contain only one polyQ, but a few contain more, up to six.
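The 8/10 criterion can be illustrated with a short sliding-window scan. The following is a minimal sketch in Python, not the authors' pipeline: the function name `find_polyq_regions`, the merging of overlapping windows, and the trimming of each region to its first and last glutamine are assumptions made for illustration, since the text above only states the threshold of at least eight glutamines per ten amino acids.

```python
def find_polyq_regions(sequence, window=10, min_q=8):
    """Return half-open (start, end) spans of putative polyQ regions.

    A window of `window` residues qualifies if it contains at least
    `min_q` glutamines (Q). Overlapping or adjacent qualifying windows
    are merged, and each merged region is trimmed so that it starts
    and ends on a Q. Merging and trimming are illustrative assumptions,
    not rules taken from the article.
    """
    seq = sequence.upper()
    merged = []
    for start in range(len(seq) - window + 1):
        if seq[start:start + window].count("Q") >= min_q:
            end = start + window
            if merged and start <= merged[-1][1]:
                # Extend the previous region with this overlapping window.
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))

    trimmed = []
    for start, end in merged:
        # Every merged region contains at least min_q glutamines,
        # so both loops stop before the indices cross.
        while seq[start] != "Q":
            start += 1
        while seq[end - 1] != "Q":
            end -= 1
        trimmed.append((start, end))
    return trimmed


# Hypothetical toy sequence: a run of ten Q flanked by other residues.
example = "MKT" + "Q" * 10 + "A" * 12
regions = find_polyq_regions(example)
```

Counting the returned spans per protein gives the number of polyQ regions in that protein under these assumptions. A stricter or looser boundary rule would change the spans but not whether a protein is flagged as containing an 8/10 polyQ.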

We previously reported the co-occurrence of coiled coils and polyQ by sequence analysis of proteins containing polyQ [5]. In that work, the findings were complemented by an analysis of the protein interaction networks surrounding polyQ proteins (e.g. finding that polyQ proteins tend to interact with other polyQ proteins). Here, we used a different set of sequences, which for the most part consists of proteins of known 3D structure that lack polyQ but are homologous to polyQ proteins. This allowed us to focus on the structure surrounding polyQ while avoiding the problem of the lack of 3D structures of polyQ regions.