Date Published: July 18, 2017
Publisher: Public Library of Science
Author(s): Mahbubur Rahman, Rovshan G. Sadygov, Andrew C. Gill.
Protein half-life is an important feature of protein homeostasis (proteostasis). The increasing number of in vivo and in vitro studies using high-throughput proteomics provides estimates of protein half-lives in tissues and cells. However, protein half-lives in cells and tissues differ. Because of the resource requirements of tissue research, more data are available from cellular studies than from tissue studies. We have designed a multivariate linear model for predicting the half-life of a protein in tissue from its cellular properties. Inputs to the model are cellular half-life, abundance, intrinsically disordered sequences, and transcriptional and translational rates. Before the modeling, we determined substructures in the data using the relative distance from the regression line of the protein half-lives in tissues and cells, identifying three separate clusters. The model was trained on and applied to predict protein half-lives from murine liver, brain and heart tissues. In each tissue type we observed similar prediction patterns of protein half-lives. We found that the model provides the best results when there is a strong correlation between tissue and cell culture protein half-lives. Additionally, we clustered the protein half-lives to determine variations in correlation coefficients between the protein half-lives in the tissue versus in cell culture. The clusters identify strongly and weakly correlated protein half-lives, further improve the overall prediction, and identify subgroupings that exhibit specific characteristics. The model described herein is generalizable to other data sets and has been implemented in freely available R code.
Proteostasis is a cellular process that includes control of concentrations, conformations, binding interactions, and locations of individual proteins. Proteostasis integrates into other cellular processes such as (external or internal) signal response, cellular proliferation, and aging. It enables cells to change their physiology for successful organismal development and aging while under constant challenges from intrinsic and environmental factors. An important characteristic of proteostasis is the turnover rate of a protein (half-life). New technological advances in the proteomics field are enabling researchers to profile the proteome dynamics of cell lines[2, 3], tissues, and living organisms in high-throughput experiments, allowing half-life estimation for a large number of proteins. These experiments create new opportunities for inferring the networks and pathways controlling cellular proteostasis and assist with understanding the sequence of regulatory events that lead to the integration of cellular processes including gene expression, translation, post-translational protein modifications, and subcellular localization. However, the analysis of time course data from metabolic labeling experiments, especially the generalization of results from cell lines to tissues, which is required for such studies, poses several new challenges in bioinformatics, statistical data processing, and modeling. While proteome dynamics data from cell lines are becoming readily available, the labeling of living organisms is expensive and laborious. In addition, half-life measurements in vivo are meaningful only for relatively long-lived proteins, as it takes a few hours for the administered label to be incorporated into a tissue in the body. This limitation is not present in cultured cells, where half-lives as short as one to two hours can potentially be measured.
Therefore, computational techniques are needed to map the observations from cell lines to the corresponding tissues. Another challenge, though not addressed here, is that tissues are composed of different cell types, therefore requiring the combination of protein information from multiple cell types. In this study, we make a first attempt at predicting protein turnover rates in tissues from their cellular properties (e.g. Fig 1), and propose a multivariate linear model.
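As a minimal sketch of what a multivariate linear model of this kind looks like, the Python below fits tissue half-life as an intercept plus a weighted sum of five cellular inputs by ordinary least squares. It is not the authors' R implementation. The function names, the column order, and all numbers are assumptions made for illustration, and the data are synthetic.

```python
import numpy as np

def fit_linear_model(features, tissue_half_life):
    """Ordinary least squares fit; returns [intercept, coef_1, ..., coef_k]."""
    design = np.column_stack([np.ones(len(features)), features])
    coeffs, *_ = np.linalg.lstsq(design, tissue_half_life, rcond=None)
    return coeffs

def predict(coeffs, features):
    """Predicted tissue half-lives for a feature matrix."""
    design = np.column_stack([np.ones(len(features)), features])
    return design @ coeffs

# Synthetic illustration only: these values are generated, not measured.
# Assumed column order: cellular half-life, abundance, disorder content,
# transcription rate, translation rate (all on arbitrary scales).
rng = np.random.default_rng(0)
n = 200
features = rng.uniform(0.0, 1.0, size=(n, 5))
true_coeffs = np.array([10.0, 3.0, 0.5, -1.0, 0.2, 0.1])
tissue = true_coeffs[0] + features @ true_coeffs[1:] + rng.normal(0.0, 0.01, n)

coeffs = fit_linear_model(features, tissue)
print(np.round(coeffs, 2))
```

With real data, each row of `features` would hold the measured cellular properties of one protein, and the fitted coefficients would then be used to predict tissue half-lives for proteins measured only in cell culture.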
We used publicly available in vivo data sets from murine liver, brain, and heart, and in vitro data sets from murine fibroblast (NIH3T3) and myoblast (C2C12) cell lines for protein half-lives. The in vivo experiments used Nitrogen-15 (15N) isotope labeling in the murine brain and liver study and heavy water labeling in the heart study. The cell line studies used stable isotope labeling with amino acids in cell culture (SILAC). In these data sets, there were 434 liver proteins common between cell culture and liver tissue and 354 brain proteins common between cell culture and brain tissue. Of these, 366 common liver proteins and 346 common brain proteins have longer half-lives in the tissue than in the cell cultures (e.g. NIH3T3). We have used these common protein data sets to train and validate the multivariate linear model of protein half-life prediction. Finally, we have applied the model to two other data sets: one from the in vitro C2C12 myoblasts and another from the in vivo murine heart experiment (Supporting information).
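The selection of common proteins described above can be sketched as a set intersection followed by a comparison of the two half-lives. The protein identifiers and half-life values below are hypothetical placeholders, and the function name is an assumption; this is an illustration, not the authors' code.

```python
def split_common_proteins(tissue_half_lives, cell_half_lives):
    """Return (longer_in_tissue, not_longer_in_tissue) for shared proteins.

    Both arguments map a protein identifier to a half-life in the same unit.
    """
    common = tissue_half_lives.keys() & cell_half_lives.keys()
    longer = {p for p in common if tissue_half_lives[p] > cell_half_lives[p]}
    return sorted(longer), sorted(common - longer)

# Hypothetical identifiers and values, for illustration only.
tissue = {"P1": 120.0, "P2": 30.0, "P3": 80.0, "P4": 55.0}
cell = {"P1": 40.0, "P2": 45.0, "P3": 20.0, "P5": 10.0}

longer, not_longer = split_common_proteins(tissue, cell)
print(longer, not_longer)
```

For the counts reported above, 366 of 434 common liver proteins (about 84%) and 346 of 354 common brain proteins (about 98%) fall in the first group.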
We developed the model using proteins that are common to both data sets and have longer half-lives in the tissue than in cell culture (91% of all data). However, the model is also applicable to those common proteins that have shorter half-lives in the tissue than in cell culture (see Results). For a first attempt, we were interested in predicting the protein half-lives for the former group (longer tissue half-lives). Hence, we created our protein data sets from the experimental data sets[7–9] under this first assumption. We applied a linear regression model to the common protein half-life data sets[2, 7] (Fig 2 and S1 Fig) to first understand their linear relationship.
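The regression step, together with the relative-distance idea mentioned in the abstract, might be sketched as follows. The source does not give the exact definition of relative distance or the cut-offs that separate the three clusters, so the formula (residual divided by fitted value), the threshold of 0.25, and the group names here are all assumptions, and the half-life values are hypothetical.

```python
import numpy as np

def relative_distance_groups(cell, tissue, threshold=0.25):
    """Fit tissue ~ cell by least squares and group proteins by relative distance.

    Relative distance is taken here as (observed - fitted) / fitted, which
    assumes the fitted values are nonzero. The threshold is an arbitrary choice.
    """
    cell = np.asarray(cell, dtype=float)
    tissue = np.asarray(tissue, dtype=float)
    slope, intercept = np.polyfit(cell, tissue, deg=1)
    fitted = slope * cell + intercept
    rel = (tissue - fitted) / fitted
    labels = np.where(rel > threshold, "above",
                      np.where(rel < -threshold, "below", "near"))
    return slope, intercept, rel, labels

# Hypothetical half-lives (same unit for both), for illustration only.
cell = [10, 20, 30, 40, 50, 60]
tissue = [32, 48, 120, 88, 45, 132]

slope, intercept, rel, labels = relative_distance_groups(cell, tissue)
print(round(slope, 3), round(intercept, 3), list(labels))
```

Dividing by the fitted value makes the distance scale-free, so long-lived and short-lived proteins are compared on the same footing; other normalizations would be equally defensible.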
We present an analysis of the performance of the model, the effect of clustering on protein half-life prediction, and half-life prediction for proteins that are not common to both data sets (uncommon proteins). The analysis focuses on the correlation coefficients between protein half-lives in the tissue and cell cultures, as the half-life clusters are formed based on the correlation coefficients. This also leads to the identification of common protein half-life characteristics of each cluster (S4 Table). We have observed that these characteristics are common to the murine proteins from liver, brain, and heart. Additionally, we have analyzed biological/biochemical properties of proteins from the protein database.
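To make the cluster-wise comparison concrete, the following sketch computes a Pearson correlation coefficient between cell and tissue half-lives separately for each cluster label. The labels, values, function name, and minimum cluster size are assumptions for illustration and do not reproduce the authors' analysis.

```python
import numpy as np

def correlation_by_cluster(cell, tissue, labels, min_size=3):
    """Pearson r between cell and tissue half-lives within each cluster."""
    cell = np.asarray(cell, dtype=float)
    tissue = np.asarray(tissue, dtype=float)
    labels = np.asarray(labels)
    result = {}
    for label in np.unique(labels):
        mask = labels == label
        if mask.sum() < min_size:
            continue  # too few points for a meaningful coefficient
        result[str(label)] = float(np.corrcoef(cell[mask], tissue[mask])[0, 1])
    return result

# Hypothetical values: the first cluster is perfectly linear, the second is not.
cell = [1, 2, 3, 4, 1, 2, 3, 4]
tissue = [2, 4, 6, 8, 5, 3, 6, 4]
labels = ["strong"] * 4 + ["weak"] * 4

r = correlation_by_cluster(cell, tissue, labels)
print(r)
```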
We have provided the first study of predicting protein half-life at the tissue level from the cellular level. The model is simple, easy to implement, and applicable to other tissue and cell line experimental data sets. We have analyzed the linear relationships between protein half-lives in the tissue and in the cell by using the multivariate linear model along with clustering. The clustering reveals linear and correlation-coefficient-based relationships between protein half-lives and protein properties, and it improves the prediction.