Prabakaran R, Goel Dhruv, Kumar Sandeep, Gromiha M Michael
Department of Biotechnology, Bhupat Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, 600036, India.
Department of Computer Science and Engineering, Motilal Nehru National Institute of Technology, Allahabad, 211004, India.
Proteins. 2017 Jun;85(6):1099-1118. doi: 10.1002/prot.25276. Epub 2017 Mar 24.
Protein aggregation underlies several debilitating human diseases, but a molecular-level understanding of how the human proteome has tackled the threat of aggregation is currently lacking. In this work, we survey the human proteome for the incidence of aggregation-prone regions (APRs), using both the sequences of experimentally validated amyloid-fibril-forming peptides and computational predictions. While approximately 30 human proteins are currently known to be amyloidogenic, we found that 260 proteins (∼1% of the human proteome) contain at least one experimentally validated amyloid-fibril-forming segment. Computational predictions suggest that more than 80% of human proteins contain at least one potential APR and that approximately two-thirds (65%) contain two or more APRs, with APRs spanning 3-5% of their sequences. Sequence randomizations show that this apparently high incidence of APRs has in fact been significantly reduced by the unique amino acid composition and sequence patterning of human proteins. The human proteome has used a wide repertoire of sequence-structure optimization strategies, most of them already known, to minimize the deleterious consequences of APRs while simultaneously taking advantage of their order-promoting properties. This survey also found that APRs tend to be located near active and ligand-binding sites in human proteins, but not near post-translational modification sites. APRs in human proteins are also preferentially found at heterotypic rather than homotypic interfaces. Interestingly, this survey reveals that APRs play multiple, often opposing, roles in the sequence-structure-function relationships of human proteins. Insights gained from this work have several implications for novel drug discovery and development. © 2017 Wiley Periodicals, Inc.
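The randomization argument above compares the APR count of a real sequence against the counts obtained after scrambling it. The sketch below illustrates that comparison under explicit assumptions. It is a toy, not the authors' pipeline. The APR criterion is hypothetical: a 6-residue window is flagged if its mean Kyte-Doolittle hydropathy is at least 2.0 and it contains none of P, K, R, D or E. This stands in for real predictors such as TANGO or Waltz, which are not reproduced here. Overlapping flagged windows are merged into regions, and coverage is the fraction of residues inside them. The null model shuffles the residues, which preserves composition and so isolates the effect of sequence patterning. Testing the effect of composition would instead require sampling residues from a background frequency. The names `find_aprs`, `apr_coverage` and `shuffle_null` are illustrative only.

```python
import random
from statistics import mean

# Kyte-Doolittle hydropathy scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Hypothetical "breaker" residues assumed to disfavour aggregation in a window.
BREAKERS = set("PKRDE")


def find_aprs(seq, window=6, threshold=2.0):
    """Return merged half-open (start, end) regions flagged by a TOY criterion.

    Assumption: a window is flagged if it contains no breaker residue and its
    mean hydropathy is >= threshold. This is not the paper's predictor.
    """
    flagged = []
    for i in range(len(seq) - window + 1):
        w = seq[i:i + window]
        if not (BREAKERS & set(w)) and mean(KD.get(a, 0.0) for a in w) >= threshold:
            flagged.append((i, i + window))
    # Merge overlapping or adjacent flagged windows into contiguous regions.
    merged = []
    for s, e in flagged:
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def apr_coverage(seq, aprs):
    """Fraction of residues in seq covered by the given APR regions."""
    return sum(e - s for s, e in aprs) / len(seq) if seq else 0.0


def shuffle_null(seq, n=200, seed=0, **kw):
    """Mean APR count over n composition-preserving shuffles of seq."""
    rng = random.Random(seed)
    residues = list(seq)
    counts = []
    for _ in range(n):
        rng.shuffle(residues)
        counts.append(len(find_aprs("".join(residues), **kw)))
    return mean(counts)


if __name__ == "__main__":
    seq = "MKTAYIAKQRDEKLVVIAFLGSEDKRPKEEVILLAIGNDEPRKQE"  # made-up sequence
    aprs = find_aprs(seq)
    print("observed APRs:", len(aprs), "coverage:", round(apr_coverage(seq, aprs), 3))
    print("mean APRs after shuffling:", shuffle_null(seq))
```

Suppose the observed count fell systematically below the shuffled mean across many proteins. That would be the kind of signal the abstract describes, though only under whatever predictor is actually used.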