Feng Peng-Mian, Chen Wei, Lin Hao, Chou Kuo-Chen
School of Public Health, Hebei United University, Tangshan 063000, China.
Anal Biochem. 2013 Nov 1;442(1):118-25. doi: 10.1016/j.ab.2013.05.024. Epub 2013 Jun 10.
Heat shock proteins (HSPs) are a group of functionally related proteins present in all living organisms, both prokaryotes and eukaryotes. They play essential roles in protein-protein interactions, such as assisting protein folding, helping to establish the proper protein conformation, and preventing unwanted protein aggregation. Their dysfunction may cause various life-threatening disorders, such as Parkinson's, Alzheimer's, and cardiovascular diseases. Based on their functions, HSPs are usually classified into six families: (i) HSP20 or sHSP, (ii) HSP40 or J-class proteins, (iii) HSP60 or GroEL/ES, (iv) HSP70, (v) HSP90, and (vi) HSP100. Although considerable progress has been made in discriminating HSPs from other proteins, it remains a major challenge to assign an HSP to one of these six functional types from its sequence information alone. With the avalanche of protein sequences generated in the post-genomic age, a high-throughput computational tool for this task is highly desirable. To meet this challenge, a predictor called iHSP-PseRAAAC has been developed by incorporating reduced amino acid alphabet information into the general form of pseudo amino acid composition. One remarkable advantage of the reduced amino acid alphabet is that it lowers the dimension of the feature vector and thereby helps avoid the notorious dimension disaster, or overfitting problem, in statistical prediction. The overall success rate achieved by iHSP-PseRAAAC in identifying the functional type of HSPs among the six types was more than 87%, as derived by the jackknife test on a stringent benchmark dataset in which no HSP has ≥40% pairwise sequence identity to any other in the same subset. It has not escaped our notice that the reduced amino acid alphabet approach can also be used to investigate other protein classification problems.
As a user-friendly web server, iHSP-PseRAAAC is accessible to the public at http://lin.uestc.edu.cn/server/iHSP-PseRAAAC.
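To illustrate why a reduced amino acid alphabet shrinks the feature space, the following is a minimal sketch, not the published iHSP-PseRAAAC implementation. The six-cluster grouping, the cluster labels `a` to `f`, and the function names are all assumptions chosen for illustration; the abstract does not specify the clustering that the predictor uses. With 20 residue types a dipeptide composition vector has 20 × 20 = 400 components, whereas with six clusters it has only 6 × 6 = 36.

```python
from collections import Counter
from itertools import product

# Hypothetical grouping of the 20 standard residues into 6 clusters by rough
# physicochemical class. This is an illustrative assumption, NOT the alphabet
# used by iHSP-PseRAAAC.
CLUSTERS = {
    "a": "AVLIMC",  # aliphatic / hydrophobic
    "b": "FWYH",    # aromatic
    "c": "STNQ",    # polar, uncharged
    "d": "KR",      # positively charged
    "e": "DE",      # negatively charged
    "f": "GP",      # conformationally special
}

# Invert the table: residue letter -> cluster label.
RESIDUE_TO_CLUSTER = {
    aa: label for label, members in CLUSTERS.items() for aa in members
}


def reduce_sequence(seq):
    """Rewrite a protein sequence in the reduced alphabet.

    Non-standard residue codes (e.g. 'X') are skipped.
    """
    return "".join(
        RESIDUE_TO_CLUSTER[aa] for aa in seq.upper() if aa in RESIDUE_TO_CLUSTER
    )


def reduced_dipeptide_composition(seq):
    """Return the normalized frequency of each reduced-alphabet dipeptide.

    The vector has len(CLUSTERS) ** 2 components, ordered lexicographically
    by cluster label ('aa', 'ab', ..., 'ff').
    """
    reduced = reduce_sequence(seq)
    labels = sorted(CLUSTERS)
    pairs = ["".join(p) for p in product(labels, repeat=2)]
    counts = Counter(reduced[i:i + 2] for i in range(len(reduced) - 1))
    total = len(reduced) - 1
    if total <= 0:
        # Too short to contain any dipeptide: return an all-zero vector.
        return [0.0] * len(pairs)
    return [counts[p] / total for p in pairs]


if __name__ == "__main__":
    example = "MKDEGFW"  # arbitrary toy sequence, not a real HSP
    print(reduce_sequence(example))
    print(len(reduced_dipeptide_composition(example)))
```

A vector of this kind could then be passed to any standard classifier and evaluated with a jackknife (leave-one-out) test; the choice of classifier is left open here because the abstract does not name one.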