Kaushik Rahul, Singh Ankita, Jayaram B
Kusuma School of Biological Sciences, Indian Institute of Technology, Delhi, India.
Supercomputing Facility for Bioinformatics & Computational Biology, Indian Institute of Technology, Delhi, India.
Biochemistry. 2018 Feb 6;57(5):503-506. doi: 10.1021/acs.biochem.7b01073. Epub 2017 Dec 22.
The fact that amino acid sequences dictate the tertiary structures of proteins has been known for more than five decades. While the molecular pathways to tertiary structure are still being worked out, computational methods are being developed in parallel on the axiom that similar sequences adopt similar structures; these methods use the Protein Data Bank structural repository and homologue detection strategies to predict the structures of sequences of interest. The success of this approach is limited by the ability to uncover hidden similarities among amino acid sequences. We consider here the 20 amino acids as a complete set of chemical templates in the physicochemical space of proteins and propose a new structural and chemical classification of amino acids. Integrating this perspective into conventional evolutionary methods of similarity detection leads to an unprecedented increase in the accuracy of homologue detection, resulting in improved protein structure prediction. The performance is validated on a large data set of 11,716 unique proteins, and the results are benchmarked against conventional methods. The availability of good-quality protein structures helps in structure-based drug design endeavors and in establishing protein structure-function correlations.