Srivastava S, Lal S B, Mishra D C, Angadi U B, Chaturvedi K K, Rai S N, Rai A
ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India ; Biostatistics Shared Facility, James Graham Brown Cancer Center, University of Louisville, Louisville, USA.
ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India.
Algorithms Mol Biol. 2016 Sep 29;11:27. doi: 10.1186/s13015-016-0089-1. eCollection 2016.
Protein structure comparison plays an important role in the in silico functional prediction of a new protein. It is also used for understanding the evolutionary relationships among proteins. A variety of methods have been proposed in the literature for comparing protein structures, but each has its own limitations in terms of accuracy and of complexity with respect to computational time and space. There is a need to improve the computational complexity of protein comparison/alignment by incorporating important biological and structural properties into the existing techniques.
An efficient algorithm has been developed for comparing protein structures using elastic shape analysis, in which the sequence of 3D coordinates of the atoms of a protein structure is supplemented by auxiliary information from side-chain properties. The protein structure is represented by a special function called the square-root velocity function. Singular value decomposition and dynamic programming are employed for optimal rotation and optimal matching of the proteins, respectively. The geodesic distance is then calculated and used as the dissimilarity score between two protein structures. The developed algorithm was tested and found to be more efficient than existing methods: running time was reduced by 80-90 % without compromising the accuracy of comparison. Source code for the different functions has been developed in R. In addition, a user-friendly web-based application called ProtSComp has been developed using the above algorithm for comparing protein 3D structures and is freely accessible.
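The pipeline described above (square-root velocity representation, SVD-based rotation, geodesic distance) can be illustrated with a minimal sketch. This is not the authors' R code or the ProtSComp implementation. It is a Python/NumPy illustration under stated assumptions: both backbones are already sampled at the same number of points, the dynamic-programming matching step is omitted, and no side-chain auxiliary information is used. The function names `srvf`, `optimal_rotation` and `shape_distance` are hypothetical.

```python
import numpy as np


def srvf(curve):
    """Discrete square-root velocity function of an (n, 3) curve.

    q = v / sqrt(|v|), then rescaled so that the discrete inner
    product <q, q> = sum(q * q) / n equals 1 (unit-norm shape).
    """
    velocity = np.gradient(curve, axis=0)          # finite-difference derivative
    speed = np.linalg.norm(velocity, axis=1)
    q = velocity / np.sqrt(np.maximum(speed, 1e-12))[:, None]
    return q / np.sqrt(np.sum(q * q) / len(q))


def optimal_rotation(q1, q2):
    """Rotation R (3x3) maximizing the inner product of q1 with q2 @ R.T."""
    u, _, vt = np.linalg.svd(q1.T @ q2)            # SVD of the 3x3 cross-covariance
    d = np.sign(np.linalg.det(u @ vt))             # force det(R) = +1 (no reflection)
    return u @ np.diag([1.0, 1.0, d]) @ vt


def shape_distance(curve1, curve2):
    """Geodesic (arc-length) distance on the unit sphere of SRVFs.

    Assumption: curve1 and curve2 have the same number of points;
    the reparameterization (dynamic programming) step is skipped.
    """
    q1, q2 = srvf(curve1), srvf(curve2)
    rot = optimal_rotation(q1, q2)
    inner = np.sum(q1 * (q2 @ rot.T)) / len(q1)
    return float(np.arccos(np.clip(inner, -1.0, 1.0)))


# Toy example: a helix standing in for a C-alpha trace (not real protein data).
t = np.linspace(0.0, 4.0 * np.pi, 50)
helix = np.stack([np.cos(t), np.sin(t), 0.3 * t], axis=1)
line = np.stack([t, np.zeros_like(t), np.zeros_like(t)], axis=1)
```

Because the representation uses derivatives and is rescaled to unit norm, the distance in this sketch is unaffected by translation and uniform scaling of the input coordinates, and the SVD step removes rotation. A fuller version would also search over reparameterizations, which is the role the abstract assigns to dynamic programming.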
The methodology and algorithm developed in this study take considerably less computational time without loss of accuracy (Table 2). The proposed algorithm considers different criteria for representing protein structures, using the 3D coordinates of atoms and including residue-wise molecular properties as auxiliary information.