Razmara Jafar, Deris Safaai, Parvizpour Sepideh
Faculty of Computer Science and Information Systems, Universiti Teknologi Malaysia 81310, Johor Bahru, Malaysia.
Algorithms Mol Biol. 2012 Feb 15;7:4. doi: 10.1186/1748-7188-7-4.
In structural biology, similarity analysis of protein structure is a crucial step in studying the relationship between proteins. Despite the considerable number of techniques that have been explored within the past two decades, the development of new alternative methods is still an active research area due to the need for high performance tools.
In this paper, we present TS-AMIR, a Topology String Alignment Method for Intensive Rapid comparison of protein structures. The proposed method works in two stages. In the first stage, the method generates a topology string based on the geometric details of secondary structure elements, and then applies an entropy-based n-gram modelling technique to capture similarities between these strings. The resulting initial correspondence map between secondary structure elements is passed to the second stage to obtain the alignment at the residue level. In the second stage, a heuristic step-by-step algorithm based on the Kabsch method aligns the residues, yielding an optimal rotation matrix and a minimized RMSD. The performance of the method was assessed in different information retrieval tests, and the results were compared with those of CE and TM-align, representing two geometrical tools, and YAKUSA, 3D-BLAST and SARST, representing three linear encoding schemes. The method achieves a running speed similar to that of the linear encoding schemes. In addition, it runs about 800 times faster than TM-align and about 7200 times faster than CE, while maintaining accuracy competitive with both.
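To make the first stage more concrete, the following Python sketch compares two topology strings through their n-gram distributions using an entropy-based measure. It is only an illustration under stated assumptions: the abstract does not specify the string alphabet or the scoring function, so the two-letter alphabet (`H` for helix, `E` for strand), the function names, and the use of Jensen-Shannon divergence as the entropy-based score are all hypothetical choices, not the TS-AMIR scoring scheme.

```python
from collections import Counter
from math import log2


def ngram_distribution(s, n):
    """Relative frequencies of the overlapping n-grams of string s."""
    if len(s) < n:
        raise ValueError("string is shorter than the n-gram length")
    counts = Counter(s[i:i + n] for i in range(len(s) - n + 1))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}


def entropy(dist):
    """Shannon entropy in bits of a distribution given as {item: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def js_similarity(a, b, n=2):
    """Return 1 - JSD(P, Q), where P and Q are the n-gram distributions of a and b.

    Assumption: Jensen-Shannon divergence stands in for the (unspecified)
    entropy-based score of the paper. With base-2 logarithms the result
    lies in the range [0, 1].
    """
    p = ngram_distribution(a, n)
    q = ngram_distribution(b, n)
    # Mixture distribution M = (P + Q) / 2 over the union of n-grams.
    m = {g: 0.5 * (p.get(g, 0.0) + q.get(g, 0.0)) for g in set(p) | set(q)}
    jsd = entropy(m) - 0.5 * (entropy(p) + entropy(q))
    return 1.0 - jsd


if __name__ == "__main__":
    # Hypothetical topology strings: H = helix, E = strand.
    print(js_similarity("HEHEHEH", "HEHEHEH"))
    print(js_similarity("HEHEH", "HEHHE"))
    print(js_similarity("HHHH", "EEEE"))
```

Under this measure, two identical strings score 1 and two strings that share no n-grams score 0. Because the score depends only on n-gram frequencies, it can be computed in time linear in the string lengths, which is the kind of property that lets linear encoding schemes filter candidates quickly before any residue-level superposition is attempted.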
The experimental results demonstrate that linear encoding techniques are capable of reaching the same high degree of accuracy as that achieved by geometrical methods, while generally running hundreds of times faster than conventional programs.