School of Software Engineering, Huazhong University of Science and Technology, Wuhan, 430074, China.
School of Life Science, Huazhong University of Science and Technology, Wuhan, China.
BMC Bioinformatics. 2020 Sep 29;21(1):426. doi: 10.1186/s12859-020-03757-2.
Structure comparison can provide useful information for identifying functional and evolutionary relationships between proteins. With the dramatic increase of protein structure data in the Protein Data Bank, computation time quickly becomes the bottleneck for large-scale structure comparisons. To handle multiple structure alignment tasks more efficiently, we propose pmTM-align, a parallel protein structure alignment approach based on mTM-align/TM-align. pmTM-align has two stages: pairwise structure alignments are distributed with Spark, and the phylogenetic tree-based multiple structure alignment runs on a single computer with OpenMP.
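The first stage described above amounts to enumerating every unordered pair of structures and scoring the pairs independently, which is why it parallelizes well. The following is a minimal sketch of that pattern only, not pmTM-align's implementation: it uses Python's standard-library thread pool as a stand-in for Spark, and `placeholder_score` and `pairwise_scores` are hypothetical names introduced here. The placeholder compares structure lengths and is not a TM-score; a real pipeline would call an aligner such as TM-align at that point.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def placeholder_score(a, b):
    """Hypothetical stand-in for a pairwise aligner such as TM-align.

    Returns the ratio of the shorter length to the longer one, in (0, 1].
    This is NOT a TM-score; it only gives each job something to compute.
    Assumes both structures are non-empty.
    """
    return min(len(a), len(b)) / max(len(a), len(b))


def pairwise_scores(structures, workers=4):
    """Score every unordered pair of structures in parallel.

    `structures` maps a name to a list of (x, y, z) coordinates.
    Returns a dict keyed by (name_i, name_j) with name_i < name_j.
    """
    names = sorted(structures)
    # n structures yield n * (n - 1) / 2 independent jobs.
    pairs = list(combinations(names, 2))

    def job(pair):
        i, j = pair
        return pair, placeholder_score(structures[i], structures[j])

    # A thread pool stands in for the cluster here; each job is
    # independent, so the same map could be distributed across nodes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(job, pairs))


if __name__ == "__main__":
    demo = {
        "a": [(0.0, 0.0, 0.0)] * 4,
        "b": [(0.0, 0.0, 0.0)] * 2,
        "c": [(0.0, 0.0, 0.0)] * 4,
    }
    for pair, score in sorted(pairwise_scores(demo).items()):
        print(pair, score)
```

Because the number of pairs grows quadratically with the number of structures, this stage dominates for large inputs. The resulting pairwise score table is the kind of input from which a guide tree for the second, tree-based stage could be built.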
Experiments with the SABmark dataset showed that parallelization, together with data structure optimization, provided considerable speedup for mTM-align. The Spark-based structure alignments achieved near-ideal scalability with large datasets, and the OpenMP-based construction of the phylogenetic tree accelerated the incremental alignment of multiple structures and the computation of metrics by a factor of about 2 to 5.
pmTM-align enables scalable pairwise and multiple structure alignment and responds faster on medium to large inputs than existing alignment tools such as mTM-align.