Zhang Xuechen, Chen Zhuoyang, Li Junyu, Luo Qiong, Wu Longjun, Yu Weichuan
Department of Electronic and Computational Engineering, The Hong Kong University of Science and Technology, Hong Kong SAR, China.
Data Science and Analytics Thrust, Information Hub, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, Guangdong 511400, China.
Bioinformatics. 2025 Jun 2;41(6). doi: 10.1093/bioinformatics/btaf309.
Aligning three-dimensional protein tertiary structures is a fundamental problem that yields insights into protein function and evolution. Most previous structure alignment algorithms adopt the sequential assumption, under which aligned residues keep the same order in both chains, and use dynamic programming solvers. However, many distantly related structures exhibit non-sequential similarities, and existing non-sequential alignment tools are less efficient and less accurate than sequential ones. In this paper, we formulate non-sequential alignment as the Entropy-regularized Partial Linear Sum Assignment Problem (epLSAP) and propose a solver based on Sinkhorn algorithms, referred to as epLSAP-Align.
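For orientation, an entropy-regularized partial assignment problem can be written roughly as follows. This is a generic illustration, not the exact objective of the paper: the similarity matrix $S$, the per-residue gap penalty $g$, the regularization weight $\varepsilon$, and the particular entropy term are all assumed notation.

```latex
% Illustrative notation (assumed, not taken from the paper):
%   S_{ij}  similarity of residue i in chain A and residue j in chain B
%   P_{ij}  soft assignment of i to j;  g  penalty per unaligned residue
\begin{aligned}
\max_{P \ge 0}\quad
  & \sum_{i=1}^{n}\sum_{j=1}^{m} S_{ij} P_{ij}
    - g\Big[\sum_{i=1}^{n}\Big(1 - \sum_{j=1}^{m} P_{ij}\Big)
          + \sum_{j=1}^{m}\Big(1 - \sum_{i=1}^{n} P_{ij}\Big)\Big]
    + \varepsilon H(P) \\
\text{s.t.}\quad
  & \sum_{j=1}^{m} P_{ij} \le 1 \;\; (i = 1,\dots,n), \qquad
    \sum_{i=1}^{n} P_{ij} \le 1 \;\; (j = 1,\dots,m), \\
\text{where}\quad
  & H(P) = -\sum_{i,j} P_{ij}\,(\log P_{ij} - 1).
\end{aligned}
```

The inequality constraints are what make the assignment partial: a residue may remain unaligned, and no ordering constraint ties the match of residue $i$ to the match of residue $i+1$, so non-sequential alignments are admissible.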
Compared with existing non-sequential alignment solvers, epLSAP-Align explicitly models the gap penalty, efficiently achieves global optimality, and balances coverage against fidelity. We show that epLSAP-Align can be easily integrated into existing frameworks such as TM-align and MICAN, yielding the non-sequential alignment tools epLSAP-TM and epLSAP-MICAN, respectively. Both epLSAP-TM and epLSAP-MICAN outperform existing non-sequential alignment tools in terms of biologically meaningful structure overlaps on two sequential alignment test sets, MALIDUP and MALISAM, and four non-sequential alignment test sets: MALIDUP-ns, MALISAM-ns, 64-difficult-case, and RIPC. In addition, compared with the most recent non-sequential alignment tool, USalign2, epLSAP-TM is at least 22% faster under the same settings.
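To make the idea of a Sinkhorn-based partial assignment with an explicit gap penalty concrete, the following is a minimal sketch and not the epLSAP-Align implementation. It assumes one common construction: the score matrix is augmented with a slack row and a slack column that absorb unaligned residues at a cost of `gap_penalty` each, and the resulting balanced problem is solved by log-domain Sinkhorn iterations. The function name, parameters, and toy scores are hypothetical.

```python
import numpy as np


def _logsumexp(x, axis):
    """Numerically stable log(sum(exp(x))) along one axis."""
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))


def sinkhorn_partial_assignment(score, gap_penalty=0.2, eps=0.1, n_iter=1000):
    """Soft partial assignment between n and m residues (illustrative only).

    score[i, j] is a signed similarity; higher means a better match.
    Returns an (n, m) matrix whose row and column sums are at most 1
    (up to the convergence tolerance of the iteration).
    """
    n, m = score.shape
    # Augmented log-kernel: the last row/column are slack (gap) states.
    log_k = np.zeros((n + 1, m + 1))
    log_k[:n, :m] = score / eps
    log_k[:n, m] = -gap_penalty / eps   # residue i of chain A left unaligned
    log_k[n, :m] = -gap_penalty / eps   # residue j of chain B left unaligned
    log_k[n, m] = 0.0                   # slack-to-slack transport is free
    # Each real residue carries mass 1; the slack states balance the totals.
    log_a = np.log(np.concatenate([np.ones(n), [m]]))
    log_b = np.log(np.concatenate([np.ones(m), [n]]))
    log_u = np.zeros(n + 1)
    log_v = np.zeros(m + 1)
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the marginals.
        log_u = log_a - _logsumexp(log_k + log_v[None, :], axis=1)
        log_v = log_b - _logsumexp(log_k + log_u[:, None], axis=0)
    plan = np.exp(log_u[:, None] + log_k + log_v[None, :])
    return plan[:n, :m]  # drop the slack row and column


# Toy example: a non-sequential pairing plus one residue per chain
# (row 3, column 1) that matches nothing well.
score = np.array([
    [0.1, -1.0, 0.9, 0.1],
    [0.8, -1.0, 0.0, 0.2],
    [0.0, -1.0, 0.1, 0.7],
    [-1.0, -1.0, -1.0, -1.0],
])
plan = sinkhorn_partial_assignment(score, gap_penalty=0.2, eps=0.1)
matches = sorted((int(i), int(j)) for i, j in zip(*np.where(plan > 0.5)))
print(matches)
```

In this construction a pair is aligned only when its score exceeds the cost of leaving both residues unaligned, so raising `gap_penalty` favors coverage and lowering it favors fidelity. A smaller `eps` gives a sharper, closer-to-binary plan at the cost of slower convergence.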
Our source code is available at https://github.com/xzhangem/epLSAP-align.