通过熵正则化对偶分解实现可扩展凸多序列比对

Scalable Convex Multiple Sequence Alignment via Entropy-Regularized Dual Decomposition.

作者信息

Zhang Jiong, Yen Ian E H, Ravikumar Pradeep, Dhillon Inderjit S

机构信息

University of Texas at Austin.

Carnegie Mellon University.

出版信息

JMLR Workshop Conf Proc. 2017 Apr;54:1514-1522.

原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC5581665/

Abstract

is one of the fundamental tasks in biological sequence analysis that underlies applications such as phylogenetic trees, profiles, and structure prediction. The task, however, is NP-hard, and the current practice resorts to heuristic and local-search methods. Recently, a convex optimization approach for MSA was proposed based on the concept of atomic norm [23], which demonstrates significant improvement over existing methods in the quality of alignments. However, the convex program is challenging to solve due to the constraint given by the intersection of two atomic-norm balls, for which the existing algorithm can only handle sequences of length up to 50, with an iteration complexity subject to constants of unknown relation to the natural parameters of MSA. In this work, we propose an algorithm that exploits to induce closed-form solutions for each atomic-norm-constrained subproblem, giving a single-loop algorithm of iteration complexity linear to the problem size (total length of all sequences). The proposed algorithm gives significantly better alignments than existing methods on sequences of length up to hundreds, where the existing convex programming method fails to converge in one day.

摘要

是生物序列分析中的基本任务之一，它是诸如系统发育树、轮廓和结构预测等应用的基础。然而，该任务是NP难的，当前的做法是采用启发式和局部搜索方法。最近，基于原子范数的概念提出了一种用于多序列比对（MSA）的凸优化方法[23]，该方法在比对质量上比现有方法有显著提高。然而，由于两个原子范数球相交给出的约束，凸规划很难求解，现有算法只能处理长度达50的序列，其迭代复杂度取决于与MSA自然参数关系未知的常数。在这项工作中，我们提出了一种算法，该算法利用[具体内容缺失]为每个原子范数约束子问题导出闭式解，给出了一种迭代复杂度与问题规模（所有序列的总长度）成线性关系的单循环算法。在长度达数百的序列上，所提出的算法比对现有方法得到的比对结果要好得多，而现有凸规划方法在一天内无法收敛。

相似文献

1

Scalable Convex Multiple Sequence Alignment via Entropy-Regularized Dual Decomposition.通过熵正则化对偶分解实现可扩展凸多序列比对

JMLR Workshop Conf Proc. 2017 Apr;54:1514-1522.

2

A Convex Atomic-Norm Approach to Multiple Sequence Alignment and Motif Discovery.一种用于多序列比对和基序发现的凸原子范数方法。

JMLR Workshop Conf Proc. 2016;48:2272-2280.

3

Regularized Primal-Dual Subgradient Method for Distributed Constrained Optimization.正则化主对偶子梯度法用于分布式约束优化。

IEEE Trans Cybern. 2016 Sep;46(9):2109-18. doi: 10.1109/TCYB.2015.2464255. Epub 2015 Aug 13.

4

Binary particle swarm optimization algorithm with mutation for multiple sequence alignment.带变异的二进制粒子群优化算法用于多序列比对

Riv Biol. 2009 Jan-Apr;102(1):75-94.

5

Gradient regularization of Newton method with Bregman distances.基于布雷格曼距离的牛顿法梯度正则化

Math Program. 2024;204(1-2):1-25. doi: 10.1007/s10107-023-01943-7. Epub 2023 Mar 24.

6

A General Iterative Shrinkage and Thresholding Algorithm for Non-convex Regularized Optimization Problems.一种用于非凸正则化优化问题的通用迭代收缩与阈值算法。

JMLR Workshop Conf Proc. 2013;28(2):37-45.

7

Solving large-scale general phase retrieval problems via a sequence of convex relaxations.通过一系列凸松弛来解决大规模一般相位恢复问题。

J Opt Soc Am A Opt Image Sci Vis. 2018 Aug 1;35(8):1410-1419. doi: 10.1364/JOSAA.35.001410.

8

A Novel Approach to Multiple Sequence Alignment Using Multiobjective Evolutionary Algorithm Based on Decomposition.基于分解的多目标进化算法在多重序列比对中的新方法

IEEE J Biomed Health Inform. 2016 Mar;20(2):717-27. doi: 10.1109/JBHI.2015.2403397. Epub 2015 Feb 12.

9

Learning Incoherent Sparse and Low-Rank Patterns from Multiple Tasks.从多个任务中学习非相干稀疏和低秩模式。

ACM Trans Knowl Discov Data. 2012 Feb 1;5(4):22. doi: 10.1145/2086737.2086742.

10

Hessian Schatten-norm regularization for linear inverse problems.海森暗范数正则化用于线性反问题。

IEEE Trans Image Process. 2013 May;22(5):1873-88. doi: 10.1109/TIP.2013.2237919. Epub 2013 Jan 4.

本文引用的文献

1

A Convex Atomic-Norm Approach to Multiple Sequence Alignment and Motif Discovery.一种用于多序列比对和基序发现的凸原子范数方法。

JMLR Workshop Conf Proc. 2016;48:2272-2280.

2

Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega.使用 Clustal Omega 快速、可扩展地生成高质量蛋白质多重序列比对。

Mol Syst Biol. 2011 Oct 11;7:539. doi: 10.1038/msb.2011.75.

3

A simple genetic algorithm for multiple sequence alignment.一种用于多序列比对的简单遗传算法。

Genet Mol Res. 2007 Oct 5;6(4):964-82.

4

Clustal W and Clustal X version 2.0.Clustal W和Clustal X 2.0版本

Bioinformatics. 2007 Nov 1;23(21):2947-8. doi: 10.1093/bioinformatics/btm404. Epub 2007 Sep 10.

5

Recent evolutions of multiple sequence alignment algorithms.多重序列比对算法的最新进展。

PLoS Comput Biol. 2007 Aug;3(8):e123. doi: 10.1371/journal.pcbi.0030123.

6

Settling the intractability of multiple alignment.解决多重比对的棘手问题。

J Comput Biol. 2006 Sep;13(7):1323-39. doi: 10.1089/cmb.2006.13.1323.

7

Kalign--an accurate and fast multiple sequence alignment algorithm.Kalign——一种准确且快速的多序列比对算法。

BMC Bioinformatics. 2005 Dec 12;6:298. doi: 10.1186/1471-2105-6-298.

8

MAFFT version 5: improvement in accuracy of multiple sequence alignment.MAFFT 5 版本：多重序列比对准确性的提升。

Nucleic Acids Res. 2005 Jan 20;33(2):511-8. doi: 10.1093/nar/gki198. Print 2005.

9

MUSCLE: multiple sequence alignment with high accuracy and high throughput.MUSCLE：具有高精度和高吞吐量的多序列比对。

Nucleic Acids Res. 2004 Mar 19;32(5):1792-7. doi: 10.1093/nar/gkh340. Print 2004.

10

Recent progress in multiple sequence alignment: a survey.多重序列比对的最新进展：一项综述。

Pharmacogenomics. 2002 Jan;3(1):131-44. doi: 10.1517/14622416.3.1.131.

文献AI研究员

20分钟写一篇综述，助力文献阅读效率提升50倍。

用中文搜PubMed

大模型驱动的PubMed中文搜索引擎

文档翻译

学术文献翻译模型，支持多种主流文档格式。