Zhang Jiong, Yen Ian E H, Ravikumar Pradeep, Dhillon Inderjit S
University of Texas at Austin.
Carnegie Mellon University.
JMLR Workshop Conf Proc. 2017 Apr;54:1514-1522.
is one of the fundamental tasks in biological sequence analysis that underlies applications such as phylogenetic trees, profiles, and structure prediction. The task, however, is NP-hard, and the current practice resorts to heuristic and local-search methods. Recently, a convex optimization approach for MSA was proposed based on the concept of atomic norm [23], which demonstrates significant improvement over existing methods in the quality of alignments. However, the convex program is challenging to solve due to the constraint given by the intersection of two atomic-norm balls, for which the existing algorithm can only handle sequences of length up to 50, with an iteration complexity subject to constants of unknown relation to the natural parameters of MSA. In this work, we propose an algorithm that exploits to induce closed-form solutions for each atomic-norm-constrained subproblem, giving a single-loop algorithm of iteration complexity linear to the problem size (total length of all sequences). The proposed algorithm gives significantly better alignments than existing methods on sequences of length up to hundreds, where the existing convex programming method fails to converge in one day.
是生物序列分析中的基本任务之一,它是诸如系统发育树、轮廓和结构预测等应用的基础。然而,该任务是NP难的,当前的做法是采用启发式和局部搜索方法。最近,基于原子范数的概念提出了一种用于多序列比对(MSA)的凸优化方法[23],该方法在比对质量上比现有方法有显著提高。然而,由于两个原子范数球相交给出的约束,凸规划很难求解,现有算法只能处理长度达50的序列,其迭代复杂度取决于与MSA自然参数关系未知的常数。在这项工作中,我们提出了一种算法,该算法利用[具体内容缺失]为每个原子范数约束子问题导出闭式解,给出了一种迭代复杂度与问题规模(所有序列的总长度)成线性关系的单循环算法。在长度达数百的序列上,所提出的算法比对现有方法得到的比对结果要好得多,而现有凸规划方法在一天内无法收敛。