Pareto optimal pairwise sequence alignment.

Author information

Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455, USA.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2013 Mar-Apr;10(2):481-93. doi: 10.1109/TCBB.2013.2.

Abstract

Sequence alignment using evolutionary profiles is a commonly employed tool when investigating a protein. Many profile-profile scoring functions have been developed for use in such alignments, but there has not yet been a comprehensive study of Pareto optimal pairwise alignments that combine multiple such functions. We show that the problem of generating Pareto optimal pairwise alignments has an optimal substructure property, and we develop an efficient algorithm for generating Pareto optimal frontiers of pairwise alignments. From a pool of 11 profile scoring functions, all possible sets of two, three, and four functions are applied to 588 pairs of proteins in the ce_ref data set. The best objective combinations on ce_ref are also evaluated on an independent set of 913 protein pairs extracted from the BAliBASE RV11 data set. Our dynamic-programming-based heuristic approach produces approximated Pareto optimal frontiers of pairwise alignments that contain alignments comparable to those on the exact frontier, but on average in less than 1/58th of the time in the case of four objectives. Our results show that the Pareto frontiers contain alignments whose quality is better than that of the alignments obtained by single objectives. However, the task of identifying a single high-quality alignment among those in the Pareto frontier remains challenging.
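
The optimal substructure property mentioned in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes a global alignment with a linear gap penalty per objective, and it scores plain residue pairs with hypothetical callables rather than profile-profile functions. Every name in it (`dominates`, `pareto_filter`, `pareto_alignment_scores`, `objectives`, `gaps`) is an assumption made for illustration. Each dynamic-programming cell stores the set of non-dominated score vectors for the corresponding prefixes, and each cell is built only from the frontiers of its three predecessor cells.

```python
from typing import Callable, List, Sequence, Set, Tuple

Vector = Tuple[float, ...]


def dominates(u: Vector, v: Vector) -> bool:
    """True if u is at least as good as v in every objective and better in one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def pareto_filter(vectors: Set[Vector]) -> Set[Vector]:
    """Keep only the vectors that no other vector in the set dominates."""
    return {v for v in vectors if not any(dominates(u, v) for u in vectors)}


def add(u: Vector, step: Vector) -> Vector:
    """Component-wise sum of a score vector and a step score."""
    return tuple(a + b for a, b in zip(u, step))


def pareto_alignment_scores(
    x: str,
    y: str,
    objectives: Sequence[Callable[[str, str], float]],
    gaps: Sequence[float],
) -> Set[Vector]:
    """Exact Pareto frontier of score vectors over all global alignments of x and y.

    Assumption: one linear gap penalty per objective (gaps[k] is added for
    every gap position under objective k).
    """
    n, m = len(x), len(y)
    gap_step: Vector = tuple(gaps)
    front: List[List[Set[Vector]]] = [
        [set() for _ in range(m + 1)] for _ in range(n + 1)
    ]
    front[0][0] = {tuple(0 for _ in objectives)}
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            candidates: Set[Vector] = set()
            if i > 0 and j > 0:
                # Align x[i-1] with y[j-1]; one score per objective.
                match = tuple(f(x[i - 1], y[j - 1]) for f in objectives)
                candidates |= {add(v, match) for v in front[i - 1][j - 1]}
            if i > 0:
                # Gap in y.
                candidates |= {add(v, gap_step) for v in front[i - 1][j]}
            if j > 0:
                # Gap in x.
                candidates |= {add(v, gap_step) for v in front[i][j - 1]}
            # Optimal substructure: dominated partial alignments can never
            # extend to a non-dominated full alignment, so they are dropped.
            front[i][j] = pareto_filter(candidates)
    return front[n][m]
```

With a single objective the sketch reduces to ordinary Needleman-Wunsch scoring, since the frontier in every cell collapses to the one maximum score. With several objectives the per-cell frontiers can grow large, which is presumably part of what motivates a faster heuristic such as the one the abstract describes; this sketch makes no attempt to reproduce that heuristic.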
