Saule Cédric, Giegerich Robert
Faculty of Technology and the Center for Biotechnology, Bielefeld University, Bielefeld, Germany.
Algorithms Mol Biol. 2015 Jul 7;10:22. doi: 10.1186/s13015-015-0051-7. eCollection 2015.
Pareto optimization combines independent objectives by computing the Pareto front of its search space, defined as the set of all solutions for which no other candidate solution scores better under all objectives. This gives, in a precise sense, better information than an artificial amalgamation of different scores into a single objective, but is more costly to compute. Pareto optimization naturally occurs with genetic algorithms, albeit in a heuristic fashion. Non-heuristic Pareto optimization so far has been used only with a few applications in bioinformatics. We study exact Pareto optimization for two objectives in a dynamic programming framework. We define a binary Pareto product operator [Formula: see text] on arbitrary scoring schemes. Independent of a particular algorithm, we prove that for two scoring schemes A and B used in dynamic programming, the scoring scheme [Formula: see text] correctly performs Pareto optimization over the same search space. We study different implementations of the Pareto operator with respect to their asymptotic and empirical efficiency. Without artificial amalgamation of objectives, and with no heuristics involved, Pareto optimization is faster than computing the same number of answers separately for each objective. For RNA structure prediction under the minimum free energy versus the maximum expected accuracy model, we show that the empirical size of the Pareto front remains within reasonable bounds. Pareto optimization lends itself to the comparative investigation of the behavior of two alternative scoring schemes for the same purpose. For the above scoring schemes, we observe that the Pareto front can be seen as a composition of a few macrostates, each consisting of several microstates that differ in the same limited way. We also study the relationship between abstract shape analysis and the Pareto front, and find that they extract information of a different nature from the folding space and can be meaningfully combined.
帕累托优化通过计算其搜索空间的帕累托前沿来组合独立目标,帕累托前沿定义为在所有目标下没有其他候选解得分更高的所有解的集合。从精确意义上讲,这比将不同分数人为合并为单个目标能提供更好的信息,但计算成本更高。帕累托优化自然地出现在遗传算法中,尽管是以启发式的方式。到目前为止,非启发式帕累托优化仅在生物信息学的少数应用中使用。我们在动态规划框架中研究针对两个目标的精确帕累托优化。我们在任意评分方案上定义了一个二元帕累托积算子[公式:见正文]。独立于特定算法,我们证明对于动态规划中使用的两个评分方案A和B,评分方案[公式:见正文]在相同搜索空间上正确地执行帕累托优化。我们研究了帕累托算子的不同实现方式在渐近效率和经验效率方面的情况。在不人为合并目标且不涉及启发式方法的情况下,帕累托优化比分别为每个目标计算相同数量的答案要快。对于最小自由能与最大预期准确率模型下的RNA结构预测,我们表明帕累托前沿的经验规模保持在合理范围内。帕累托优化有助于对同一目的的两种替代评分方案的行为进行比较研究。对于上述评分方案,我们观察到帕累托前沿可以看作是几个宏观状态的组合,每个宏观状态由几个微观状态组成,这些微观状态以相同的有限方式不同。我们还研究了抽象形状分析与帕累托前沿之间的关系,发现它们从折叠空间中提取不同性质的信息并且可以有意义地结合起来。