Paszek Jarosław, Markin Alexey, Górecki Paweł, Eulenstein Oliver
Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warszawa, Poland.
Department of Computer Science, Iowa State University, Ames, Iowa, USA.
J Comput Biol. 2021 Aug;28(8):758-773. doi: 10.1089/cmb.2021.0011. Epub 2021 Apr 16.
The duplication-loss-coalescence (DLC) parsimony model is invaluable for analyzing complex scenarios of concurrent duplication, loss, and deep coalescence events in the evolution of gene families. However, inferring such scenarios is computationally prohibitive even for moderately sized families. To overcome this stringent limitation, we take a first step by describing a flexible integer linear programming (ILP) formulation for inferring DLC evolutionary scenarios. Then, to make the DLC model more scalable, we introduce four sensibly constrained versions of the model and describe modified versions of our ILP formulation reflecting these constraints. Our simulation studies show that our constrained ILP formulations compute evolutionary scenarios that are substantially larger than those computable under our original ILP formulation and the original dynamic programming algorithm by Wu et al. Furthermore, scenarios computed under our constrained DLC models are remarkably accurate compared with corresponding scenarios under the original DLC model, which we also confirm in an empirical study with thousands of gene families.