Resolution and reconciliation of non-binary gene trees with transfers, duplications and losses.

Author information

Jacox Edwin, Weller Mathias, Tannier Eric, Scornavacca Celine

Affiliations

ISE-M, Université Montpellier, CNRS, IRD, EPHE, Montpellier, France.

Institut de Biologie Computationnelle (IBC), Montpellier, France.

Publication information

Bioinformatics. 2017 Apr 1;33(7):980-987. doi: 10.1093/bioinformatics/btw778.

Abstract

SUMMARY

Gene trees reconstructed from sequence alignments contain poorly supported branches when the phylogenetic signal in the sequences is insufficient to determine them all. When a species tree is available, the signal of gains and losses of genes can be used to correctly resolve the unsupported parts of the gene history. However, finding a most parsimonious binary resolution of a non-binary tree obtained by contracting the unsupported branches is NP-hard if transfer events are considered as possible gene-scale events, in addition to gene origination, duplication and loss. We propose an exact, parameterized algorithm that solves this problem in single-exponential time, where the parameter is the number of connected branches of the gene tree that show low support from the sequence alignment or, equivalently, the maximum number of children of any node of the gene tree once the low-support branches have been collapsed. This improves on the best known algorithm by an exponential factor. We also propose a way to choose among optimal solutions based on the available information. We show the usability of this principle on several simulated and biological datasets. The results are comparable in quality to those of several other tested methods with similar goals, but our approach provides a lower running time and a guarantee that the produced solution is optimal.
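
The preprocessing step described above (contracting low-support branches, then reading off the maximum number of children) can be illustrated with a minimal Python sketch. It is not the ecceTERA implementation: the `Node` class, the function names, the 0.5 threshold and the toy tree are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Rooted gene-tree node; `support` scores the branch above the node."""
    name: str = ""
    support: Optional[float] = None  # None for leaves and for the root
    children: List["Node"] = field(default_factory=list)


def collapse_low_support(node: Node, threshold: float) -> Node:
    """Return a copy of the tree with weakly supported internal branches contracted."""
    new_children: List[Node] = []
    for child in node.children:
        collapsed = collapse_low_support(child, threshold)
        weak = (
            bool(collapsed.children)           # internal node, not a leaf
            and collapsed.support is not None
            and collapsed.support < threshold
        )
        if weak:
            # Contract the branch: the grandchildren attach directly to `node`.
            new_children.extend(collapsed.children)
        else:
            new_children.append(collapsed)
    return Node(node.name, node.support, new_children)


def max_children(node: Node) -> int:
    """Largest number of children of any node (0 for a single leaf)."""
    return max([len(node.children)] + [max_children(c) for c in node.children])


# Toy binary gene tree ((a,b)0.95,((c,d)0.30,e)0.40); support values are made up.
tree = Node("root", None, [
    Node("x", 0.95, [Node("a"), Node("b")]),
    Node("y", 0.40, [
        Node("z", 0.30, [Node("c"), Node("d")]),
        Node("e"),
    ]),
])

# Assumed support threshold of 0.5: branches y and z are contracted.
collapsed_tree = collapse_low_support(tree, threshold=0.5)
parameter = max_children(collapsed_tree)
```

In this toy case the two weakly supported branches are contracted, so the root ends up with four children (`x`, `c`, `d`, `e`) and `parameter` is 4. The resolution problem then asks for a most parsimonious binary refinement of such multifurcating nodes with respect to the species tree, which this sketch does not attempt.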

AVAILABILITY AND IMPLEMENTATION

Our algorithm has been integrated into the ecceTERA phylogeny package, which is available at http://mbb.univ-montp2.fr/MBB/download_sources/16__ecceTERA and can be run online at http://mbb.univ-montp2.fr/MBB/subsection/softExec.php?soft=eccetera .

CONTACT

celine.scornavacca@umontpellier.fr

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
