Markin Alexey, Wagle Sanket, Anderson Tavis K, Eulenstein Oliver
Virus and Prion Research Unit, National Animal Disease Center, USDA-ARS, Ames, IA 50010, USA.
Department of Computer Science, Iowa State University, Ames, IA 50011, USA.
Bioinformatics. 2022 Apr 12;38(8):2144-2152. doi: 10.1093/bioinformatics/btac075.
A phylogenetic network is a powerful model to represent entangled evolutionary histories with both divergent (speciation) and convergent (e.g. hybridization, reassortment, recombination) evolution. The standard approach to inference of hybridization networks is to (i) reconstruct rooted gene trees and (ii) leverage gene tree discordance for network inference. Recently, we introduced a method called RF-Net for accurate inference of virus reassortment and hybridization networks from input gene trees in the presence of errors commonly found in phylogenetic trees. While RF-Net demonstrated the ability to accurately infer networks with up to four reticulations from erroneous input gene trees, its application was limited by the number of reticulations it could handle in a reasonable amount of time. This limitation is particularly restrictive in the inference of the evolutionary history of segmented RNA viruses such as influenza A virus (IAV), where reassortment is one of the major mechanisms shaping the evolution of these pathogens.
Here, we expand the functionality of RF-Net to make it significantly more applicable in practice. Crucially, we introduce a fast extension to RF-Net, called Fast-RF-Net, that can handle large numbers of reticulations without sacrificing accuracy. In addition, we develop automatic stopping criteria that heuristically select an appropriate number of reticulations, and we implement a feature that lets RF-Net output error-corrected versions of the input gene trees. We then conduct a comprehensive study of the original method and its novel extensions and confirm their efficacy in practice using extensive simulations and empirical analyses of IAV evolution.
RF-Net 2 is available at https://github.com/flu-crew/rf-net-2.
Supplementary data are available at Bioinformatics online.