Assessing the fit of the multi-species network coalescent to multi-locus data.

Author information

Department of Statistics, University of Wisconsin - Madison, Madison, WI 53706, USA.

Department of Botany, University of Wisconsin - Madison, Madison, WI 53706, USA.

Publication information

Bioinformatics. 2021 May 5;37(5):634-641. doi: 10.1093/bioinformatics/btaa863.

Abstract

MOTIVATION

With genome-wide molecular datasets from next-generation sequencing growing, phylogenetic networks can be estimated using a variety of approaches. These networks explicitly represent events such as hybridization, gene flow or horizontal gene transfer. However, the most accurate network inference methods are computationally heavy. Methods that scale to larger datasets do not calculate a full likelihood, so traditional likelihood-based tools for model selection cannot be used to decide how many past hybridization events best fit the data. We propose here a goodness-of-fit test to quantify the fit between observed genome-wide multi-locus data and the patterns expected under the multi-species coalescent model on a candidate phylogenetic network.
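As a rough illustration of what comparing observed multi-locus data with coalescent expectations can look like, the sketch below handles the simplest case: a single four-taxon set on a tree with no hybridization. It is not the authors' test. The function names, the use of quartet concordance factors as the data summary, and the plain likelihood-ratio (G) statistic are all assumptions made for illustration. The only model fact used is the standard result that, on a four-taxon species tree with internal branch length t in coalescent units, the quartet matching the tree has probability 1 - (2/3)exp(-t) and each of the other two has probability (1/3)exp(-t).

```python
import math


def expected_quartet_cfs(t):
    """Expected quartet concordance factors on a 4-taxon species tree.

    t is the internal branch length in coalescent units.
    Returns (major, minor, minor); the three values sum to 1.
    """
    minor = math.exp(-t) / 3.0
    return (1.0 - 2.0 * minor, minor, minor)


def g_test(observed_counts, expected_probs):
    """Likelihood-ratio (G) statistic for 3 quartet counts against fixed probabilities.

    Hypothetical helper: assumes expected_probs are fixed in advance (not
    estimated from the same counts), so the reference distribution is
    chi-square with 2 degrees of freedom, whose survival function is exp(-g/2).
    """
    n = sum(observed_counts)
    g = 0.0
    for obs, p in zip(observed_counts, expected_probs):
        if obs > 0:  # a zero count contributes nothing to the sum
            g += 2.0 * obs * math.log(obs / (n * p))
    p_value = math.exp(-g / 2.0)
    return g, p_value


# Usage: 100 gene trees for one four-taxon set, candidate branch length t = 1.0
g, p = g_test((40, 50, 10), expected_quartet_cfs(1.0))
print(f"G = {g:.2f}, p = {p:.3g}")
```

A small p-value for a four-taxon set indicates that the candidate topology and branch length explain its gene-tree frequencies poorly. Combining such per-set results across a whole network requires handling the dependence between overlapping four-taxon sets, which this sketch does not attempt.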

RESULTS

We identified weaknesses in the previously proposed TICR test and proposed corrections. The performance of our new test was validated by simulations on real-world phylogenetic networks. Our test provides one of the first rigorous tools for model selection, that is, for choosing a network complexity adequate for the data at hand. The test can also be used to identify poorly inferred areas of a network.

AVAILABILITY AND IMPLEMENTATION

Software for the goodness-of-fit test is available as a Julia package at https://github.com/cecileane/QuartetNetworkGoodnessFit.jl.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
