Estimation of Cross-Species Introgression Rates Using Genomic Data Despite Model Unidentifiability.

Affiliation

Department of Genetics, Evolution and Environment, University College London, Gower Street, London WC1E 6BT, UK.

Publication information

Mol Biol Evol. 2022 May 3;39(5). doi: 10.1093/molbev/msac083.

Abstract

Full-likelihood implementations of the multispecies coalescent with introgression (MSci) model treat genealogical fluctuations across the genome as a major source of information to infer the history of species divergence and gene flow using multilocus sequence data. However, MSci models are known to have unidentifiability issues, whereby different models or parameters make the same predictions about the data and cannot be distinguished by the data. Previous studies of unidentifiability have focused on heuristic methods based on gene trees and do not make an efficient use of the information in the data. Here we study the unidentifiability of MSci models under the full-likelihood methods. We characterize the unidentifiability of the bidirectional introgression (BDI) model, which assumes that gene flow occurs in both directions. We derive simple rules for arbitrary BDI models, which create unidentifiability of the label-switching type. In general, an MSci model with k BDI events has 2^k unidentifiable modes or towers in the posterior, with each BDI event between sister species creating within-model parameter unidentifiability and each BDI event between nonsister species creating between-model unidentifiability. We develop novel algorithms for processing Markov chain Monte Carlo samples to remove label-switching problems and implement them in the bpp program. We analyze real and synthetic data to illustrate the utility of the BDI models and the new algorithms. We discuss the unidentifiability of heuristic methods and provide guidelines for the use of MSci models to infer gene flow using genomic data.
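
The mode count stated in the abstract follows from each BDI event independently contributing two equivalent (mirror) configurations, so k events give 2^k combinations. The short Python sketch below only illustrates that counting argument; it is not the bpp implementation, and the function name `enumerate_modes` and the event labels are hypothetical, chosen for this example.

```python
from itertools import product


def enumerate_modes(bdi_events):
    """Return every mirror configuration for the given BDI events.

    Each event is either kept as is (False) or replaced by its mirror
    image (True), so k events yield 2**k configurations.
    """
    return [
        dict(zip(bdi_events, flips))
        for flips in product((False, True), repeat=len(bdi_events))
    ]


# Hypothetical event labels, used for illustration only.
modes = enumerate_modes(["A<->B", "C<->D"])
for mode in modes:
    print(mode)
print(len(modes))  # 2**2 configurations
```

With two BDI events the list has four entries, one per unidentifiable mode; whether a given flip changes only parameters or also the model depends, per the abstract, on whether the event is between sister or nonsister species.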

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf1f/9087891/4a975616a11f/msac083f1.jpg
