Single-character insertion-deletion model preserves long indels in ancestral sequence reconstruction.

Authors

Jowkar Gholamhossein, Pečerska Jūlija, Gil Manuel, Anisimova Maria

Affiliations

Institute of Biology, University of Neuchâtel, Rue Emile-Argand 11, 2000, Neuchâtel, Neuchâtel, Switzerland.

Swiss Institute of Bioinformatics, Quartier Sorge - Batiment Amphipôle, 1015, Lausanne, Vaud, Switzerland.

Publication information

BMC Bioinformatics. 2024 Dec 2;25(1):370. doi: 10.1186/s12859-024-05986-1.

Abstract

Insertions and deletions (indels) play a significant role in genome evolution across species. Realistic modelling of indel evolution is challenging and is still an open research question. Several attempts have been made to explicitly model multi-character (long) indels, such as TKF92, by relaxing the site independence assumption and introducing fragments. However, these methods are computationally expensive. On the other hand, the Poisson Indel Process (PIP) assumes site independence but allows one to infer single-character indels on the phylogenetic tree, distinguishing insertions from deletions. PIP's marginal likelihood computation has linear time complexity, enabling ancestral sequence reconstruction (ASR) with indels in linear time. Recently, we developed ARPIP, an ASR method using PIP, capable of inferring indel events with explicit evolutionary interpretations. Here, we investigate the effect of the single-character indel assumption on reconstructed ancestral sequences on mammalian protein orthologs and on simulated data. We show that ARPIP's ancestral estimates preserve the gap length distribution observed in the input alignment. In mammalian proteins the lengths of inserted segments appear to be substantially longer compared to deleted segments. Further, we confirm the well-established deletion bias observed in real data. To date, ARPIP is the only ancestral reconstruction method that explicitly models insertion and deletion events over time. Given a good quality input alignment, it can capture ancestral long indel events on the phylogeny.
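
To make the notion of a "gap length distribution" concrete, here is a minimal Python sketch that counts the lengths of maximal gap runs in an alignment. It is only an illustration of the quantity being compared, not ARPIP's implementation or the authors' analysis pipeline. The function name `gap_length_distribution` and the toy alignment are assumptions made up for this example, and treating each row's gap runs independently is a simplification that ignores the phylogeny.

```python
import re
from collections import Counter


def gap_length_distribution(alignment):
    """Count maximal runs of '-' by length across all aligned sequences.

    Hypothetical helper for illustration: each row is scanned on its own,
    so no tree or insertion/deletion distinction is taken into account.
    """
    counts = Counter()
    for seq in alignment:
        # Each match is one maximal run of consecutive gap characters.
        for run in re.findall(r"-+", seq):
            counts[len(run)] += 1
    return counts


# Toy alignment invented for this example (not data from the paper).
toy_alignment = [
    "MKT---LLV-A",
    "MKTAAGLLVQA",
    "MK----LLVQA",
]

dist = gap_length_distribution(toy_alignment)
for length in sorted(dist):
    print(f"gap length {length}: {dist[length]} run(s)")
```

Applying the same count to the input alignment and to the reconstructed ancestral sequences would give two distributions that can be compared directly, which is the kind of comparison the abstract describes.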

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/06b6/11610121/7dbb25915c93/12859_2024_5986_Fig1_HTML.jpg
