Phylostratigraphic bias creates spurious patterns of genome evolution.

Authors

Moyers Bryan A, Zhang Jianzhi

Affiliations

Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor.

Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor.

Publication information

Mol Biol Evol. 2015 Jan;32(1):258-67. doi: 10.1093/molbev/msu286. Epub 2014 Oct 13.

Abstract

Phylostratigraphy is a method for dating the evolutionary emergence of a gene or gene family by identifying its homologs across the tree of life, typically by using BLAST searches. Applying this method to all genes in a species, or genomic phylostratigraphy, allows investigation of genome-wide patterns in new gene origination at different evolutionary times and thus has been extensively used. However, gene age estimation depends on the challenging task of detecting distant homologs via sequence similarity, which is expected to have differential accuracies for different genes. Here, we evaluate the accuracy of phylostratigraphy by realistic computer simulation with parameters estimated from genomic data, and investigate the impact of its error on findings of genome evolution. We show that 1) phylostratigraphy substantially underestimates gene age for a considerable fraction of genes, 2) the error is especially serious when the protein evolves rapidly, is short, and/or its most conserved block of sites is small, and 3) these errors create spurious nonuniform distributions of various gene properties among age groups, many of which cannot be predicted a priori. Given the high likelihood that conclusions about gene age are faulty, we advocate the use of realistic simulation to determine if observations from phylostratigraphy are explainable, at least qualitatively, by a null model of biased measurement, and in all cases, critical evaluation of results.
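
The age-assignment step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the stratum names, the `assign_phylostratum` helper, the E-value cutoff of 1e-3, and the example E-values are all assumptions chosen for the example. A gene is dated to the oldest phylostratum in which a homolog is detected, so a failure to detect a distant homolog makes the gene look younger than it is.

```python
# Illustrative phylostrata for a hypothetical focal species, oldest first.
# The names and their number are assumptions made for this sketch.
PHYLOSTRATA = [
    "cellular organisms",
    "Eukaryota",
    "Opisthokonta",
    "Metazoa",
    "focal species",
]


def assign_phylostratum(best_evalues, evalue_cutoff=1e-3):
    """Return the oldest stratum containing a detectable homolog.

    best_evalues maps a stratum name to the best (lowest) E-value found
    in any genome of that stratum; strata without any hit are absent.
    The cutoff value is an assumption, not a recommendation.
    """
    for stratum in PHYLOSTRATA:  # scan from oldest to youngest
        evalue = best_evalues.get(stratum)
        if evalue is not None and evalue <= evalue_cutoff:
            return stratum
    # No homolog detected outside the focal species itself.
    return PHYLOSTRATA[-1]


# Two hypothetical genes that both truly date to "cellular organisms".
slow_gene = {
    "cellular organisms": 1e-40,
    "Eukaryota": 1e-60,
    "Opisthokonta": 1e-80,
    "Metazoa": 1e-120,
}
# In the fast-evolving gene, similarity to distant homologs has decayed
# above the cutoff, so only the Metazoa hit is counted.
fast_gene = {
    "cellular organisms": 0.5,
    "Eukaryota": 2.0,
    "Opisthokonta": 1e-2,
    "Metazoa": 1e-8,
}

slow_age = assign_phylostratum(slow_gene)
fast_age = assign_phylostratum(fast_gene)
```

Under these assumed inputs, the slowly evolving gene is placed in the oldest stratum, while the fast-evolving gene of the same true age is placed in Metazoa, which is the kind of underestimation the abstract attributes to rapidly evolving proteins.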

