Moyers Bryan A, Zhang Jianzhi
HudsonAlpha Institute for Biotechnology, Huntsville, Alabama.
Department of Ecology and Evolutionary Biology, University of Michigan.
Genome Biol Evol. 2017 Jun 1;9(6):1519-1527. doi: 10.1093/gbe/evx109.
Phylostratigraphy, originally designed for gene age estimation by BLAST-based protein homology searches of sequenced genomes, has been widely used for studying patterns and inferring mechanisms of gene origination and evolution. We previously showed by computer simulation that phylostratigraphy underestimates gene age for a nonnegligible fraction of genes and that the underestimation is more severe for genes with certain properties, such as fast evolution and short protein sequences. Consequently, many previously reported age distributions of gene properties may have been methodological artifacts rather than biological realities. Domazet-Lošo and colleagues recently argued that our simulations were flawed and that phylostratigraphic bias does not impact inferences about gene emergence and evolution. Here we discuss conceptual difficulties of phylostratigraphy, identify numerous problems in Domazet-Lošo et al.'s argument, reconfirm phylostratigraphic error using simulations suggested by Domazet-Lošo and colleagues, and demonstrate that a phylostratigraphic trend claimed to be robust to error disappears when genes likely to be error-resistant are analyzed. We conclude that extreme caution is needed in interpreting phylostratigraphic results because of the inherent biases of the method, and that reanalysis using genes exhibiting no error in realistic simulations may help reduce spurious findings.
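To make the direction of the bias concrete, here is a minimal Python sketch. It encodes the usual phylostratigraphic rule (a gene is assigned to the oldest phylostratum in which a homolog is detected) and pairs it with a deliberately crude, hypothetical detection model. The function names `phylostratum_age` and `toy_hits`, the divergence times, the `min_identical` cutoff, and the `length * exp(-rate * t)` detectability formula are all illustrative assumptions, not the simulation used in this study or in BLAST itself. The point is only that when homologs in distant strata fall below the detection threshold, the assigned age is too young, and this happens sooner for fast-evolving, short proteins.

```python
import math
from typing import List, Sequence


def phylostratum_age(hits: Sequence[bool]) -> int:
    """Phylostratigraphic rule: age = oldest phylostratum with a detected homolog.

    hits[i] is True if a homolog was detected in genomes of phylostratum i;
    index 0 is the focal species and larger indices are older strata.
    """
    age = 0
    for stratum, found in enumerate(hits):
        if found:
            age = stratum
    return age


def toy_hits(true_age: int, divergence: Sequence[float], rate: float,
             length: int, min_identical: float = 30.0) -> List[bool]:
    """HYPOTHETICAL detection model (not the paper's simulation).

    Homologs truly exist in strata 0..true_age. A homolog in stratum i counts
    as detected if the expected number of unchanged residues,
    length * exp(-rate * divergence[i]), reaches min_identical, a made-up
    stand-in for a BLAST E-value cutoff. The focal species (stratum 0)
    always counts as a hit.
    """
    hits = []
    for stratum, t in enumerate(divergence):
        exists = stratum <= true_age
        expected_identical = length * math.exp(-rate * t)
        hits.append(exists and (stratum == 0 or expected_identical >= min_identical))
    return hits


if __name__ == "__main__":
    divergence = [0.0, 0.5, 1.0, 2.0, 4.0]  # arbitrary units, illustrative only
    # Both genes truly originated in the oldest stratum (index 4).
    slow_long = toy_hits(4, divergence, rate=0.2, length=400)
    fast_short = toy_hits(4, divergence, rate=1.0, length=100)
    print(phylostratum_age(slow_long))   # 4: age recovered
    print(phylostratum_age(fast_short))  # 2: age underestimated
```

Under these toy assumptions, the slow, long gene keeps a detectable homolog in the oldest stratum, whereas the fast, short gene loses detectability beyond stratum 2 and is assigned a younger age than it really has. Rerunning such a simulation with each gene's own rate and length, and keeping only genes whose simulated age equals the true age, is one way to read the "reanalysis using genes exhibiting no error" suggested above.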