Toward Reducing Phylostratigraphic Errors and Biases.

Affiliations

HudsonAlpha Institute for Biotechnology, Huntsville, Alabama.

Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan.

Publication information

Genome Biol Evol. 2018 Aug 1;10(8):2037-2048. doi: 10.1093/gbe/evy161.

Abstract

Phylostratigraphy is a method for estimating gene age, usually applied to large numbers of genes in order to detect nonrandom age distributions of gene properties that could shed light on mechanisms of gene origination and evolution. However, phylostratigraphy underestimates gene age with a nonnegligible probability. The underestimation is more severe for genes with certain properties, creating spurious age distributions of these properties and of properties correlated with them. Here we explore three strategies to reduce phylostratigraphic error/bias. First, we test several alternative homology detection methods (PSIBLAST, HMMER, PHMMER, OMA, and GLAM2Scan) in phylostratigraphy, but fail to find any that noticeably outperforms the commonly used BLASTP. Second, using machine learning, we look for predictors of error-prone genes to exclude from phylostratigraphy, but cannot identify reliable predictors. Finally, we remove from phylostratigraphic analysis genes exhibiting errors in simulation, which by definition minimizes error/bias if the simulation is sufficiently realistic. Using this last approach, we show that some previously reported phylostratigraphic trends (e.g., younger proteins tend to evolve more rapidly and be shorter) disappear or even reverse, reconfirming the necessity of controlling phylostratigraphic error/bias. Taken together, our analyses demonstrate that phylostratigraphic errors/biases are refractory to several potential solutions but can be controlled at least partially by the exclusion of error-prone genes identified via realistic simulations. These results are expected to stimulate the judicious use of error-aware phylostratigraphy and reevaluation of previous phylostratigraphic findings.
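
The third strategy, excluding genes that show errors in simulation, can be illustrated with a minimal sketch. This is not the authors' pipeline: the `SimulatedGene` record, the function names, and the convention that stratum 1 is the oldest are all assumptions made here for illustration. The sketch supposes that each gene has already been simulated with a known origin and then run through a homology search, so that a "true" and an "inferred" stratum are both available.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimulatedGene:
    """Hypothetical record for one simulated gene (names are illustrative)."""
    gene_id: str
    true_stratum: int      # stratum of origin used in the simulation (1 = oldest)
    inferred_stratum: int  # stratum assigned by homology search on simulated sequences


def error_prone_genes(sim_results):
    """Return IDs of genes whose age was underestimated in simulation.

    With stratum 1 as the oldest, an inferred stratum larger than the
    true stratum means the gene was called younger than it really is.
    """
    return {g.gene_id for g in sim_results if g.inferred_stratum > g.true_stratum}


def filter_genes(gene_ids, sim_results):
    """Keep only genes that showed no underestimation in simulation."""
    flagged = error_prone_genes(sim_results)
    return [gid for gid in gene_ids if gid not in flagged]


# Toy input: three genes simulated as originating in the oldest stratum.
sim = [
    SimulatedGene("g1", true_stratum=1, inferred_stratum=1),
    SimulatedGene("g2", true_stratum=1, inferred_stratum=3),  # age underestimated
    SimulatedGene("g3", true_stratum=1, inferred_stratum=1),
]
retained = filter_genes(["g1", "g2", "g3"], sim)
```

Downstream trend analyses (for example, protein length or evolutionary rate against age) would then be run on `retained` only. As the abstract notes, the filter is only as good as the realism of the simulation that produced the flagged set.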
