
Fragmentary Gene Sequences Negatively Impact Gene Tree and Species Tree Reconstruction.

Affiliations

Department of Electrical and Computer Engineering, University of California at San Diego, La Jolla, CA.

Department of Entomology, University of Illinois, Urbana, IL.

Publication information

Mol Biol Evol. 2017 Dec 1;34(12):3279-3291. doi: 10.1093/molbev/msx261.

Abstract

Species tree reconstruction from genome-wide data is increasingly being attempted, in most cases using a two-step approach of first estimating individual gene trees and then summarizing them to obtain a species tree. The accuracy of this approach, which promises to account for gene tree discordance, depends on the quality of the inferred gene trees. At the same time, phylogenomic and phylotranscriptomic analyses typically use involved bioinformatics pipelines for data preparation. Errors and shortcomings resulting from these preprocessing steps may impact the species tree analyses at the other end of the pipeline. In this article, we first show that the presence of fragmentary data for some species in a gene alignment, as often seen on real data, can result in substantial deterioration of gene trees, and as a result, the species tree. We then investigate a simple filtering strategy where individual fragmentary sequences are removed from individual genes but the rest of the gene is retained. Both in simulations and by reanalyzing a large insect phylotranscriptomic data set, we show the effectiveness of this simple filtering strategy.
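The filtering strategy described in the abstract (drop individual fragmentary sequences from a gene alignment, keep the rest of the gene) can be illustrated with a minimal Python sketch. The abstract does not say how "fragmentary" is defined, so everything specific here is an assumption for illustration: the `occupancy` measure (fraction of non-gap columns), the `min_occupancy` threshold of 0.5, the gap characters `-` and `?`, and the toy taxon names. It is not the authors' implementation.

```python
from typing import Dict


def occupancy(aligned_seq: str, gap_chars: str = "-?") -> float:
    """Fraction of alignment columns where this sequence has a non-gap character."""
    if not aligned_seq:
        return 0.0
    non_gap = sum(1 for ch in aligned_seq if ch not in gap_chars)
    return non_gap / len(aligned_seq)


def filter_fragments(
    alignment: Dict[str, str], min_occupancy: float = 0.5
) -> Dict[str, str]:
    """Return a copy of one gene alignment without its fragmentary sequences.

    Only sequences below the (assumed) occupancy threshold are dropped;
    the gene itself and its remaining sequences are kept.
    """
    return {
        name: seq
        for name, seq in alignment.items()
        if occupancy(seq) >= min_occupancy
    }


# Toy gene alignment (hypothetical data): taxon name -> aligned sequence.
gene = {
    "taxon_A": "ACGTACGTAC",  # complete
    "taxon_B": "ACGTAC--AC",  # mostly complete
    "taxon_C": "------GTAC",  # fragment, 4 of 10 columns
    "taxon_D": "AC--------",  # fragment, 2 of 10 columns
}

filtered = filter_fragments(gene, min_occupancy=0.5)
print(sorted(filtered))
```

In a real pipeline the filtered alignment for each gene would then be passed to gene tree estimation, and the resulting gene trees summarized into a species tree. The threshold would need to be chosen for the data set at hand.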

