

Assessing the performance of single-copy genes for recovering robust phylogenies.

Affiliation

Laboratoire Ecologie, Systématique et Evolution, Université Paris-Sud, Orsay, UMR8079, Orsay, Cedex, France.

Publication information

Syst Biol. 2008 Aug;57(4):613-27. doi: 10.1080/10635150802306527.

Abstract

Phylogenies involving nonmodel species are based on a few genes, mostly chosen following historical or practical criteria. Because gene trees are sometimes incongruent with species trees, the resulting phylogenies may not accurately reflect the evolutionary relationships among species. The increase in availability of genome sequences now provides large numbers of genes that could be used for building phylogenies. However, for practical reasons only a few genes can be sequenced for a wide range of species. Here we asked whether we can identify a few genes, among the single-copy genes common to most fungal genomes, that are sufficient for recovering accurate and well-supported phylogenies. Fungi represent a model group for phylogenomics because many complete fungal genomes are available. An automated procedure was developed to extract single-copy orthologous genes from complete fungal genomes using a Markov Clustering Algorithm (Tribe-MCL). Using 21 complete, publicly available fungal genomes with reliable protein predictions, 246 single-copy orthologous gene clusters were identified. We inferred the maximum likelihood trees using the individual orthologous sequences and constructed a reference tree from concatenated protein alignments. The topologies of the individual gene trees were compared to that of the reference tree using three different methods. The performance of individual genes in recovering the reference tree was highly variable. Gene size and the number of variable sites were highly correlated and significantly affected the performance of the genes, but the average substitution rate did not. Two genes recovered exactly the same topology as the reference tree, and when concatenated provided high bootstrap values. The genes typically used for fungal phylogenies did not perform well, which suggests that current fungal phylogenies based on these genes may not accurately reflect the evolutionary relationships among species. Analyses on subsets of species showed that the phylogenetic performance did not seem to depend strongly on the sample. We expect that the best-performing genes identified here will be very useful for phylogenetic studies of fungi, at least at a large taxonomic scale. Furthermore, we compare the method developed here for finding genes for building robust phylogenies with previous ones and we advocate that our method could be applied to other groups of organisms when more complete genomes are available.
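
The abstract describes extracting single-copy orthologous gene clusters after a clustering step, but it does not give the filtering rule itself. The following is a minimal sketch of how such a filter might look, not the authors' pipeline: it assumes the clustering output has already been parsed into a dictionary mapping a cluster ID to a list of (genome, gene) pairs. The function name `single_copy_clusters`, the input layout, and the `min_fraction` parameter (meant to mirror "common to most fungal genomes") are all hypothetical.

```python
from collections import Counter


def single_copy_clusters(clusters, genomes, min_fraction=1.0):
    """Keep clusters with at most one gene per genome.

    clusters     -- dict: cluster ID -> list of (genome, gene) pairs
                    (hypothetical layout, assumed to be parsed from the
                    clustering output beforehand)
    genomes      -- iterable of genome names under study
    min_fraction -- assumed threshold: fraction of genomes that must be
                    represented for a cluster to be kept
    """
    genome_set = set(genomes)
    kept = {}
    for cluster_id, members in clusters.items():
        # Count how many genes each genome contributes to this cluster.
        counts = Counter(genome for genome, _gene in members)
        # More than one gene from any genome means the cluster is not single-copy.
        if any(n > 1 for n in counts.values()):
            continue
        # Require the cluster to cover enough of the genomes under study.
        present = genome_set & set(counts)
        if len(present) >= min_fraction * len(genome_set):
            kept[cluster_id] = sorted(members)
    return kept


if __name__ == "__main__":
    example = {
        "c1": [("A", "a1"), ("B", "b1"), ("C", "c1")],
        "c2": [("A", "a2"), ("A", "a3"), ("B", "b2"), ("C", "c2")],
        "c3": [("A", "a4"), ("B", "b3")],
    }
    print(sorted(single_copy_clusters(example, ["A", "B", "C"])))
```

In this toy input, cluster `c2` is rejected because genome `A` contributes two genes, and `c3` is rejected under the default threshold because genome `C` is missing. Lowering `min_fraction` would admit clusters that are absent from a few genomes.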

