
Prospects for building large timetrees using molecular data with incomplete gene coverage among species.

Author information

Filipski Alan, Murillo Oscar, Freydenzon Anna, Tamura Koichiro, Kumar Sudhir

Affiliations

Center for Evolutionary Medicine and Informatics, Biodesign Institute, Arizona State University.

Center for Evolutionary Medicine and Informatics, Biodesign Institute, Arizona State University; School of Life Sciences, Arizona State University.

Publication information

Mol Biol Evol. 2014 Sep;31(9):2542-50. doi: 10.1093/molbev/msu200. Epub 2014 Jun 27.

Abstract

Scientists are assembling sequence data sets from increasing numbers of species and genes to build comprehensive timetrees. However, data are often unavailable for some species and gene combinations, and the proportion of missing data is often large for data sets containing many genes and species. Surprisingly, there has not been a systematic analysis of the effect of the degree of sparseness of the species-gene matrix on the accuracy of divergence time estimates. Here, we present results from computer simulations and empirical data analyses to quantify the impact of missing gene data on divergence time estimation in large phylogenies. We found that estimates of divergence times were robust even when sequences from a majority of genes for most of the species were absent. From the analysis of such extremely sparse data sets, we found that the most egregious errors occurred for nodes in the tree that had no common genes for any pair of species in the immediate descendant clades of the node in question. These problematic nodes can be easily detected prior to computational analyses based only on the input sequence alignment and the tree topology. We conclude that it is best to use larger alignments, because adding both genes and species to the alignment augments the number of genes available for estimating divergence events deep in the tree and improves their time estimates.
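The abstract states that the worst-estimated nodes can be found from the alignment and the topology alone. Such a node is one where no pair of species, one from each immediate descendant clade, shares any gene. Below is a minimal Python sketch of that check under several assumptions:

- The tree is strictly bifurcating and written as nested tuples of species names.
- Gene coverage is given as a dict mapping each species to the set of genes sequenced for it.
- Both representations and the helper names `node_gene_overlap`, `problematic_nodes`, and `missing_fraction` are hypothetical. They are not taken from the paper or from any dating software.

The check relies on one observation. Some species pair spanning a node shares gene g exactly when g occurs somewhere in both descendant clades. So the shared set at a node is the intersection of the two clade-level gene unions.

```python
from typing import Dict, List, Set, Tuple, Union

# Hypothetical tree representation: a leaf is a species name, an internal node
# is a 2-tuple of subtrees (this sketch assumes a strictly bifurcating tree).
Tree = Union[str, Tuple["Tree", "Tree"]]
# Hypothetical species-gene matrix: species -> set of genes with sequence data.
Coverage = Dict[str, Set[str]]


def missing_fraction(genes: Coverage) -> float:
    """Fraction of empty cells in the species x gene presence matrix."""
    all_genes = set().union(*genes.values())
    cells = len(genes) * len(all_genes)
    if cells == 0:
        return 0.0
    present = sum(len(g) for g in genes.values())
    return 1.0 - present / cells


def node_gene_overlap(tree: Tree, genes: Coverage) -> List[Tuple[List[str], Set[str]]]:
    """For each internal node, return (sorted species in its clade, shared genes).

    The shared genes are those present in at least one species of EACH of the
    two descendant clades, i.e. the genes available for timing that split.
    """
    results: List[Tuple[List[str], Set[str]]] = []

    def visit(node: Tree) -> Tuple[List[str], Set[str]]:
        # Post-order traversal: returns (species in clade, union of their genes).
        if isinstance(node, str):
            return [node], set(genes.get(node, set()))
        left, right = node
        left_sp, left_genes = visit(left)
        right_sp, right_genes = visit(right)
        results.append((sorted(left_sp + right_sp), left_genes & right_genes))
        return left_sp + right_sp, left_genes | right_genes

    visit(tree)
    return results


def problematic_nodes(tree: Tree, genes: Coverage) -> List[List[str]]:
    """Clades whose root node has no gene shared across its two descendant clades."""
    return [clade for clade, shared in node_gene_overlap(tree, genes) if not shared]


if __name__ == "__main__":
    # Invented toy data, for illustration only.
    genes = {"human": {"g1", "g2"}, "chimp": {"g2"},
             "mouse": {"g3"}, "rat": {"g3", "g4"}}
    tree = (("human", "chimp"), ("mouse", "rat"))
    print(missing_fraction(genes))          # 0.625
    print(problematic_nodes(tree, genes))   # [['chimp', 'human', 'mouse', 'rat']]

    # Adding one species carrying genes from both sides fills the gap at the root.
    genes["macaque"] = {"g2", "g3"}
    tree = ((("human", "chimp"), "macaque"), ("mouse", "rat"))
    print(problematic_nodes(tree, genes))   # []
```

In the toy data, the root is flagged because the primate clade only has g1 and g2 while the rodent clade only has g3 and g4. The second call echoes the abstract's conclusion that adding species can raise the number of genes spanning a deep node. The sketch only flags nodes by the criterion stated in the abstract; it does not estimate divergence times.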



