Effects of missing data and data type on phylotranscriptomic analysis of stony corals (Cnidaria: Anthozoa: Scleractinia).

Affiliations

Department of Biological Sciences, National University of Singapore, Singapore 117558, Singapore.

Department of Biological Sciences, National University of Singapore, Singapore 117558, Singapore; Tropical Marine Science Institute, National University of Singapore, Singapore 119227, Singapore.

Publication information

Mol Phylogenet Evol. 2019 May;134:12-23. doi: 10.1016/j.ympev.2019.01.012. Epub 2019 Jan 22.

Abstract

Across the tree of life, phylogenetic analysis is increasingly being performed using transcriptome data. As a result of heterogeneous gene expression within individual organisms and unequal sequencing depth between samples, coverage of homologous loci in such datasets is typically inhomogeneous. Consequently, missing data are a common feature of phylotranscriptomic inference, but their impact on phylogenetic analysis remains poorly characterised empirically. Considering the complexity of the evolutionary history of stony corals (Cnidaria: Anthozoa: Scleractinia), transcriptome data hold great promise for resolving their phylogeny, particularly if there is a good understanding of missing data and data type (either amino acid or DNA) effects. Here, we reconstructed a broad phylogenetic tree of 39 scleractinian species with 3 corallimorpharians as outgroups, including 15 transcriptomes that were newly sequenced and assembled in this study. Between 63 and 505 loci were used to analyse the scleractinian phylogeny, and we quantified differences in tree topology, tree shape, bootstrap support and effects of conflicting gene trees among datasets of varying completeness for both amino acid and DNA sequences. Even with almost 70% missing data, tree topologies appear to be mostly unaffected, although there are higher incongruence levels in the less complete datasets. Furthermore, DNA trees outperform amino acid trees in bootstrap support and robustness against incongruent loci. Overall, our findings indicate that high levels of missing data can still produce expected tree topologies, but identifying and omitting incongruent loci can lead to more consistent branch length estimates.
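The abstract compares datasets of varying completeness, with up to almost 70% missing data. The minimal Python sketch below illustrates how such a missing-data percentage can be measured for a concatenated supermatrix, and how a taxon-occupancy threshold trades the number of loci against completeness. The data layout, the function names `missing_fraction` and `filter_by_occupancy`, and the choice to count `-` and `?` as missing are illustrative assumptions, not the authors' pipeline.

```python
from typing import Dict, List

# Assumed layout (hypothetical): locus name -> {taxon name -> aligned sequence}.
# A taxon absent from a locus's dict has no data for that locus.
Locus = Dict[str, str]

# Assumption: gap and unknown characters count as missing data.
MISSING_CHARS = set("-?")


def missing_fraction(loci: Dict[str, Locus], taxa: List[str]) -> float:
    """Fraction of supermatrix cells (taxa x concatenated sites) that are missing."""
    total = 0
    missing = 0
    for aln in loci.values():
        if not aln:
            continue  # a locus with no sequences contributes no sites
        # Sequences within one locus are assumed to be aligned (equal length).
        length = len(next(iter(aln.values())))
        for taxon in taxa:
            total += length
            seq = aln.get(taxon)
            if seq is None:
                missing += length  # whole locus absent for this taxon
            else:
                missing += sum(1 for c in seq if c in MISSING_CHARS)
    return missing / total if total else 0.0


def filter_by_occupancy(
    loci: Dict[str, Locus], taxa: List[str], min_occupancy: float
) -> Dict[str, Locus]:
    """Keep loci sampled for at least `min_occupancy` (0 to 1) of the taxa."""
    n = len(taxa)
    return {
        name: aln
        for name, aln in loci.items()
        if sum(1 for t in taxa if t in aln) / n >= min_occupancy
    }


# Toy example: three taxa, two loci; taxon C was not recovered for L2.
example_taxa = ["A", "B", "C"]
example_loci = {
    "L1": {"A": "ACGT", "B": "AC-T", "C": "ACGT"},
    "L2": {"A": "GG", "B": "GG"},
}

print(round(missing_fraction(example_loci, example_taxa), 3))  # 3 of 18 cells missing
print(sorted(filter_by_occupancy(example_loci, example_taxa, 1.0)))
```

Raising the occupancy threshold keeps fewer loci but yields a more complete matrix (here, requiring all three taxa drops L2 and lowers missing data from 3/18 to 1/12). This loosely mirrors the contrast between smaller, more complete and larger, less complete datasets described in the abstract.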
