Visualizing phylogenetic tree landscapes.

Authors

Wilgenbusch James C, Huang Wen, Gallivan Kyle A

Affiliations

Department of Scientific Computing, Florida State University, Tallahassee, FL, 32306, USA.

Present Address: Minnesota Supercomputing Center, University of Minnesota, Minneapolis, 55455, USA.

Publication

BMC Bioinformatics. 2017 Feb 2;18(1):85. doi: 10.1186/s12859-017-1479-1.

Abstract

BACKGROUND

Genomic-scale sequence alignments are increasingly used to infer phylogenies in order to better understand the processes and patterns of evolution. Different partitions within these new alignments (e.g., genes, codon positions, and structural features) often favor hundreds if not thousands of competing phylogenies. Summarizing and comparing phylogenies obtained from multi-source data sets with current consensus tree methods discards valuable information and can disguise potential methodological problems. Efficient and accurate dimensionality reduction methods that display the relationships among these competing phylogenies all at once, in two or three dimensions, would help practitioners diagnose the limits of current evolutionary models and potential problems with phylogenetic reconstruction methods when analyzing large multi-source data sets. We introduce several dimensionality reduction methods to visualize, in two and three dimensions, the relationships among competing phylogenies obtained from gene partitions of three mid- to large-size mitochondrial genome alignments. We test the performance of these methods by applying several goodness-of-fit measures. We also estimate the intrinsic dimensionality of each data set to determine whether 2D and 3D projections can be expected to reveal meaningful relationships among trees from different data partitions. Finally, we present several new approaches to aid in the comparison of different phylogenetic landscapes.
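The abstract does not name its goodness-of-fit measures. Purely as an illustration, the sketch below assumes one common choice, a normalized stress that compares each original tree-to-tree distance D_ij (for example a Robinson-Foulds distance, which is also an assumption here) with the Euclidean distance d_ij between the projected points: sqrt( sum_{i<j} (D_ij - d_ij)^2 / sum_{i<j} D_ij^2 ). A value of 0 means the projection reproduces the distance matrix exactly. The function name `normalized_stress` is hypothetical.

```python
import math
from typing import Sequence


def normalized_stress(dist: Sequence[Sequence[float]],
                      coords: Sequence[Sequence[float]]) -> float:
    """Hypothetical goodness-of-fit measure (an assumed choice, not the paper's):
    sqrt(sum_{i<j} (D_ij - d_ij)^2 / sum_{i<j} D_ij^2), where D is the original
    tree-to-tree distance matrix and d_ij the Euclidean distance between the
    projected points i and j."""
    n = len(dist)
    num = 0.0  # squared residuals between original and projected distances
    den = 0.0  # squared original distances (normalizer)
    for i in range(n):
        for j in range(i + 1, n):
            d_proj = math.dist(coords[i], coords[j])
            num += (dist[i][j] - d_proj) ** 2
            den += dist[i][j] ** 2
    return math.sqrt(num / den) if den > 0 else 0.0


# Three trees whose distances fit exactly on a line: stress is 0.
D = [[0, 2, 4], [2, 0, 2], [4, 2, 0]]
print(normalized_stress(D, [[0.0], [2.0], [4.0]]))  # 0.0

# Four mutually equidistant trees fit exactly at the corners of a regular tetrahedron.
s = math.sqrt(8)
T = [[0.0 if i == j else s for j in range(4)] for i in range(4)]
tetra = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
print(normalized_stress(T, tetra) < 1e-12)  # True
```

The tetrahedron case shows why an extra dimension can improve the fit: four trees that are all equally far apart cannot be placed in a plane without distorting some distances, but they sit exactly at the corners of a regular tetrahedron in 3D.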

RESULTS

Curvilinear Component Analysis (CCA) combined with a stochastic gradient descent (SGD) optimization method gives the best representation of the original tree-to-tree distance matrix for each of the three mitochondrial genome alignments and greatly outperforms the method currently used to visualize tree landscapes. The CCA + SGD method converged at least as fast as the methods previously applied to visualizing tree landscapes. For all three mtDNA alignments, we demonstrate that 3D projections significantly improve the fit between the original tree-to-tree distances and the distances in the projection, and can facilitate interpretation of the relationships among phylogenetic trees.
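The abstract does not describe how CCA is optimized. The sketch below follows the standard Curvilinear Component Analysis update of Demartines and Hérault as we understand it, not necessarily the authors' implementation: at each step one projected point y_i is pinned, and every other point y_j whose projected distance Y_ij lies inside a shrinking neighborhood of radius λ is moved along (y_j - y_i) by α (D_ij - Y_ij) / Y_ij. The function name `cca_sgd`, the step-shaped neighborhood, the exponential decay schedules and all default values are assumptions chosen for illustration.

```python
import math
import random
from typing import List, Optional, Sequence


def cca_sgd(dist: Sequence[Sequence[float]], dim: int = 2, epochs: int = 300,
            alpha0: float = 0.5, alpha_end: float = 0.01,
            lambda0: Optional[float] = None, lambda_end: Optional[float] = None,
            seed: int = 0) -> List[List[float]]:
    """Hypothetical CCA-style projection optimized by stochastic gradient descent.

    Energy (assumed form): E = 1/2 * sum_{i<j} (D_ij - Y_ij)^2 * F(Y_ij, lam),
    with a step neighborhood F = 1 if Y_ij <= lam else 0.
    """
    n = len(dist)
    max_d = max(max(row) for row in dist) or 1.0
    lam0 = 3.0 * max_d if lambda0 is None else lambda0          # assumed default
    lam_end = 0.1 * max_d if lambda_end is None else lambda_end  # assumed default
    rng = random.Random(seed)
    # Random initial configuration on the scale of the input distances.
    y = [[rng.uniform(-0.5, 0.5) * max_d for _ in range(dim)] for _ in range(n)]

    order = list(range(n))
    for epoch in range(epochs):
        t = epoch / max(epochs - 1, 1)                # goes from 0 to 1 over the run
        alpha = alpha0 * (alpha_end / alpha0) ** t   # learning rate decays
        lam = lam0 * (lam_end / lam0) ** t           # neighborhood radius shrinks
        rng.shuffle(order)
        for i in order:  # pin y_i and move every other point relative to it
            for j in range(n):
                if j == i:
                    continue
                y_ij = math.dist(y[i], y[j])
                if y_ij == 0.0 or y_ij > lam:  # outside the neighborhood: F = 0
                    continue
                # Moving y_j along (y_j - y_i) turns Y_ij into (1 - alpha) * Y_ij + alpha * D_ij.
                step = alpha * (dist[i][j] - y_ij) / y_ij
                y[j] = [yj + step * (yj - yi) for yj, yi in zip(y[j], y[i])]
    return y


# Three mutually equidistant trees; a large fixed lambda keeps every pair in the
# neighborhood, so the 2D projection should approach an equilateral triangle.
D = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
coords = cca_sgd(D, dim=2, epochs=500, lambda0=10.0, lambda_end=10.0)
```

With α at most 1, each update sets the pinned pair's projected distance to a convex combination of its current value and D_ij, so a single update never overshoots the target for that pair. Shrinking λ means that late in the run only short projected distances are enforced, which is what lets CCA stretch long distances to unfold curved structure. Passing `dim=3` gives a 3D projection whose fit can be compared with the 2D one using a stress-type measure.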

CONCLUSIONS

We demonstrate that the choice of dimensionality reduction method can significantly influence the spatial relationships among a large set of competing phylogenetic trees. We highlight the importance of selecting an appropriate dimensionality reduction method to visualize large multi-locus phylogenetic landscapes and show that 3D projections of mitochondrial tree landscapes capture the relationships among the trees being compared better than 2D projections.
