kdetrees: Non-parametric estimation of phylogenetic tree distributions.

Author information

Weyenberg Grady, Huggins Peter M, Schardl Christopher L, Howe Daniel K, Yoshida Ruriko

Affiliations

Department of Statistics, University of Kentucky, Lexington, KY 40536, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213, Plant Pathology Department and Department of Veterinary Science, University of Kentucky, Lexington, KY 40546, USA.

Publication information

Bioinformatics. 2014 Aug 15;30(16):2280-7. doi: 10.1093/bioinformatics/btu258. Epub 2014 Apr 24.

Abstract

MOTIVATION

Although the majority of gene histories found in a clade of organisms are expected to be generated by a common process (e.g. the coalescent process), it is well known that numerous other coexisting processes (e.g. horizontal gene transfers, gene duplication and subsequent neofunctionalization) will cause some genes to exhibit a history distinct from those of the majority of genes. Such 'outlying' gene trees are considered to be biologically interesting, and identifying these genes has become an important problem in phylogenetics.

RESULTS

We propose and implement kdetrees, a non-parametric method for estimating distributions of phylogenetic trees, with the goal of identifying trees that are significantly different from the rest of the trees in the sample. Our method compares favorably with a similar recently published method, featuring an improvement of one polynomial order of computational complexity (to quadratic in the number of trees analyzed), with simulation studies suggesting only a small penalty to classification accuracy. Application of kdetrees to a set of Apicomplexa genes identified several unreliable sequence alignments that had escaped previous detection, as well as a gene independently reported as a possible case of horizontal gene transfer. We also analyze a set of Epichloë genes, fungi symbiotic with grasses, successfully identifying a contrived instance of paralogy.
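As a rough illustration of the general idea (density estimation over pairwise distances, then flagging low-density items), the following is a minimal Python sketch and not the kdetrees implementation. Points on a line stand in for trees, and the absolute difference stands in for a tree-to-tree distance. The Gaussian kernel, the fixed bandwidth, the Tukey-style cutoff, and all function names are assumptions made for illustration; the abstract does not specify any of them. Computing all pairwise kernel values is what makes the cost quadratic in the number of items.

```python
import math


def kde_scores(dist, bandwidth):
    """Leave-one-out Gaussian kernel score for each item.

    dist is a symmetric n x n matrix of pairwise distances.
    The fixed bandwidth is an illustrative assumption.
    """
    n = len(dist)
    scores = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            if i != j:
                total += math.exp(-0.5 * (dist[i][j] / bandwidth) ** 2)
        scores.append(total / (n - 1))
    return scores


def quantile(ordered, q):
    """Linearly interpolated quantile of an already sorted list."""
    pos = q * (len(ordered) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def flag_outliers(scores, k=1.5):
    """Indices whose score falls below Q1 - k * IQR (assumed cutoff rule)."""
    ordered = sorted(scores)
    q1 = quantile(ordered, 0.25)
    q3 = quantile(ordered, 0.75)
    cutoff = q1 - k * (q3 - q1)
    return [i for i, s in enumerate(scores) if s < cutoff]


# Toy stand-in for a sample of gene trees: six similar items and one distant one.
positions = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 5.0]
dist = [[abs(a - b) for b in positions] for a in positions]

scores = kde_scores(dist, bandwidth=0.5)
outliers = flag_outliers(scores)
print(outliers)  # [6]
```

In a real analysis the distance matrix would come from a tree metric rather than positions on a line, and the bandwidth would need to be chosen with care.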

AVAILABILITY AND IMPLEMENTATION

Our method for estimating tree distributions and identifying outlying trees is implemented as the R package kdetrees and is available for download from CRAN.
