


kdetrees: Non-parametric estimation of phylogenetic tree distributions.

Author information

Weyenberg Grady, Huggins Peter M, Schardl Christopher L, Howe Daniel K, Yoshida Ruriko

Affiliations

Department of Statistics, University of Kentucky, Lexington, KY 40536, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213, Plant Pathology Department and Department of Veterinary Science, University of Kentucky, Lexington, KY 40546, USA.

Publication information

Bioinformatics. 2014 Aug 15;30(16):2280-7. doi: 10.1093/bioinformatics/btu258. Epub 2014 Apr 24.

DOI:10.1093/bioinformatics/btu258
PMID:24764459
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4176058/
Abstract

MOTIVATION

Although the majority of gene histories found in a clade of organisms are expected to be generated by a common process (e.g. the coalescent process), it is well known that numerous other coexisting processes (e.g. horizontal gene transfers, gene duplication and subsequent neofunctionalization) will cause some genes to exhibit a history distinct from those of the majority of genes. Such 'outlying' gene trees are considered to be biologically interesting, and identifying these genes has become an important problem in phylogenetics.

RESULTS

We propose and implement kdetrees, a non-parametric method for estimating distributions of phylogenetic trees, with the goal of identifying trees that are significantly different from the rest of the trees in the sample. Our method compares favorably with a similar recently published method, featuring an improvement of one polynomial order of computational complexity (to quadratic in the number of trees analyzed), with simulation studies suggesting only a small penalty to classification accuracy. Application of kdetrees to a set of Apicomplexa genes identified several unreliable sequence alignments that had escaped previous detection, as well as a gene independently reported as a possible case of horizontal gene transfer. We also analyze a set of genes from Epichloë, fungi symbiotic with grasses, successfully identifying a contrived instance of paralogy.

AVAILABILITY AND IMPLEMENTATION

Our method for estimating tree distributions and identifying outlying trees is implemented as the R package kdetrees and is available for download from CRAN.
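The RESULTS paragraph describes kdetrees as a non-parametric (kernel) estimate of the distribution of a sample of trees, where outlying trees are those that fall in low-density regions and the cost is quadratic in the number of trees. The following is a minimal Python sketch of that idea, not the kdetrees package or its API. It assumes a precomputed symmetric matrix of pairwise tree distances (the abstract does not say which tree metric is used), a fixed Gaussian kernel bandwidth, a leave-one-out score, and a Tukey-style lower fence (Q1 - k*IQR) as the outlier rule. The names `kde_scores`, `flag_outliers`, `bandwidth` and `k` are hypothetical, and the package's own kernel, bandwidth selection and cutoff may differ.

```python
import math
from statistics import quantiles


def kde_scores(dist, bandwidth):
    """Leave-one-out Gaussian kernel density score for each tree.

    dist: symmetric n x n matrix (list of lists) of pairwise tree distances,
    assumed to be precomputed with some tree metric. The kernel's normalizing
    constant is dropped because it is shared by every tree and does not change
    which trees fall below the cutoff. Cost: n * (n - 1) kernel evaluations,
    i.e. quadratic in the number of trees.
    """
    n = len(dist)
    scores = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            if j != i:  # leave-one-out: skip the tree's own kernel term
                z = dist[i][j] / bandwidth
                total += math.exp(-0.5 * z * z)
        scores.append(total / (n - 1))
    return scores


def flag_outliers(scores, k=1.5):
    """Indices of trees whose score is below the lower fence Q1 - k * IQR."""
    q1, _, q3 = quantiles(scores, n=4)
    cutoff = q1 - k * (q3 - q1)
    return [i for i, s in enumerate(scores) if s < cutoff]


# Toy stand-in for a tree sample: points on a line, so |a - b| is a valid
# distance. Nine "trees" are close together and one (index 9) is far away.
positions = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 20.0]
dist = [[abs(a - b) for b in positions] for a in positions]
scores = kde_scores(dist, bandwidth=2.0)
print(flag_outliers(scores))  # [9]
```

With `bandwidth=2.0` the lower fence is about 0.16, while the nine clustered points score between roughly 0.48 and 0.71, so only index 9 is flagged. A fence on density scores can also fall below zero (here it does with `bandwidth=1.0`), in which case nothing is flagged, so the bandwidth choice matters as much as `k`.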

