
Clustering Genes of Common Evolutionary History

Authors

Gori Kevin, Suchan Tomasz, Alvarez Nadir, Goldman Nick, Dessimoz Christophe

Affiliations

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Campus, Hinxton, United Kingdom.

Department of Ecology and Evolution, Biophore Building, UNIL-Sorge, University of Lausanne, Lausanne, Switzerland.

Publication information

Mol Biol Evol. 2016 Jun;33(6):1590-605. doi: 10.1093/molbev/msw038. Epub 2016 Feb 17.

DOI:10.1093/molbev/msw038
PMID:26893301
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4868114/
Abstract

Phylogenetic inference can potentially result in a more accurate tree using data from multiple loci. However, if the loci are incongruent (due to events such as incomplete lineage sorting or horizontal gene transfer), it can be misleading to infer a single tree. To address this, many previous contributions have taken a mechanistic approach, by modeling specific processes. Alternatively, one can cluster loci without assuming how these incongruencies might arise. Such "process-agnostic" approaches typically infer a tree for each locus and cluster these. There are, however, many possible combinations of tree distance and clustering methods; their comparative performance in the context of tree incongruence is largely unknown. Furthermore, because standard model selection criteria such as AIC cannot be applied to problems with a variable number of topologies, the issue of inferring the optimal number of clusters is poorly understood. Here, we perform a large-scale simulation study of phylogenetic distances and clustering methods to infer loci of common evolutionary history. We observe that the best-performing combinations are distances accounting for branch lengths followed by spectral clustering or Ward's method. We also introduce two statistical tests to infer the optimal number of clusters and show that they strongly outperform the silhouette criterion, a general-purpose heuristic. We illustrate the usefulness of the approach by 1) identifying errors in a previous phylogenetic analysis of yeast species and 2) identifying topological incongruence among newly sequenced loci of the globeflower fly genus Chiastocheta. We release treeCl, a new program to cluster genes of common evolutionary history (http://git.io/treeCl).
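The "process-agnostic" idea in the abstract (one tree per locus, a tree distance that accounts for branch lengths, then Ward's method) can be illustrated with a small sketch. This is not treeCl's code or API. The toy trees, the `split` helper, and the function names are hypothetical, and the distance used here is a weighted Robinson-Foulds distance restricted to internal branches, chosen only as a simple example of a branch-length-aware distance.

```python
from itertools import combinations


def split(*taxa):
    """A split is the set of taxa on the side of a branch that excludes taxon E."""
    return frozenset(taxa)


# Hypothetical toy data: each locus tree (taxa A-E) is stored as a mapping
# from internal split to branch length. Loci 1-2 and loci 3-4 share topologies.
locus_trees = {
    "locus1": {split("A", "B"): 0.10, split("A", "B", "C"): 0.05},
    "locus2": {split("A", "B"): 0.12, split("A", "B", "C"): 0.04},
    "locus3": {split("A", "C"): 0.11, split("A", "C", "D"): 0.06},
    "locus4": {split("A", "C"): 0.09, split("A", "C", "D"): 0.05},
}


def weighted_rf(t1, t2):
    """Sum of absolute branch-length differences over all splits;
    a split missing from one tree counts as length 0."""
    return sum(abs(t1.get(s, 0.0) - t2.get(s, 0.0)) for s in set(t1) | set(t2))


def ward_clusters(names, dist, k):
    """Agglomerative clustering with Ward's criterion, using the
    Lance-Williams update on squared distances, stopped at k clusters."""
    clusters = {i: [name] for i, name in enumerate(names)}
    d2 = {
        frozenset((i, j)): dist(names[i], names[j]) ** 2
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(clusters) > k:
        # Merge the pair of clusters with the smallest squared distance.
        i, j = min(combinations(sorted(clusters), 2),
                   key=lambda p: d2[frozenset(p)])
        ni, nj = len(clusters[i]), len(clusters[j])
        for m in clusters:
            if m in (i, j):
                continue
            nm = len(clusters[m])
            d2[frozenset((m, next_id))] = (
                (ni + nm) * d2[frozenset((m, i))]
                + (nj + nm) * d2[frozenset((m, j))]
                - nm * d2[frozenset((i, j))]
            ) / (ni + nj + nm)
        clusters[next_id] = clusters.pop(i) + clusters.pop(j)
        next_id += 1
    return sorted(sorted(c) for c in clusters.values())


clusters = ward_clusters(
    list(locus_trees),
    lambda a, b: weighted_rf(locus_trees[a], locus_trees[b]),
    k=2,
)
print(clusters)
```

In this sketch the number of clusters `k` is fixed by hand. Choosing it from the data is the harder problem the abstract refers to, and the paper's two statistical tests for that step are not reproduced here.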


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e848/4868114/f657a0ac2199/msw038f1p.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e848/4868114/323111fa1ef7/msw038f2p.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e848/4868114/c1b201a6d7c7/msw038f3p.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e848/4868114/3dc4c56f91c2/msw038f4p.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e848/4868114/d7f79855a0dd/msw038f5p.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e848/4868114/e99022512a4f/msw038f6p.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e848/4868114/25fc4c1ec99a/msw038f7p.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e848/4868114/efb61bc738db/msw038f8p.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e848/4868114/5a78bf27bc92/msw038f9p.jpg


