GIGA: a simple, efficient algorithm for gene tree inference in the genomic age.

Affiliation

Evolutionary Systems Biology Group, SRI International, Menlo Park, CA, USA.

Publication information

BMC Bioinformatics. 2010 Jun 9;11:312. doi: 10.1186/1471-2105-11-312.

DOI: 10.1186/1471-2105-11-312
PMID: 20534164
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2905364/
Abstract

BACKGROUND

Phylogenetic relationships between genes are not only of theoretical interest: they enable us to learn about human genes through the experimental work on their relatives in numerous model organisms from bacteria to fruit flies and mice. Yet the most commonly used computational algorithms for reconstructing gene trees can be inaccurate for numerous reasons, both algorithmic and biological. Additional information beyond gene sequence data has been shown to improve the accuracy of reconstructions, though at great computational cost.

RESULTS

We describe a simple, fast algorithm for inferring gene phylogenies, which makes use of information that was not available prior to the genomic age: namely, a reliable species tree spanning much of the tree of life, and knowledge of the complete complement of genes in a species' genome. The algorithm, called GIGA, constructs trees agglomeratively from a distance matrix representation of sequences, using simple rules to incorporate this genomic age information. GIGA makes use of a novel conceptualization of gene trees as being composed of orthologous subtrees (containing only speciation events), which are joined by other evolutionary events such as gene duplication or horizontal gene transfer. An important innovation in GIGA is that, at every step in the agglomeration process, the tree is interpreted/reinterpreted in terms of the evolutionary events that created it. Remarkably, GIGA performs well even when using a very simple distance metric (pairwise sequence differences) and no distance averaging over clades during the tree construction process.
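The agglomerative idea described above can be illustrated with a minimal Python sketch. It is not the GIGA implementation: the function names (`p_distance`, `agglomerate`), the toy sequences, and in particular the rule that labels a join as a duplication whenever the two clusters share a species are assumptions made for illustration. The abstract only states that GIGA uses pairwise sequence differences, performs no distance averaging over clades, and reinterprets the tree in terms of evolutionary events at every step.

```python
from itertools import combinations

def p_distance(a, b):
    """Fraction of aligned, ungapped positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0
    return sum(x != y for x, y in pairs) / len(pairs)

def agglomerate(seqs, species):
    """Repeatedly join the two closest clusters until one tree remains.

    A leaf is a gene name; an internal node is (event, left, right).
    Cluster distance is the minimum leaf-to-leaf distance, so the original
    pairwise distances are never averaged over clades.
    """
    dist = {frozenset((g, h)): p_distance(seqs[g], seqs[h])
            for g, h in combinations(seqs, 2)}
    # Each cluster is (tree, tuple_of_leaf_names).
    clusters = [(g, (g,)) for g in sorted(seqs)]

    def gap(pair):
        (_, leaves_a), (_, leaves_b) = pair
        return min(dist[frozenset((g, h))] for g in leaves_a for h in leaves_b)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=gap)
        # Assumed labeling rule (hypothetical stand-in for GIGA's rules):
        # shared species between the two sides implies a duplication.
        shared = {species[g] for g in a[1]} & {species[g] for g in b[1]}
        event = "duplication" if shared else "speciation"
        clusters = [c for c in clusters if c is not a and c is not b]
        clusters.append(((event, a[0], b[0]), a[1] + b[1]))
    return clusters[0][0]

# Toy data: two paralogous genes (A and B) in each of two species.
seqs = {
    "humanA": "ACGTACGT",
    "mouseA": "ACGTACGA",
    "humanB": "TCGAACCT",
    "mouseB": "TCGAACCA",
}
species = {"humanA": "human", "mouseA": "mouse",
           "humanB": "human", "mouseB": "mouse"}

tree = agglomerate(seqs, species)
print(tree)
```

On this toy input the two human-mouse pairs are joined first as speciation nodes, and the final join, which brings together clusters that both contain human and mouse genes, is labeled a duplication.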

CONCLUSIONS

GIGA is efficient, allowing phylogenetic reconstruction of very large gene families and determination of orthologs on a large scale. It is exceptionally robust to adding more gene sequences, opening up the possibility of creating stable identifiers for referring to not only extant genes, but also their common ancestors. We compared trees produced by GIGA to those in the TreeFam database, and they were very similar in general, with most differences likely due to poor alignment quality. However, some remaining differences are algorithmic, and can be explained by the fact that GIGA tends to put a larger emphasis on minimizing gene duplication and deletion events.
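To make the link between orthologous subtrees and ortholog determination concrete, the following Python sketch splits an event-labeled gene tree into its maximal speciation-only subtrees and counts duplication nodes. The nested-tuple tree encoding and the function names are hypothetical choices for this example, not part of GIGA or TreeFam.

```python
def leaves(node):
    """Return the gene names under a node, left to right."""
    if isinstance(node, str):
        return [node]
    _, left, right = node
    return leaves(left) + leaves(right)

def is_orthologous(node):
    """True if the subtree contains only speciation events."""
    if isinstance(node, str):
        return True
    event, left, right = node
    return (event == "speciation"
            and is_orthologous(left)
            and is_orthologous(right))

def orthologous_subtrees(node):
    """Leaf sets of the maximal subtrees that contain only speciation events."""
    if is_orthologous(node):
        return [leaves(node)]
    _, left, right = node
    return orthologous_subtrees(left) + orthologous_subtrees(right)

def count_events(node, event):
    """Number of internal nodes labeled with the given event."""
    if isinstance(node, str):
        return 0
    label, left, right = node
    return ((label == event)
            + count_events(left, event)
            + count_events(right, event))

# Example tree: a leaf is a gene name, an internal node is (event, left, right).
example_tree = ("duplication",
                ("speciation", "humanA", "mouseA"),
                ("speciation", "humanB", "mouseB"))

groups = orthologous_subtrees(example_tree)
n_dup = count_events(example_tree, "duplication")
print(groups, n_dup)
```

Genes within one returned group are related only through speciation events, so they are candidate orthologs of one another. The duplication count is the kind of quantity that a method emphasizing fewer duplication events would tend to keep small.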
