Computational methods for Gene Orthology inference.

Affiliation

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.

Publication information

Brief Bioinform. 2011 Sep;12(5):379-91. doi: 10.1093/bib/bbr030. Epub 2011 Jun 19.

Abstract

Accurate inference of orthologous genes is a prerequisite for most comparative genomics studies, and is also important for functional annotation of new genomes. Identification of orthologous gene sets typically involves phylogenetic tree analysis, heuristic algorithms based on sequence conservation, synteny analysis, or some combination of these approaches. The most direct tree-based methods typically rely on the comparison of an individual gene tree with a species tree. Once the two trees are accurately constructed, orthologs are straightforwardly identified by the definition of orthology as those homologs that are related by speciation, rather than gene duplication, at their most recent point of origin. Although ideal in principle for orthology identification, phylogenetic trees are computationally expensive to construct for large numbers of genes and genomes, and they often contain errors, especially at large evolutionary distances. Moreover, in many organisms, in particular prokaryotes and viruses, evolution does not appear to have followed a simple 'tree-like' mode, which makes conventional tree reconciliation inapplicable. Other, heuristic methods identify probable orthologs as the closest homologous pairs or groups of genes in a set of organisms. These approaches are faster and easier to automate than tree-based methods, with efficient implementations provided by graph-theoretical algorithms enabling comparisons of thousands of genomes. Comparisons of these two approaches show that, despite conceptual differences, they produce similar sets of orthologs, especially at short evolutionary distances. Synteny can also aid in the identification of orthologs. Often, tree-based, sequence similarity-based and synteny-based approaches can be combined into flexible hybrid methods.
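
As an illustration of the heuristic idea of taking the closest homologous pairs, one common form is the reciprocal (bidirectional) best hit between two genomes. The sketch below is a minimal example under stated assumptions, not the procedure of any particular tool discussed in the article: the function name `reciprocal_best_hits`, the gene identifiers, and the similarity scores are all hypothetical, and the scores are assumed to have been precomputed by some sequence comparison step (higher means more similar).

```python
def reciprocal_best_hits(scores):
    """Return gene pairs that are each other's best-scoring hit.

    scores: dict mapping (gene_in_genome_a, gene_in_genome_b) -> similarity
            score, where a higher score means a closer match (an assumption
            of this sketch; the input format is hypothetical).
    """
    best_for_a = {}  # gene in A -> (best partner in B, score)
    best_for_b = {}  # gene in B -> (best partner in A, score)
    for (a, b), s in scores.items():
        # Strict '>' means that on a tie the first pair seen is kept.
        if a not in best_for_a or s > best_for_a[a][1]:
            best_for_a[a] = (b, s)
        if b not in best_for_b or s > best_for_b[b][1]:
            best_for_b[b] = (a, s)
    # Keep a pair only when the best-hit relation holds in both directions.
    return sorted(
        (a, b) for a, (b, _) in best_for_a.items() if best_for_b[b][0] == a
    )


# Made-up scores for three genes in genome A and two genes in genome B.
example_scores = {
    ("a1", "b1"): 250,
    ("a1", "b2"): 90,
    ("a2", "b1"): 120,
    ("a2", "b2"): 80,
    ("a3", "b2"): 200,
}
print(reciprocal_best_hits(example_scores))
```

In this toy input, `a2` finds `b1` as its best hit, but `b1` prefers `a1`, so `a2` is left without a putative ortholog. Such a gene could be a paralog or a lineage-specific duplicate, which is the kind of case where tree-based or synteny-based evidence may help.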

