Benchmarking ortholog identification methods using functional genomics data.

Author information

Hulsen Tim, Huynen Martijn A, de Vlieg Jacob, Groenen Peter M A

Affiliation

Centre for Molecular and Biomolecular Informatics, Radboud University Nijmegen, Toernooiveld 1, Nijmegen, 6500 GL, The Netherlands.

Publication information

Genome Biol. 2006;7(4):R31. doi: 10.1186/gb-2006-7-4-r31. Epub 2006 Apr 13.

DOI:10.1186/gb-2006-7-4-r31

PMID:16613613

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1557999/

Abstract

BACKGROUND

The transfer of functional annotations from model organism proteins to human proteins is one of the main applications of comparative genomics. Various methods are used to analyze cross-species orthologous relationships according to an operational definition of orthology. Often the definition of orthology is incorrectly interpreted as a prediction of proteins that are functionally equivalent across species, while in fact it only defines the existence of a common ancestor for a gene in different species. However, it has been demonstrated that orthologs often reveal significant functional similarity. Therefore, the quality of the orthology prediction is an important factor in the transfer of functional annotations (and other related information). To identify protein pairs with the highest possible functional similarity, it is important to qualify ortholog identification methods.

RESULTS

To measure the similarity in function of proteins from different species we used functional genomics data, such as expression data and protein interaction data. We tested several of the most popular ortholog identification methods. In general, we observed a sensitivity/selectivity trade-off: the functional similarity scores per orthologous pair of sequences become higher when the number of proteins included in the ortholog groups decreases.
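The kind of measurement described above can be illustrated with a minimal sketch. It assumes that functional similarity for an orthologous pair is taken as the Pearson correlation of the two proteins' expression profiles over matched samples. The abstract does not state the exact measure, so this choice, the function names, and the toy profiles are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # flat profile: correlation is undefined, treated as 0 here
    return cov / sqrt(vx * vy)


def score_method(pairs, expr_a, expr_b):
    """Return (number of scored pairs, mean correlation) for one method's pairs."""
    scores = [
        pearson(expr_a[a], expr_b[b])
        for a, b in pairs
        if a in expr_a and b in expr_b
    ]
    mean = sum(scores) / len(scores) if scores else 0.0
    return len(scores), mean


# Hypothetical expression profiles over four matched samples (not real data).
expr_human = {"H1": [1, 2, 3, 4], "H2": [4, 3, 2, 1], "H3": [1, 3, 2, 4]}
expr_mouse = {"M1": [2, 4, 6, 8], "M2": [8, 6, 4, 2], "M3": [4, 1, 3, 2]}

# A strict method reports fewer pairs; an inclusive one adds a weaker pair.
strict_pairs = [("H1", "M1"), ("H2", "M2")]
inclusive_pairs = strict_pairs + [("H3", "M3")]

n_strict, mean_strict = score_method(strict_pairs, expr_human, expr_mouse)
n_incl, mean_incl = score_method(inclusive_pairs, expr_human, expr_mouse)
print(n_strict, round(mean_strict, 3))
print(n_incl, round(mean_incl, 3))
```

In this toy case the inclusive set covers more proteins but has a lower mean score per pair, which is the shape of the trade-off the abstract reports.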

CONCLUSION

By combining the sensitivity and the selectivity into an overall score, we show that the InParanoid program is the best ortholog identification method in terms of identifying functionally equivalent proteins.
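The abstract does not say how sensitivity and selectivity are combined. As one hedged illustration, the sketch below scales each quantity by its maximum across methods and multiplies the two. The combination rule, the method labels, and the numbers are all assumptions made for the example and are not results from the paper.

```python
def overall_scores(results):
    """Combine coverage and mean similarity into one score per method.

    `results` maps a method label to (pair_count, mean_similarity).
    Both maxima are assumed to be positive. The product of the two
    scaled values is a hypothetical rule, not the paper's formula.
    """
    max_pairs = max(n for n, _ in results.values())
    max_sim = max(s for _, s in results.values())
    return {
        name: (n / max_pairs) * (s / max_sim)
        for name, (n, s) in results.items()
    }


# Hypothetical methods and values, chosen only to show the trade-off.
results = {
    "strict": (200, 0.60),      # few pairs, high similarity per pair
    "balanced": (800, 0.50),    # intermediate on both
    "inclusive": (1000, 0.30),  # many pairs, low similarity per pair
}

scores = overall_scores(results)
best = max(scores, key=scores.get)
print(best, {name: round(value, 3) for name, value in scores.items()})
```

A product penalizes a method that does poorly on either axis, so a method that is neither the strictest nor the most inclusive can come out ahead.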
