MSOAR 2.0: Incorporating tandem duplications into ortholog assignment based on genome rearrangement.

Affiliation

Department of Computer Science, University of California, Riverside, CA 92521, USA.

Publication information

BMC Bioinformatics. 2010 Jan 6;11:10. doi: 10.1186/1471-2105-11-10.

DOI:10.1186/1471-2105-11-10

PMID:20053291

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2821317/

Abstract

BACKGROUND

Ortholog assignment is a critical and fundamental problem in comparative genomics, since orthologs are considered to be functional counterparts in different species and can be used to infer molecular functions of one species from those of other species. MSOAR is a recently developed high-throughput system for assigning one-to-one orthologs between closely related species on a genome scale. It attempts to reconstruct the evolutionary history of input genomes in terms of genome rearrangement and gene duplication events. It assumes that a gene duplication event inserts a duplicated gene into the genome of interest at a random location (i.e., the random duplication model). However, in practice, biologists believe that genes are often duplicated by tandem duplications, where a duplicated gene is located next to the original copy (i.e., the tandem duplication model).
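The two duplication models contrasted above can be illustrated with a minimal sketch. It assumes a genome is simply a Python list of gene-family labels in chromosomal order; the function names `random_duplication` and `tandem_duplication` are hypothetical and are not part of MSOAR. Gene orientation (sign), which rearrangement models normally track, is ignored here for brevity.

```python
import random


def random_duplication(genome, index, rng):
    """Random duplication model: insert a copy of genome[index]
    at a uniformly random position in the genome."""
    result = list(genome)
    position = rng.randrange(len(result) + 1)  # any slot, including both ends
    result.insert(position, genome[index])
    return result


def tandem_duplication(genome, index):
    """Tandem duplication model: insert a copy of genome[index]
    immediately after the original gene."""
    result = list(genome)
    result.insert(index + 1, genome[index])
    return result


if __name__ == "__main__":
    genome = ["a", "b", "c", "d"]
    print(tandem_duplication(genome, 1))
    print(random_duplication(genome, 1, random.Random(0)))
```

Under the tandem model the two copies always end up adjacent, which is the property a preprocessing step can exploit; under the random model the copy may land anywhere.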

RESULTS

In this paper, we develop MSOAR 2.0, an improved system for one-to-one ortholog assignment. For a pair of input genomes, the system first focuses on the tandemly duplicated genes of each genome and tries to identify among them those that were duplicated after the speciation (i.e., the so-called inparalogs), using a simple phylogenetic tree reconciliation method. For each such set of tandemly duplicated inparalogs, all but one gene will be deleted from the concerned genome (because they cannot possibly appear in any one-to-one ortholog pairs), and MSOAR is invoked. Using both simulated and real data experiments, we show that MSOAR 2.0 is able to achieve a better sensitivity and specificity than MSOAR. In comparison with the well-known genome-scale ortholog assignment tool InParanoid, Ensembl ortholog database, and the orthology information extracted from the well-known whole-genome multiple alignment program MultiZ, MSOAR 2.0 shows the highest sensitivity. Although the specificity of MSOAR 2.0 is slightly worse than that of InParanoid in the real data experiments, it is actually better than that of InParanoid in the simulation tests.
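The preprocessing step described above (collapse each set of tandemly duplicated inparalogs to a single gene before invoking MSOAR) might be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: "tandem" is taken to mean strictly adjacent genes of the same family, the tree reconciliation test is abstracted into a caller-supplied predicate `is_inparalog_set` (hypothetical), and keeping the first gene of each run is an arbitrary choice, since the abstract does not say which copy is retained.

```python
from itertools import groupby


def collapse_tandem_inparalogs(genome, family_of, is_inparalog_set):
    """Return a reduced gene order with each tandem inparalog run
    collapsed to one representative.

    genome           -- list of gene ids in chromosomal order
    family_of        -- dict mapping gene id -> gene family label
    is_inparalog_set -- callable(list of gene ids) -> bool; stands in for
                        the tree reconciliation test (assumption)
    """
    reduced = []
    # groupby yields maximal runs of adjacent genes from the same family.
    for _, run_iter in groupby(genome, key=lambda gene: family_of[gene]):
        run = list(run_iter)
        if len(run) > 1 and is_inparalog_set(run):
            reduced.append(run[0])  # keep one copy (arbitrary: the first)
        else:
            reduced.extend(run)     # leave the run untouched
    return reduced


if __name__ == "__main__":
    genome = ["g1", "g2", "g3", "g4", "g5"]
    family_of = {"g1": "A", "g2": "B", "g3": "B", "g4": "B", "g5": "C"}
    print(collapse_tandem_inparalogs(genome, family_of, lambda run: True))
```

The reduced gene order would then be passed to the downstream ortholog assignment step. Runs that the predicate rejects are left intact, so duplications that predate the speciation remain available for assignment.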

CONCLUSIONS

Our preliminary experimental results demonstrate that MSOAR 2.0 is a highly accurate tool for one-to-one ortholog assignment between closely related genomes. The software is available to the public for free and included as online supplementary material.

Similar articles

MSOAR 2.0: Incorporating tandem duplications into ortholog assignment based on genome rearrangement.

BMC Bioinformatics. 2010 Jan 6;11:10. doi: 10.1186/1471-2105-11-10.

MSOAR: a high-throughput ortholog assignment system based on genome rearrangement.

J Comput Biol. 2007 Nov;14(9):1160-75. doi: 10.1089/cmb.2007.0048.

MultiMSOAR 2.0: an accurate tool to identify ortholog groups among multiple genomes.

PLoS One. 2011;6(6):e20892. doi: 10.1371/journal.pone.0020892. Epub 2011 Jun 21.

Clustering of main orthologs for multiple genomes.

Comput Syst Bioinformatics Conf. 2007;6:195-201.

Clustering of main orthologs for multiple genomes.

J Bioinform Comput Biol. 2008 Jun;6(3):573-84. doi: 10.1142/s0219720008003540.

Assignment of orthologous genes via genome rearrangement.

IEEE/ACM Trans Comput Biol Bioinform. 2005 Oct-Dec;2(4):302-15. doi: 10.1109/TCBB.2005.48.

InParanoid 6: eukaryotic ortholog clusters with inparalogs.

Nucleic Acids Res. 2008 Jan;36(Database issue):D263-6. doi: 10.1093/nar/gkm1020. Epub 2007 Nov 30.

Automatic clustering of orthologs and in-paralogs from pairwise species comparisons.

J Mol Biol. 2001 Dec 14;314(5):1041-52. doi: 10.1006/jmbi.2000.5197.

Genome rearrangements with duplications.

BMC Bioinformatics. 2010 Jan 18;11 Suppl 1(Suppl 1):S27. doi: 10.1186/1471-2105-11-S1-S27.

Correlation between sequence conservation and the genomic context after gene duplication.

Nucleic Acids Res. 2005 Oct 27;33(19):6164-71. doi: 10.1093/nar/gki913. Print 2005.

Cited by

An Exact and Fast SAT Formulation for the DCJ Distance.

bioRxiv. 2024 Nov 8:2024.11.05.622153. doi: 10.1101/2024.11.05.622153.

Divergent evolutionary trajectories following speciation in two ectoparasitic honey bee mites.

Commun Biol. 2019 Oct 1;2:357. doi: 10.1038/s42003-019-0606-0. eCollection 2019.

Surveying alignment-free features for Ortholog detection in related yeast proteomes by using supervised big data classifiers.

BMC Bioinformatics. 2018 May 3;19(1):166. doi: 10.1186/s12859-018-2148-8.

Orthonome - a new pipeline for predicting high quality orthologue gene sets applicable to complete and draft genomes.

BMC Genomics. 2017 Aug 31;18(1):673. doi: 10.1186/s12864-017-4079-6.

Systems analysis of cis-regulatory motifs in C4 photosynthesis genes using maize and rice leaf transcriptomic data during a process of de-etiolation.

J Exp Bot. 2016 Sep;67(17):5105-17. doi: 10.1093/jxb/erw275. Epub 2016 Jul 19.

Inferring Orthologs: Open Questions and Perspectives.

Genomics Insights. 2016 Feb 25;9:17-28. doi: 10.4137/GEI.S37925. eCollection 2016.

An Effective Big Data Supervised Imbalanced Classification Approach for Ortholog Detection in Related Yeast Species.

Biomed Res Int. 2015;2015:748681. doi: 10.1155/2015/748681. Epub 2015 Oct 29.

Multi-walled carbon nanotube-induced gene expression in vitro: concordance with in vivo studies.

Toxicology. 2015 Feb 3;328:66-74. doi: 10.1016/j.tox.2014.12.012. Epub 2014 Dec 13.

Analysis of micro-rearrangements in 25 eukaryotic species pairs by SyntenyMapper.

PLoS One. 2014 Nov 6;9(11):e112341. doi: 10.1371/journal.pone.0112341. eCollection 2014.

Comparative analyses of C₄ and C₃ photosynthesis in developing leaves of maize and rice.

Nat Biotechnol. 2014 Nov;32(11):1158-65. doi: 10.1038/nbt.3019. Epub 2014 Oct 12.
